INFOWO Lecture M6: Sampling design and Experiments

Peter de Waal
Department of Information and Computing Sciences
Faculty of Science, Universiteit Utrecht
Lecture 4

Overview
- Sampling design
- Experiments

Part I: Sampling design

Outline
- The nature of sampling
- Sample design
- Probability sampling
- Non-probability sampling
- Sampling on the Internet

What is sampling?

Sampling: the process of selecting a sufficient number of elements from the population, so that results from analysing the sample are generalisable to the population.

Sampling: relevant terminology
- Population: the entire group of people, events, or things of interest that the researcher wishes to investigate.
- Element: a single member of the population.
- Sample: the selected members (a subset) of the population.
- Subject: a single member of the sample.
- Parameters: characteristics of the population.

Unit of analysis

The elements and subjects are the units of analysis:
- What is the research problem, i.e. on what level do you look for answers?
- At what level do we need information, what do we measure?
- At what level do we want to implement the answers found?

Parameters versus statistics

The relationship between sample and population: the sample statistics X̄, s², s are used to estimate the population parameters μ, σ², σ.
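The distinction between parameters and statistics can be made concrete with a small simulation. The sketch below is illustrative only: the population is made up (10,000 simulated heights), and the names `population`, `sample`, `mu`, `x_bar` and so on are assumptions, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (made-up numbers).
rng = random.Random(42)
population = [rng.gauss(170.0, 5.0) for _ in range(10_000)]

# Population parameters (normally unknown to the researcher).
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)   # divides by N
sigma = statistics.pstdev(population)

# Sample statistics from n = 100 subjects drawn without replacement.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample)            # divides by n - 1
s = statistics.stdev(sample)

print(f"parameters: mu={mu:.2f}, sigma^2={sigma2:.2f}, sigma={sigma:.2f}")
print(f"statistics: x_bar={x_bar:.2f}, s^2={s2:.2f}, s={s:.2f}")
```

The statistics will be close to, but not equal to, the parameters; that gap is the sampling error.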

Advantages of sampling
- Practical where examining the entire population is impossible
- Lower costs
- Greater accuracy of results
- Greater speed of data collection
- Destruction of elements avoided

Sampling error: what is a good sample?
- Accurate:
  - Absence of systematic variance (bias)
  - Related to validity
- Precise:
  - Small random error, i.e. the sample differs from the population only by chance
  - Random error decreases with sample size
  - Related to reliability

The sampling process

Major steps:
1. Define the relevant population
2. Determine the parameters of interest
3. Determine the sampling frame
4. Determine the sample design
5. Determine the sample size
6. Execute the sampling process

The sampling frame

Sampling frame: the accessible section of the target population, i.e. the list of elements from which the sample is actually drawn.
- Contents:
  - Preferably: a complete and correct list of population elements
  - In practice: different from the theoretical population
- Examples:
  - Telephone directory
  - Student directory
  - Business directory

Sampling techniques

Probability versus non-probability sampling
- Probability sampling: elements have a known and non-zero chance of being chosen
  - Simple random sampling
  - Systematic sampling
  - Stratified random sampling
  - Cluster sampling
- Non-probability sampling: subjective selection of elements from the population
  - Convenience sampling
  - Judgment sampling
  - Quota sampling

Simple random sampling
- Procedure: each element has a known and equal chance of being selected.
- Characteristics:
  - Unbiased
  - Easily understood
  - Reliable population frame necessary
  - Takes more time to implement
  - Uses larger sample sizes
  - Produces larger errors
  - Expensive

Systematic sampling
- Procedure:
  - Population size N, sample size n, set k = N/n.
  - Select every k-th element, starting from a randomly chosen element between 1 and k.
- Characteristics:
  - Simple to design
  - Easier than simple random sampling
  - Less expensive than simple random sampling
  - Systematic biases when elements are not randomly listed

Stratified sampling
- Procedure:
  - Divide the population into sub-populations or strata.
  - Include all strata.
  - Randomly select elements from each stratum, either proportionally (the distribution of the population is retained in the sample) or non-proportionally.
  - Results may be weighted and combined.
- Characteristics:
  - Increased statistical efficiency
  - Provides data to represent and analyse sub-groups
  - Enables use of different methods in strata
  - Includes all relevant sub-populations
  - Increased error if sub-groups are selected at different rates
  - Expensive
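The three procedures above (simple random, systematic, and proportional stratified sampling) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the sampling frame of 100 numbered elements, and the two strata "A" and "B" are all hypothetical, and the systematic sampler assumes n is no larger than N and uses integer division for k.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every element has the same chance n/N of selection (without replacement)."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th element (k = N // n) after a random start among the first k."""
    k = len(frame) // n
    start = rng.randrange(k)      # 0-based index, i.e. position 1..k on the slide
    return frame[start::k][:n]

def proportional_stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

rng = random.Random(0)
frame = list(range(1, 101))       # hypothetical sampling frame, N = 100

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)

# Hypothetical strata of size 60 and 40; a 10% fraction keeps the 60:40 ratio.
strata = {"A": list(range(60)), "B": list(range(60, 100))}
strat = proportional_stratified_sample(strata, 0.1, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
print("stratified:   ", {name: len(chosen) for name, chosen in strat.items()})
```

If the frame is ordered in a periodic way, the fixed step k in `systematic_sample` can line up with that pattern, which is the systematic bias mentioned in the characteristics.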

Disproportionate stratified sampling

Example:

                                      Number of subjects
  Job level           # of elements   Proportionate   Disproportionate
  Top management             10              2                 7
  Middle management          30              6                15
  Lower management           50             10                20
  Supervisors               100             20                30
  Clerks                    500            100                60
  Secretaries                20              4                10
  Total                     710            142               142

Cluster sampling
- Procedure:
  - Divide the population into internally heterogeneous clusters.
  - Randomly select one or more clusters.
  - Include all elements from the selected clusters.
- Characteristics:
  - Provides an unbiased estimate of population parameters if properly done
  - Easy and cost efficient
  - Often lower statistical efficiency (more error), because sub-groups turn out homogeneous rather than heterogeneous

Sample size in probability sampling

Sample size is determined by:
- Standard error of the measurement
- Confidence interval
and is based on the Central Limit Theorem.

One thing does NOT determine sample size: the size of the population.

Calculation of sample size: example

The height of adult NL women has a normal distribution with known σ = 5.0 cm. 95% confidence interval of the unknown mean μ:

  Sample size (N)   σ_M       Confidence interval
        1           5.0 cm    X̄ ± 9.80
        4           2.5 cm    X̄ ± 4.90
       25           1.0 cm    X̄ ± 1.96
      100           0.5 cm    X̄ ± 0.98
     2500           0.1 cm    X̄ ± 0.20
    10000           0.05 cm   X̄ ± 0.10

Confidence interval: $\bar{X} \pm 1.96\,\sigma_M$, with $\sigma_M = \sigma / \sqrt{N}$.

How large a sample is needed for 0.50 cm precision?

$$0.50 = 1.96\,\frac{\sigma}{\sqrt{N}} \quad\Rightarrow\quad \sqrt{N} = \frac{1.96 \times 5.0}{0.5} = 19.6 \quad\Rightarrow\quad N = (19.6)^2 \approx 384.2,$$

so at least 385 subjects are needed.
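The sample-size arithmetic for the height example can be checked with a short script. It is a sketch that assumes a known σ and the usual two-sided 95% normal critical value 1.96; the function names `standard_error` and `required_sample_size` are illustrative.

```python
import math

Z_95 = 1.96   # two-sided 95% critical value of the standard normal distribution

def standard_error(sigma, n):
    """Standard error of the mean: sigma_M = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def required_sample_size(sigma, margin, z=Z_95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sigma / margin) ** 2)

# Half-width of the 95% confidence interval for sigma = 5.0 cm.
for n in (1, 4, 25, 100, 2500, 10000):
    se = standard_error(5.0, n)
    print(f"n={n:>5}  sigma_M={se:.2f} cm  interval: mean ± {Z_95 * se:.2f} cm")

# Sample size for a margin of 0.50 cm: (1.96 * 5.0 / 0.5)^2 is about 384.2.
print(required_sample_size(5.0, 0.5))   # 385
```

Note that the population size appears nowhere in the calculation, which matches the remark that it does not determine the sample size.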

Non-probability sampling

- Characteristics:
  - Some units of the population have zero chance of selection, or
  - the probability of selection cannot be accurately determined.
- Reasons to use?
  - Procedure satisfactorily meets the sampling objectives
  - Lower cost
  - Limited time
  - Not as much human error as selecting a completely random sample
  - Complete list of the population not available

Non-probability sampling: types
- Convenience sampling: select the most easily accessible members as subjects.
- Purposive sampling:
  - Expert sampling: subjects selected on the basis of expertise in the investigated subject.
  - Quota sampling: subjects conveniently chosen from targeted groups, according to a predetermined quota.
- Snowball sampling: respondents recommend respondents, who recommend...

Internet sampling

The Internet offers easy access to populations and samples.

Caveats:
- Populations are unclear
- Is the Internet sample representative?
- Control over who enters the sample

Part II: Experiments

Outline
- Uses of experimentation
- Advantages and disadvantages of the experimental method
- 7 steps of a well-planned experiment
- Internal and external validity in experiments
- Different types of experimental design

What is an experiment?

Experiment: a data collection method in which one or more independent variables (IVs) are manipulated in order to measure their effect on a dependent variable (DV), while controlling for exogenous variables, in order to test a hypothesis.
- Independent variables:
  - manipulation or systematic variation designed by the researcher
  - typically involve one or more experimental stimuli (called treatments)
- Dependent variables: measured outcome
- Exogenous variables: (hopefully) controlled by the researcher

Advantages of experiments
- Ability to manipulate the independent variable
- Contamination from extraneous variables can be controlled more efficiently
- Convenience
- Cost
- Replication

Disadvantages
- Artificiality of the laboratory
- Only a limited number of variables can be controlled
- Generalisation from non-probability samples
- Larger budgets needed (sometimes)
- Restricted to problems of the present or immediate future
- Ethical limits to manipulation of people

Experimentation process
1. Select relevant variables
2. Specify the treatment levels
3. Control the experimental environment
4. Choose the experimental design
5. Select and assign the participants
6. Pilot-test, revise, and test
7. Analyse the data

Internal validity

Question:
- Does the experiment really measure what it claims it does?
- Do the conclusions we draw about a demonstrated experimental relationship truly imply an effect?

Possible threats:
- History
- Maturation
- Testing
- Instrumentation
- Statistical regression / regression to the mean
- Experimental mortality

External validity

Question: are the results generalisable?

Possible threats:
- The reactivity of testing on X
- Interaction of selection and X
- Other biasing effects on X:
  - Artificial setting of testing
  - Respondents' knowledge of testing

Experimental designs
- True experimental designs
- Quasi and field experiments

Design symbols

Experiments can be represented in a diagram, which uses the following symbols:
- X: the introduction of an experimental stimulus to the participant
- O: a measurement or observation activity
- R: an indication that sample units have been randomly assigned

True experimental design

True experimental designs must have two properties:
1. They consist of an experimental/treatment group and a control group.
2. The experimental and control groups are made equal through random assignment or matching.

True experimental design: example

Pre-test, post-test control group design:

  R   O_1   X   O_2   (Treatment group)
  R   O_3       O_4   (Control group)

Effect of the experimental variable: $E = (O_2 - O_1) - (O_4 - O_3)$

True experimental design (cont'd)

Post-test only control group design:

  R   X   O_1   (Treatment group)
  R       O_2   (Control group)

Effect of the experimental variable: $E = O_1 - O_2$
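The two effect formulas can be illustrated with group means. The sketch below assumes each observation O_i is summarised by the mean score of its group; the scores are made up for illustration and the function names are hypothetical.

```python
from statistics import fmean

def pretest_posttest_effect(o1, o2, o3, o4):
    """E = (O2 - O1) - (O4 - O3): change in the treatment group
    minus change in the control group."""
    return (fmean(o2) - fmean(o1)) - (fmean(o4) - fmean(o3))

def posttest_only_effect(o1, o2):
    """E = O1 - O2: treatment-group post-test minus control-group post-test."""
    return fmean(o1) - fmean(o2)

# Made-up test scores, for illustration only.
treat_pre, treat_post = [60, 62, 58], [70, 73, 67]
ctrl_pre, ctrl_post = [61, 59, 60], [63, 62, 61]

effect = pretest_posttest_effect(treat_pre, treat_post, ctrl_pre, ctrl_post)
effect_post_only = posttest_only_effect(treat_post, ctrl_post)
print(effect, effect_post_only)
```

Subtracting the control group's change removes whatever would have happened without the treatment (for example history or maturation effects), which is why the pre-test, post-test design uses both differences.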

Hybrid true experimental design

Solomon four-group design
- Increases internal and external validity
- Consists of two experimental groups and two control groups
- Costly
- Reduces the effect of uncontrolled variables

  R   O_1   X   O_2
  R   O_3       O_4
  R         X   O_5
  R             O_6

Extensions of true experimental designs

Completely randomised design

Basic form of the true experiment: different treatment levels X_1, X_2, ...

  R   O_1   X_1   O_2
  R   O_3   X_2   O_4
  R   O_5   X_3   O_6

Example: students are submitted to three different types of training X_1, X_2, X_3.

      Results before   Treatment   Results after
  R   O_1              X_1         O_2
  R   O_3              X_2         O_4
  R   O_5              X_3         O_6
  R   O_7                          O_8

Extensions of true experimental designs

Randomised block design
- Single major extraneous variable: defines groups (blocks) in the subject population
- The experiment is replicated in each block

Example: Bachelor and MBI students (blocks) are submitted to two different types of training, X_1 or X_2; a third group in each block receives no training.

                          Results before   Treatment   Results after
  Bachelor students   R   O_1              X_1         O_2
                      R   O_3              X_2         O_4
                      R   O_5                          O_6
  MBI students        R   O_7              X_1         O_8
                      R   O_9              X_2         O_10
                      R   O_11                         O_12

Extensions of true experimental designs

Latin square design
- Two major extraneous effects.
- Each treatment is randomly assigned, but appears exactly once in each row and each column.
- Presupposes absence of interaction between treatment and blocking factors.
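The "R" in these diagrams, random assignment, can be sketched for the completely randomised and randomised block designs. This is one possible way to do it (shuffle, then deal participants round-robin over the groups), not the lecture's prescribed procedure; the participant labels and function names are hypothetical.

```python
import random

def completely_randomised(participants, treatments, rng):
    """Shuffle the participants, then deal them round-robin over the groups."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, person in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(person)
    return groups

def randomised_block(blocks, treatments, rng):
    """Repeat the random assignment separately inside every block."""
    return {
        name: completely_randomised(members, treatments, rng)
        for name, members in blocks.items()
    }

rng = random.Random(7)
treatments = ["X1", "X2", "control"]

# Hypothetical participants: six per block.
blocks = {
    "Bachelor": [f"B{i}" for i in range(1, 7)],
    "MBI": [f"M{i}" for i in range(1, 7)],
}

design = randomised_block(blocks, treatments, rng)
for block, groups in design.items():
    print(block, groups)
```

Because assignment happens within each block, every treatment group contains the same number of Bachelor and MBI students, so the blocking variable cannot be confounded with the treatment.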

Extensions of true experimental designs

Latin square design

Example: a bus company wants to investigate the effect of different fare reductions X_1, X_2, or X_3 on passenger numbers.

Block factors:
- Route type
- Weekday

(The observations O_i are not shown in the diagram.)

                             Weekday
  Route type      Midweek   Weekend   Mon/Fri
  Urban           X_3       X_1       X_2
  Suburban        X_2       X_3       X_1
  Long distance   X_1       X_2       X_3

Extensions of true experimental designs

Factorial design
- Two treatments are simultaneously manipulated
- Example: fare reduction and choice of bus comfort
- True randomised: bus routes are assigned randomly to treatment combinations

                            Comfort class
  Fare reduction   Luxury     Comfort    Standard
  7 cents          X_1 Y_1    X_1 Y_2    X_1 Y_3
  12 cents         X_2 Y_1    X_2 Y_2    X_2 Y_3
  17 cents         X_3 Y_1    X_3 Y_2    X_3 Y_3

Field experiments

Conducted outside of a controlled laboratory environment.
- Advantages:
  - Heterogeneous group of participants
  - Unawareness of the experiment gives a more realistic response
  - No interaction between experimenter and participant that could influence the response
- Disadvantages:
  - Less control over research settings
  - Ethical issues

Quasi-experiments
- Do not meet the requirements of a true experiment:
  - Ability to control experimental settings and manipulate IVs
  - Random assignment to experimental and control groups
- Used when we do not know the time or subjects of the experiment beforehand.
- Used for studying the effects of a well-defined event, e.g. an earthquake or the introduction of a new law.

Non-equivalent control group design: no random assignment to groups.

  O_1   X   O_2   (Treatment group)
  O_3       O_4   (Control group)
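Both the Latin square and the factorial layout for the bus example can be generated programmatically. The sketch below builds a Latin square by shuffling the rows and columns of a cyclic square, which guarantees each treatment once per row and column but does not sample uniformly from all possible Latin squares. The route names `route 1` to `route 9` are hypothetical.

```python
import itertools
import random

def latin_square(treatments, rng):
    """Return a randomised Latin square: each treatment once per row and column."""
    n = len(treatments)
    rows, cols = list(range(n)), list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    # Cell (r, c) of the cyclic square holds treatment (r + c) mod n; permuting
    # rows and columns keeps the once-per-row, once-per-column property.
    return [[treatments[(r + c) % n] for c in cols] for r in rows]

rng = random.Random(1)
square = latin_square(["X1", "X2", "X3"], rng)
for route_type, row in zip(["Urban", "Suburban", "Long distance"], square):
    print(f"{route_type:<14}", *row)

# Factorial design: every fare reduction is crossed with every comfort class.
fares = ["7 cents", "12 cents", "17 cents"]
comforts = ["Luxury", "Comfort", "Standard"]
cells = list(itertools.product(fares, comforts))

# Hypothetical bus routes, randomly assigned to the nine treatment combinations.
routes = [f"route {i}" for i in range(1, 10)]
rng.shuffle(routes)
assignment = dict(zip(routes, cells))
for route, (fare, comfort) in sorted(assignment.items()):
    print(route, fare, comfort)
```

The Latin square needs only 3 x 3 = 9 cells to cover three treatments and two blocking factors, whereas crossing all three factors fully would need 27.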

Quasi-experiments

Time series
- Repeated observations before and after the treatment
- Useful when records are kept regularly
- Useful for ex-post facto analysis

  O_1   O_2   O_3   X   O_4    O_5    O_6    (Treatment group)
  O_7   O_8   O_9       O_10   O_11   O_12   (Control group)

Summary
- Sampling design
- Experimental design