AEC 874 (2007) Field Data Collection & Analysis in Developing Countries. VII. Data Analysis & Project Documentation


Richard H. Bernsten, Agricultural Economics, Michigan State University

A. Things to Consider in Planning Data Analysis

1. Your Research Proposal
   - What were your original research objectives?
   - Are these objectives still appropriate, or do you need to modify them?

2. Your Target Audience
   a) Who is the audience for your analysis? Academic faculty? Policy makers? Another client? Multiple audiences?
   b) What are the expectations of your target audience regarding the type of analysis?

3. Your Data
   a) What type of analysis is possible with the data you have collected?
      - Sample size: few vs. many cases?
      - Measurement level (see Andrews et al.): Nominal (categorical)? Ordinal (Likert scales)? Scale (continuous numeric data)?
      - Data level: household vs. variety level?

4. Your Statistical Expertise
   a) Novice?
   b) Expert?

B. Statistics, Data & Analysis

1. Role of Statistics
   - Summarize data (descriptive statistics)
   - Reveal relationships (measures of association)

2. Classes of Statistics
   - Univariate: one variable (e.g., mean, median, mode)
   - Bivariate: two variables (e.g., chi-square, correlation analysis)
   - Multivariate: several variables; requires good data (e.g., regression, logit/probit analysis)

3. Types of Data (measurement level, SPSS)
   - Nominal data: data values represent categories with no intrinsic order (e.g., gender, types of income sources)
   - Ordinal data: data values represent categories with some intrinsic order (e.g., Likert, rank-order scales)
   - Scale data: data values are continuous numeric values on an interval or ratio scale (e.g., age, income, yield)
   - Note: You can transform scale data to categorical data, but categorical data can't be transformed to continuous data. Implications for data collection?

4. Types of Analysis (by data type)
   a) Descriptive Analysis (Fig. 10.3)
      1) Nominal/categorical data
         - Frequency tables describe data distributions with numbers or percents
         - SPSS output reports data categories, numbers of observations & percents (total, adjusted, cumulative)
         - May also ask SPSS to report data as histograms, horizontal bar charts, pie charts
         - Limit the number of data categories to <10
         - If you have >9, combine categories with few cases into "other"

      2) Scale/continuous numeric data
         Measures of Central Tendency
         - Mean is the average case (arithmetic average)
           - Not valid for nominal/categorical data
           - Not usually used for ordinal data (i.e., can't assume equal distance between items)
           - Very sensitive to the distribution of scale data
         - Median is the middle case
           - Use if scale data are asymmetric
           - Use for ordinal data
         - Mode is the most common data value
           - It is the only indicator of central tendency for nominal/categorical data
         - Note: For normally distributed data, mean = median = mode
         Measures of Dispersion/Spread
         - Minimum is the lowest value
         - Maximum is the highest value
         - Range is the interval between the highest and lowest values
         - Standard deviation (SD) indicates the percent of cases within certain ranges (if the data are normally distributed)
         Shape of the Distribution (for scale data) (Fig. 10.5, 10.9)
         - Skewness shows the degree & direction of asymmetry
           - If symmetrical, coefficient = 0
           - If skewed right (cases cluster at the left, long tail to the right), coefficient is positive
           - If skewed left (cases cluster at the right, long tail to the left), coefficient is negative
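The measures listed above can be illustrated with a short Python sketch. It is only an illustration: the yield values are hypothetical, the helper name `describe` is an assumption, and the skewness shown is the simple moment-based coefficient, so SPSS (which applies a small-sample adjustment) will report a slightly different value.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of scale (numeric) data."""
    n = len(values)
    mean = statistics.mean(values)
    # Moment-based skewness: mean cubed deviation over the cubed population SD.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 in the denominator)
        "skewness": skewness,
    }


# Hypothetical yields (kg/ha); one very high value pulls the mean upward.
yields = [400, 450, 450, 500, 520, 550, 1500]
summary = describe(yields)
for name, value in summary.items():
    print(name, value)
```

With one extreme case in the data, the mean lies well above the median and the skewness coefficient is positive, which is the situation in which the median is the safer summary.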

         - Kurtosis measures the peakedness of the distribution (Figs. 10.6, 10.10)
           - If same as the normal distribution, coefficient = 0
           - If very peaked, coefficient is positive
           - If very flat, coefficient is negative
         - Note: If the skewness or kurtosis value is not close to 0
           - The mean isn't an appropriate measure of central tendency
           - The standard deviation isn't an accurate measure of dispersion
           - Problem: there is no clear definition of what "not close to 0" means

   b) Analysis of the Relationship/Association Between Variables
      - Question: Do pairs of variables move together, or are they independent?
      - Bivariate analysis does not require you to assume/identify a dependent/independent variable
      - Multivariate analysis assesses the relationship between a dependent variable & independent variables
        - Dependent variable: the variable being affected
        - Independent variable(s): the variable(s) affecting the dependent variable
      - Correlation does not imply causation
        - Statistics that measure association do not indicate causation
        - Only theory implies causation

      - Choice of the appropriate statistic to assess relationships depends on
        - Type of variables: nominal, ordinal or scale (continuous)
        - Which variable is independent/dependent

      Considerations in Choosing a Statistical Method

                                      Dependent Variable
      Independent Variable            Nominal or Ordinal Data        Interval or Ratio Data
      ------------------------------  -----------------------------  -----------------------
      Nominal or Ordinal Data         Cross-Tabulation               Paired t test, ANOVA
      (discrete, categorical)
      Interval or Ratio Data          Discriminant, Probit, Logit    Correlation, Regression
      (continuous, numeric)

      See Andrews, A Guide for Selecting Statistical Techniques for Social Science Analysis, for details. (Overheads)

C. Strategies for Analyzing Survey Data

1. Review your research objectives, hypotheses, and questionnaire
2. Develop a tentative report outline (analytical plan)
3. Use descriptive statistics to explore your data (e.g., frequencies, mean, median, mode, SD, skewness, kurtosis). Use these results to decide
   - What sub-group comparisons are possible or look interesting to explore (e.g., Is there enough variability to justify further analysis?)
   - What associations can you assess with the data?
4. Revise your analytical plan, based on your new knowledge of the characteristics of the data
5. Finally, use bivariate/multivariate statistics to assess relationships/associations

D. Strategies & Considerations in Using Statistics

Begin your analysis with descriptive analysis, then look for associations to explain relationships.

1. Describe the Variables (Basic analysis)
   a) Nominal/categorical variables
      1) Strategies to consider
         - First run frequencies/percents (Example 10.1). If there are very few cases in a category, combine/recode some categories to "other"
         - Be sure to save the original variables (with original codes) in an archive file, or rename to a new variable before recoding
         - If there are many cases in the category "other", recode some of these cases to specific categories (if possible)
         - Consider recoding continuous data into a few groups (e.g., recode the continuous variable education to 1=0-11, 2=12, 3=13-15, 4=16 or more; or Likert scale data (1-5) to 1-2, 3, 4-5)
         - Review the frequency distribution to decide what break points to use for regrouping continuous data into categorical data (e.g., first 1/2=low, second 1/2=high; or first 1/3=low, second 1/3=medium, third 1/3=high)
         - After recoding data, go to the Variable View and update variable values/information for the new/recoded variables
      2) Statistics for nominal/categorical data
         - Mode is the appropriate statistic for assessing central tendency
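The education recode suggested above can be sketched in Python as follows. This is a minimal illustration rather than SPSS syntax: the records, the field names `educ` and `r_educ`, and the function name are hypothetical, and the top group is assumed to mean 16 or more years.

```python
from collections import Counter


def recode_education(years):
    """Recode years of schooling into the four groups used in the example."""
    if years <= 11:
        return 1  # 0-11 years
    if years == 12:
        return 2  # 12 years
    if years <= 15:
        return 3  # 13-15 years
    return 4      # 16 or more years (assumed reading of the top group)


# Hypothetical records; the original variable is kept and a new one is added.
households = [{"educ": 6}, {"educ": 12}, {"educ": 14}, {"educ": 16}, {"educ": 9}]
for row in households:
    row["r_educ"] = recode_education(row["educ"])

# Frequency table and mode of the recoded (categorical) variable.
frequencies = Counter(row["r_educ"] for row in households)
modal_group = frequencies.most_common(1)[0][0]
print(dict(frequencies), modal_group)
```

Writing the recoded values to a new field leaves the original codes intact, which is the point of the advice to save the original variable before recoding.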

   b) Scale (interval/ratio, continuous data) variables
      1) Strategies to consider
         - Run means, mode, median, range, skewness, kurtosis, and standard deviation
         - Then look for outliers; assess the normal distribution assumption
      2) Statistics
         - If data ARE approximately normally distributed: present the mean (mode, median)
         - If data are NOT approximately normally distributed: recode to categorical data and present the distribution spread in a frequency table

2. Looking for Relationships: Statistical Inference
   Def.: Making inferences about the population parameters from estimates of sample statistics (requires random sampling)
   a) Some Concepts
      1) Standard Error of the Estimate
         Background
         - We sample from a population to generate sample statistics to estimate unknown population parameters.
         - Different samples will give different estimates.
         - The theoretical distribution of all possible values of a statistic obtained from a population is the sampling distribution of the statistic.
         - The mean of the sampling distribution is the expected value of the statistic.
         - The standard deviation of the sampling distribution is the standard error (SE).
         - When we estimate the SE of the mean from a single sample:

           $$ SE_{\bar{x}} = \frac{SD}{\sqrt{N}} $$
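As a rough illustration of the formula above, the following Python sketch estimates the SE of the mean from a single sample. The sample values and the function name `standard_error` are hypothetical; the second part simply shows the arithmetic consequence that quadrupling N, with the SD unchanged, halves the SE.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean from one sample: SD / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical yields (kg/ha) from a random sample of plots.
sample = [480, 510, 495, 530, 470, 505, 520, 490]
se = standard_error(sample)

# Same SD, four times as many cases: the SE is cut in half.
sd = statistics.stdev(sample)
se_n = sd / math.sqrt(len(sample))
se_4n = sd / math.sqrt(4 * len(sample))
print(se, se_n, se_4n)
```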

         - The SE of the mean (an SPSS descriptive statistics option) indicates how close/far the sample mean is to the population mean
         - For means of interval/ratio data & percentages, report the SE and/or the margin of error (ME), which is a multiple of the SE
           - At 99% CI, ME = 2.57 SE
           - At 95% CI, ME = 2.00 SE (more precisely, 1.96 SE)
           - At 90% CI, ME = 1.65 SE
         Sample Size and Data Distribution: A Caution
         - If the sample is large, the sampling distribution of the sample mean is approximately normal, even if the population was not normally distributed.
         - If the sample is small and the population is not normal, the sampling distribution of the mean won't be normal, limiting statistical inference
         - In such cases, you should use non-parametric statistics to analyze the data
         - This is why survey researchers often use the chi-square statistic to analyze survey data
      2) Confidence Interval (CI)
         - Def.: A range around the sample mean, based on the SE (i.e., the 95% CI is the range +/- 2 SEs)
         - SE and CI indicate the reliability of a statistic
   b) Statistical Significance
      - These statistics all show the degree of association & its statistical significance (or non-significance)
      - Significance indicates the probability that a relationship appears in the sample if it doesn't exist in the population (e.g., a 1% probability that you accept a false hypothesis as true)
      - The alpha/critical level of probability for acceptance is determined by the researcher/sponsor

      - Traditional alpha levels of 99%/95% are conventions, not absolutes (Fisher, agricultural research). You must consider the consequence of accepting a false result as true
        - Example: A traditional variety (TV) yields 500 kg/ha & a modern variety (MV) yields 800 kg/ha, but the difference is only significant at the 80% level. Each variety costs the same. Would you plant the MV or the TV?
      - It's often more informative to report the level at which your results are significant, rather than simply saying they are non-significant (e.g., "The means are significantly different at the 88% level")
      - Lack of statistical significance may be due to the fact that
        - No relationship exists
        - Non-sampling error was large, so the data are not accurate
        - The sample size is small, so the SE is large
      - Statistical significance does NOT indicate the importance of your result!
        - The importance of a result is a function of the size of the coefficient & the meaning that the variables/relationships imply.
        - Statistical results are either significant or non-significant (NOT "insignificant")
        - A result may be statistically significant, but still insignificant (i.e., very small, and thus not important)
        - Even if the differences in the numerical values are large (e.g., mean yields of 500 kg/ha vs. 1,000 kg/ha), a non-significant relationship means the values cannot be distinguished statistically and should be treated as essentially the same. So, don't emphasize the magnitude of a non-significant difference when reporting your results.

   c) Measures of Association Used to Analyze Survey Data
      1) Crosstabulation (chi-square analysis, χ²)
         Objective
         - To test if the distribution of one variable differs significantly for values of the other variable
         Data Requirements
         - Both variables must be categorical (i.e., nominal, ordinal)
         - But you can convert scale variables to categorical variables and then use chi-square analysis
         - Don't need to assume the data are normally distributed
         - Don't need to identify a dependent/independent variable
         - Most common measure of association for survey variables (Why?)
         A Word of Caution
         - The χ² statistic is invalid if the expected value in any cell is <5. However, SPSS will still report a χ² value even if it is meaningless!
         - In a cross-tab table, the cell with the smallest expected frequency (not the actual frequency) is the one in the row with the smallest row total & in the column with the smallest column total (Table 10.3)
         - To estimate the smallest expected cell frequency, divide the smallest row total in the cross-tab table by N & multiply this number by the smallest column total. Evaluate: is it <5?
         Suggestions (Table example) (Example 11.4)
         - Which variable you choose as the row/column variable is not critical
         - It's confusing to interpret the results if you request both column & row percents, so request only column percents
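The expected-frequency check described in the caution above can be sketched in Python. The cross-tab counts, variable labels, and function names are hypothetical, and this is only an illustration of the arithmetic, not a substitute for the SPSS Crosstabs output (it does not compute the probability level).

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Hypothetical cross-tab: rows = adopter / non-adopter, columns = low / high education.
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)
smallest_expected = min(min(row) for row in expected)
statistic = chi_square(observed)
valid = smallest_expected >= 5  # rule of thumb from the caution above
print(expected, smallest_expected, statistic, valid)
```

Here the smallest expected count is the smallest row total (40) times the smallest column total (50) divided by N (100), i.e., 20, so the rule of thumb is satisfied.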

         Suggestions (continued)
         - If N is small (<200?), construct cross-tab tables with 3 or fewer categories per variable. Why?
         - If N is very small (<100?), use the results in the cross-tab table to estimate the expected frequency. Why?
         - If the expected value is <5, recode the data into fewer, equal-size groups to increase the expected value
         Statistics
         - SPSS reports the χ² statistic (larger is better) & the probability level (smaller is better) (Example 11.3)
         - In the text of an article, report the direction of the observed relationship & the probability level (in parentheses) [e.g., "χ² analysis indicates a significant (95% level) negative relationship between age & education"]
         - In the table, report the cross-tab results, the χ² statistic & the probability level
      2) Analysis of Variance (one-way)
         Objective
         - Determine if the mean values of the dependent variable differ significantly across the categories of the independent variable (the t-test is a special case)
         Data Requirements
         - Must identify an independent & a dependent variable
         - Independent variable: categorical data with 2 or more categories (e.g., 2 or more varieties) (Fig. 11.5)
         - Dependent variable: scale (continuous) data (e.g., yield of several varieties)
         - Each case of the dependent variable must be independent of the others
         Caution
         - The spread of data points (i.e., variance) of the dependent variable must be similar for each category of the independent variable, & the data must be normally distributed
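To make the one-way ANOVA objective above concrete, here is a minimal Python sketch that computes the F statistic by hand and lists the group variances for an informal look at the similar-spread caution. The yields and the function name are hypothetical, and the sketch does not compute the probability level or any multiple comparisons test.

```python
import statistics


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group size times squared distance of the group mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical yields (kg/ha) for three varieties.
variety_a = [820, 850, 865]
variety_b = [910, 935, 954]
variety_c = [510, 540, 552]
f_value = one_way_anova_f([variety_a, variety_b, variety_c])

# Group variances, to eyeball whether the spread is similar across categories.
group_variances = [statistics.variance(g) for g in (variety_a, variety_b, variety_c)]
print(f_value, group_variances)
```

A large F value means the variation between group means is large relative to the variation within groups.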

         Suggestions
         - Test for homogeneity of variances
         - Don't use ANOVA if the variances are very different or the sample sizes of the groups differ greatly
         Statistics (Example 11.5)
         - The F-test evaluates significance (i.e., H0 that all means are equal)
         - A multiple comparisons test (Scheffé) indicates if individual means are different (pairwise comparisons)
         - In the text of an article, report the direction of the relationship, the significantly different means & the F-test statistic [e.g., "ANOVA indicates the mean yields of variety A (845 kg/ha) & B (933 kg/ha) are significantly (95% level) higher than the yield of variety C (534 kg/ha), with an F-value of 6.75"]
         - In tables, report the group means, the F-test (probability level for the ANOVA) & the multiple comparison test (Scheffé) results
      3) Correlation Analysis
         Objective
         - Measures the degree to which 2 continuous variables move together from one case to another
         Data Requirements
         - Both variables must be scale (continuous) or ordinal data
         - Don't need to identify a dependent/independent variable
         Suggestions
         - Run correlations to explore potential relationships
         Statistics
         - Different types of data require different statistics
           - For interval/ratio scale data, use Pearson's product-moment correlation
           - For ordinal data, use Spearman's rank correlation
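Pearson's product-moment correlation, named above for interval/ratio data, can be computed directly from its definition. The sketch below is illustrative only: the fertilizer and yield figures and the function name `pearson_r` are hypothetical, and it does not cover Spearman's rank correlation or the probability level.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical N-fertilizer rates (kg/ha) and yields (kg/ha).
fertilizer = [0, 20, 40, 60, 80, 100]
yields = [500, 560, 640, 630, 720, 760]
r = pearson_r(fertilizer, yields)
r_squared = r ** 2  # share of variance the two variables have in common
print(r, r_squared)
```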

         - The correlation coefficient (r) indicates the strength of the relationship & ranges from 0 to +/-1 (Example 11.6)
         - The sign indicates the direction of the relationship (Fig. 11.7)
           - Positive sign (+): direct relationship
           - Negative sign (-): inverse relationship
         - The coefficient of determination (r²) indicates the percent of shared variance
         - In the text of an article, report the direction of the relationship (positive/negative), the correlation coefficient (r) & r² [e.g., "Correlation analysis indicated that yield & N-fertilizer rates are positively correlated (r = 0.79), with an r² of 0.62"]
         - In the table, report the correlation coefficient (r), its sign, the probability level & r²
         - May present several variables/correlations in matrix format, which is often included as an appendix
      4) Regression Analysis
         Objective
         - Measures the relationship between continuous independent & dependent variables (Fig. 11.9)
         Data Requirements
         - Must identify 1 dependent variable and 1 or more independent variables
         - Independent & dependent variables are usually scale data
         - But can use dummy independent variables (0,1) in multiple regression
         - Linear models are most common, but can use other functional forms, depending on your assessment of the theoretical relationship (e.g., log, quadratic models)

         Suggestions
         - The scatter plot indicates the data distribution, which must be well distributed over the range of data values (Fig. 11.10)
         - Print out scatter plots of the dependent/independent variables (e.g., yield, fertilizer) & assess the scatter plots to find outliers
         - Check for outliers before running a regression & consider dropping cases with extreme/impossible values (e.g., small plots are prone to measurement error)
         - Use theory (and possibly scatter plots) to specify the model & functional form, but avoid stepwise procedures (data mining)
           - Theory suggests that yield increases with higher N application & then declines, suggesting a quadratic model
           - But farm-level data seldom include extremely high N rates, justifying a linear model
         - Review the correlation matrix to identify highly correlated (r > 0.90) variables (multicollinearity) in the model. If any variables are highly correlated, drop one or more of these variables (Example 11.6)
         - Missing data for any variable will eliminate that case from the model, which is especially a problem in multiple regression
         - The criterion for deciding if the model is a good fit (R²) for the data is a function of the type of relationship; social analysis often reports models with a low R²
         - Avoid including dominant independent variables in your model (e.g., Production = harvested area, fertilizer, labor, etc.)
         - Can use a standardized coefficient model to estimate the percent contribution of each independent variable

         Statistics (Example 11.7)
         - The constant shows the value of the dependent variable when the independent variable(s) equal(s) zero
         - The regression coefficient indicates the change in the dependent variable that is associated with a 1-unit change in the independent variable
         - The significance of a coefficient is estimated by dividing the coefficient by its SE, and then comparing this value to the t-distribution value
         - R² indicates the strength of the influence of the independent variables on the dependent variable; it ranges from 0 to 1 (i.e., none/complete). Evaluate the adjusted R² (R-bar squared), which adjusts for degrees of freedom. Why?
         - The F-value tests the hypothesis that all betas are equal to zero
         - In the text of an article, report the direction of the relationship, the beta coefficient, its significance, R² & the F-value [e.g., "Regression analysis indicated that the nitrogen application rate (0.44) & weeding days (0.22) are significantly associated (95% level) with yield. The model had an R² value of 0.65 & a significant (99%) F-value."]
         - Also, list & discuss non-significant coefficients. Why are they non-significant?
         - In tables, report all variables, coefficients, SEs (in parentheses below the coefficient), significance levels (*** = .01, ** = .05, * = .10), F-value & adjusted R²
         - Note: Many relationships that are significant in bivariate analysis will be non-significant in a multivariate model. Why?
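The regression statistics described above can be traced through a small Python sketch with a single independent variable. The data and the function name `simple_ols` are hypothetical; the sketch covers only the simple (one-variable) case and leaves the comparison with the t-distribution and the F-value to a statistics package.

```python
import math


def simple_ols(x, y):
    """Fit y = constant + slope * x by least squares; return key statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    residual_ss = sum((b - (constant + slope * a)) ** 2 for a, b in zip(x, y))
    total_ss = sum((b - mean_y) ** 2 for b in y)
    # SE of the slope: residual variance (n - 2 degrees of freedom) over Sxx.
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    r2 = 1 - residual_ss / total_ss
    # Adjusted R-squared for one independent variable (k = 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {
        "constant": constant,
        "slope": slope,
        "se_slope": se_slope,
        "t": slope / se_slope,  # coefficient divided by its SE
        "r2": r2,
        "adj_r2": adj_r2,
    }


# Hypothetical nitrogen rates (kg/ha) and yields (kg/ha).
nitrogen = [0, 20, 40, 60, 80, 100]
yields = [500, 560, 640, 630, 720, 760]
fit = simple_ols(nitrogen, yields)
print(fit)
```

The adjusted R² is never larger than R², because it penalizes the fit for the degrees of freedom used by the model.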

      5) Logit & Probit Analysis
         Objective
         - Measures the degree & direction of the relationship between continuous independent variables & a categorical dependent variable
         Data Requirements
         - Dependent variable is categorical (e.g., adopter/non-adopter)
         - Independent variables are scale (continuous) data
         Statistics
         - Number of cases correctly classified, contribution of each independent variable to prediction (coefficients), significance of each independent variable

E. Responsibility for Analysis

Primary responsibility for analysis lies with the researcher(s) who
   - Designed the project
   - Identified the research issues
   - Developed the questionnaires
   - Supervised data collection, & therefore
   - Know the analytical needs & limitations of the data

F. Documenting the Project

Purpose
   - Provide a permanent record of the project
   - Provide a reference for your analysis
   - Provide a reference for other users

1. Archive Project Materials & Leave at the Research Location
   - Assemble questionnaires (for future reference), post-coding sheets, etc.
   - Make a copy of the data on CDs
   - Make a copy of the Project Documentation
   - Categorize, label & store all material in a safe place that is protected from heat (sun), magnetic interference, mold, etc.

2. Project Documentation (bound volume)
   - Project Documentation (summary) (e.g., project title, sponsors, geographical coverage, dates, project overview, publications)
   - Description of Survey Methodology (e.g., overview of research issues, survey locations, sampling method/limitations, enumerator selection/training, module design process, survey instruments, data entry)
   - Survey Documentation (for each module) (e.g., purpose, topics covered, sample size, data level, unit of observation, number of rounds, survey areas & dates, time reference for data (season, months), base file name, copies of modules (all languages), names of enumerators & respondents by survey location)

   - SPSS Systems/Data File Summaries (all SPSS files) (e.g., names of all base files (module name), description of data, data limitations, file information printouts, history of base file modifications/transformations, including names of new files created)

3. Suggestions for Documenting Modified Systems Files
   Failure to update files/variable descriptions is a major problem
   a) Suggestions for Recoded/Computed Variables
      - Don't recode the original variable. First create a new variable from the data and recode these data
      - Give each recoded/computed variable a name that begins with R/C to indicate it was recoded/computed
      - Immediately create value labels, etc., for all new variables
      - Describe variable transformations in the variable label [i.e., Yield (yield=prod/area)]
   b) Keep a Permanent Record (file) of Data Transformations
      - Paste SPSS commands into the Syntax Editor, then run them from the editor. Save this file!
      - At the end of the first SPSS session, copy the syntax that you want to save/archive into a word processing file; at the end of each subsequent SPSS session, add the new syntax commands to that file
   c) Periodically Print Out the File Information
      - After making transformations, print out the new file information
   d) Cleaning Up Your Current Work File
      - After transforming a variable, drop the old variable from the current version of the file
      - Be sure to save the original variable in an earlier version of the file
