Psych Jan. 5, 2005
- Spencer Hampton
Psych Week 1: Introductory Notes on Variables and Probability Distributions (1/5/05)

(Reading: Aron & Aron, Chaps. 1, 14, and this Handout.) All handouts are available outside Mija's office. Lecture notes and overheads are available on the website.

1. Data collection in class

Let us collect some data, which we can use to illustrate a few concepts. Try to imagine the development of a hypothetical person from the age of 1 year up to age 41. In the Table below please write down (i) the height (in inches) at ages 1, 11, ..., 41, and (ii) the likeableness at these ages. Likeableness is measured on a 100-point scale, where 0 represents a most obnoxious person and 100 represents a most pleasing person. (Ignore columns S and P.)

                                                  At age 41
Age (years)   Height (inches)   Likeableness      S      P
  1
 11
 21
 31
 41

(iii) In column S write male or female.
(iv) Consider a 3-point scale: 1 = thin, 2 = medium, 3 = non-thin. In column P write the appropriate number for your imagined person.
(v) Using the same 1-3 scale as in (iv), state your own body type, P*.

These data can be used to answer many questions about imaginary (or real) persons, as now illustrated.

(a) What is the average height of 11-year-olds? How does Height change with Age? We can compare the answers provided by our data with the answers provided by official data, such as census data. Indeed, the graph on this page gives the height of British boys and girls from around [...]. Our class data come from a different country and a generation later. Therefore, we might expect our data to differ (In what ways, and why?) but, at least, such a comparison between our data and a normative data set is a useful starting point in understanding our data.
(b) How many students answered Qu. (iii) above? Why might this number differ from the total number of persons in the room?
(c) Is the likeableness of females the same as that of males? Is likeableness related to P or to P*? etc.

The individuals whose data we have constitute a sample.
Occasionally, our questions concern only the individuals in the sample. Typically, however, our questions concern the total set of individuals, of which the sample is only a subset. This total set is called the population. We make inferences about a population from the sample data. Later on we will use symbols such as $M$, $\bar{x}$, $\bar{X}$, $\bar{y}_1$, etc., to denote sample averages or means, and the symbol $\mu$ to denote a population mean. The distinction between sample and population is an important one.

Questions about data usually involve variables, e.g., height and age. A variable is something which can vary (sic!) from person to person in a population. Each person is characterised by a particular value (e.g., tall, 60
ins.; young, 10 years) of the variable. We must learn to distinguish between words or symbols that refer to variables, on the one hand, and words or symbols that refer to values of these variables, on the other.

2. Types of variables (A&A, Chap. 1)

2.1. Qualitative and quantitative variables. In determining what are valid uses for numbers in a study, the first question that has to be answered is: From what type of scale do the measurements come?

(i) The variable, S, in the Table on p. 1 is a nominal or qualitative variable with values Female and Male. These 2 values differ in name and in quality, not in quantity. In other words, even if we coded our data as Female = 1 and Male = 2, we would not be saying that a "2" has more of anything than a "1"; simply that a "2" is different from a "1".
(ii) P is an ordinal variable with values 1, 2, and 3, which connote an increasing quantity on a size dimension. It is not necessarily the case that the difference in size between a "2" and a "1" is the same as that between a "3" and a "2"; all that is implied is that a "3" has more size than a "2", which has more than a "1".
(iii) Likeableness is on an interval scale (though this is open to question), meaning, e.g., that the difference in likeableness between persons with 10 and 20 units is the same as that between persons with 70 and 80 units. There is no meaningful zero when a variable is measured on an interval scale (or on a nominal or ordinal scale).
(iv) Age and height are on ratio scales, meaning, e.g., that a person aged 20 is twice as old as one aged 10, etc. On a ratio scale, there is a meaningful zero (e.g., zero height is meaningful, whereas zero likeableness is hard to interpret).
(v) Finally, the number of students in Psych 10 is on an absolute scale. The absolute scale has no physical units (e.g., years or inches); the values on this scale are pure numbers.
In statistical practice, the key distinction is that between nominal or qualitative variables, on the one hand, and quantitative variables (i.e., ordinal, interval, ratio and absolute), on the other. We will learn the statistical methods that are appropriate to each type of variable (e.g., chi-square methods for qualitative variables, and t-tests, correlation, etc. for quantitative variables).

2.2. Discrete and continuous variables. A second question concerning type of scale is whether the variable is discrete (taking on values that are not arbitrarily close together) or continuous. A crude check is to ask: Can the variable take on a value in its range such as 1.335? If no, the variable is probably discrete; if yes, the variable is probably continuous. The variables S and P are discrete; age and height are continuous, in principle. In practice, it is often convenient to regard continuous variables as discrete (e.g., height, which is continuous, is usually measured to the nearest inch, which is a discrete scale); and to regard discrete variables as continuous (e.g., number of correct answers, which is discrete, is often assumed to be approximately Normally distributed, and the Normal distribution refers to continuous variables).

2.3. Random variables. We will be dealing often with random variables. Notice that the values of Age were determined beforehand in the above study, whereas the values of height and likeableness were not: you gave the values for height and likeableness, and no one could have set them or predicted them exactly beforehand. Thus Age is a non-random variable (also called a "fixed factor"), whereas Height is a random variable. The word "random" connotes our ignorance or our inability to predict, but we shall find that in most cases we do have information about the behavior of a random variable, namely the information contained in its probability distribution.

2.4. Probability distributions; percentiles.
Let us consider the quantitative variable, height of 14-year-olds. Through extensive observation, we may be able to find the values, $x_P$, of height, such that a proportion, $P$, of 14-year-olds is shorter than $x_P$. This information, $\{P, x_P\}$, for $P$ = .05, .25, .50, ..., .95, is an example of a probability distribution. The value $x_{.05}$ is called the 5th percentile, $x_{.50}$ is called the 50th percentile (also called the median), etc.

The accompanying graph shows the distribution of height of U.S. boys between ages 2-18 around [...]. We now illustrate the sort of interesting information that can be extracted from this probability distribution.

(i) The median increases from about 34 ins. at age 2 to about 70 ins. at age 18.
(ii) The spread or range or variability or dispersion of the distribution increases with age. There are many ways to measure variability. In this case, a simple index of variability is the interval between the 5th and 95th percentiles; this interval increases from about 4 ins. at age 2 to about 9 ins. at age [...].

2.5. Small and large values of a variable. In the Table below we show the distribution of boys' height at certain ages; these data were read off the graph on this page.

[Table: rows are Age (yrs), columns are probability (p-) values, entries are height percentiles; the entries are missing from this copy]

When would we say that a person has a small value of height (i.e., is "short") or a large value of height (i.e., is "tall")? Ans. It depends on the age of the person and on the probability distribution of height for that age. For example, 44.3 ins. is tall for a 4-year-old, but short for an 8-year-old. (Why?) In general, a small value of a random variable is a value such that most values of the variable are larger than it; and a large value of a random variable is a value such that few values of the variable are larger than it. Let us adopt the convention that a small value is a value that is less than or equal to the 5th percentile of the distribution, and a large value is a value that is greater than or equal to the 95th percentile of the distribution.

Exercises. (a) Is 54.1 ins. short for a 12-yr-old? (b) Is 62.7 ins. tall for a 12-yr-old; short for a 16-yr-old? (c) Is 54.0 ins. tall for an 8-yr-old?

2.6. Validity and reliability. The validity of a measure, such as Likeableness, tells us how well that measure reflects the concept it is supposed to be measuring. Related to this is the question: What is the most valid behavioral manifestation of a concept? For example, what do you understand by likeableness? Also, a measure is reliable if repeated values of it do not vary by much (assuming the underlying concept stays constant). Note that if a measure is very unreliable one would suspect its validity.

2.7. Independent and dependent variables.
If two variables X and Y are causally related, it is sometimes possible to say that X causes Y. In such a case, Y is said to be the dependent variable (since it depends on X), and X is the independent variable.

The summation notation (A&A, pp. [...]). A convenient shorthand exists for expressing sums. Given numbers, $x_1, x_2, \dots, x_k$, we express their sum, $S$, as

$$S = x_1 + x_2 + \cdots + x_k = \sum_{i=1}^{k} x_i.$$

Exercises. Suppose $x_1 = 2$, $x_2 = 4$, $x_3 = 1$ and $x_4 = 2$. Show (after brushing up on your high-school Algebra!) that (i) $\sum x_i = 7$, (ii) $\sum 3x_i = 18$, (iii) $\sum x_i \ldots x_i = 0$, (iv) $\sum x_i \ldots = 25$ [in (i)-(iv) the limits of summation, two of which start at $i = 2$, and any exponents are missing from this copy];
(v) $\sum_{i=1}^{3} a = a + a + a = 3a$, and (vi) $\sum_{i=1}^{n} \frac{1}{n} = \frac{1}{n} + \cdots + \frac{1}{n} = n \cdot \frac{1}{n} = 1$.

Probability distribution of a discrete variable (cf. A&A, Chap. 1)

Referring to the item about P in the data collected in class, let us count how many persons responded P = 1, 2 or 3. Let $f_i$ be the frequency (or count) of persons who responded P = i, i = 1, 2, 3; let $N = f_1 + f_2 + f_3 = \sum_{i=1}^{3} f_i$ be the total number of persons who responded; and let $rf_i = f_i / N$ be the relative frequency or proportion of persons responding P = i.

[Table: columns are P; $f_i$ and $rf_i$ for sample (1); $rf_i$ for sample (2); $rf_i$ for sample (3); the entries are missing from this copy]

The list of possible values of the variable, together with the frequency (or relative frequency) of each value, is called the frequency (or relative frequency) histogram of the variable. Histogram is a synonym for distribution, and relative frequency is a synonym for proportion, which has almost the same meaning as probability. If we could observe an entire population (instead of just $N$ persons), then the relative frequency of value i would be the probability of observing value i, denoted by $p_i$. The $\{p_i\}$ form the probability distribution of the variable, and they are estimated by the $\{rf_i\}$. Three examples, (1), (2) and (3), of distributions are given in the above Table. The histogram or distribution is an excellent tool for summarising the mass of data from a sample.

Descriptive statistics obtainable from an observed distribution

Often we wish to summarise the information contained in a distribution (which is itself a summary of the raw data), especially when the number of values of the variable is large. The most important summaries of a distribution are (i) a measure of location (Where on the scale are most persons located?), and (ii) a measure of variability or dispersion (How spread apart are the persons on the scale?).

Location (Chap. 2). We have already mentioned the mean or average as a measure of location; however, we can calculate the mean only if the variable is measured on an interval, ratio or absolute scale.
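The definitions of $f_i$, $N$ and $rf_i$ given above for a discrete variable can be sketched in a few lines of Python. This is only an illustration: the list `responses` holds made-up answers to item P, not the class data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical responses to item P (1 = thin, 2 = medium, 3 = non-thin).
# These values are invented for illustration; they are not the class data.
responses = [2, 2, 1, 2, 3, 2, 2, 1, 2, 2]


def frequency_distribution(values):
    """Return {value: f_i}, the count of each observed value, sorted by value."""
    return dict(sorted(Counter(values).items()))


def relative_frequencies(freqs):
    """Return {value: rf_i}, where rf_i = f_i / N and N is the sum of the f_i."""
    n = sum(freqs.values())
    return {value: count / n for value, count in freqs.items()}


f = frequency_distribution(responses)
rf = relative_frequencies(f)
print("frequencies:", f)
print("relative frequencies:", rf)
```

Because every $rf_i$ is a count divided by the same $N$, the relative frequencies always sum to 1, which is what allows them to serve as estimates of the probabilities $p_i$.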
We have also mentioned the median (as the middle score), but the variable has to be on at least an ordinal scale for us to be able to calculate the median. A third measure of location is the mode, defined as the most frequently occurring value of the variable. This index can be defined for qualitative and for quantitative variables. We can now compare any two distributions with respect to location. Samples (1) and (2) in the Table above have different modes (the values 2 and 1, respectively), and (1) and (3) have the same mode.

Dispersion. We have mentioned, as a measure of variability or dispersion, the interval between the 5th and 95th percentiles of a distribution; but percentiles can be calculated only for variables on an ordinal (or higher) scale, and not for nominal variables. For nominal variables, the relative frequency at the mode is a good measure of concentration, which is the inverse of variability. If the rf at the mode is low (high), the dispersion is relatively high (low). (Percentiles and the rf at the mode are not discussed in A&A.) In comparing samples (1)-(3) above, we see that sample (1) is the least dispersed, and sample (2) is the most dispersed (because 0.78 > 0.6 > 0.45).

To sum up, samples (1) and (3) have the same mode, but (1) is less dispersed than (3); (2) differs from (1) in both location and dispersion.

Inferential statistics related to a univariate distribution (Chap. 14)

Suppose we have a sample that gives us the frequency distribution of a variable. We might wish to use these data to see if the population distribution from which we obtained our sample is the same as, or different from,
another known population distribution. Usually, it is not possible to observe the entire population, and we can only observe small subsets known as samples. This process of using sample data (a particular subset) to infer something about a population (the entire set) is called statistical inference.

The sample frequency distribution is $\{f_i\}$, $i = 1, 2, \dots, k$, where $\sum_{i=1}^{k} f_i = N$. Let us denote the known probability distribution by $\{p_i\}$; i.e., $p_i$ is the proportion of i's in the entire known population, and $\sum_{i=1}^{k} p_i = 1$. If our sample was drawn from a population with distribution $\{p_i\}$, then we should find that $f_i \approx N p_i$. But how close do the $f_i$ have to be to the $N p_i$ for us to conclude that the sample was drawn from the known population? We answer this question in the following stylised way.

First, we state the information about the known population as a null hypothesis, $H_0$. For example, $H_0$: $p_1 = 0.22$, $p_2 = 0.68$, $p_3 = 0.10$ (the value that makes the $p_i$ sum to 1). The question to be answered is whether $H_0$ provides a good fit to the observed frequencies in sample (1), given on p. 4. Let us denote the observed frequencies as $O_i$ (instead of $f_i$), and recall that $N = \sum_{i=1}^{k} O_i$ is the total number of observations.

Second, we calculate what the frequencies are expected to be if $H_0$ is true, i.e., if our sample were drawn from a population as given in $H_0$. These expected frequencies are denoted by $E_i$, and are given by the formula $E_i = N p_i$, which implies

$$\sum_{i=1}^{k} E_i = \sum_{i=1}^{k} N p_i = N \sum_{i=1}^{k} p_i = N.$$

Third, we calculate an index, known as chi-square or $\chi^2$, of the distance between the 2 sets of frequencies, $\{O_i\}$ and $\{E_i\}$:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}. \qquad (1)$$

A large value of $\chi^2$ would indicate a poor fit; in this case, we would reject $H_0$, and conclude that our sample was drawn from a population different from that described in the null hypothesis.
A small value of $\chi^2$ would indicate a good fit; in this case, we would retain $H_0$, and conclude that our sample was drawn from a population that is no different from that described in the null hypothesis. But, what is a large value of $\chi^2$?

Fourth, we need to calculate what a large value of $\chi^2$ is. It is clear that, even if $H_0$ is true, the larger $k$ (the number of values) is, the larger $\chi^2$ will tend to be. Therefore, the definition of large depends on $k$. (Recall that, for a similar reason, the definition of a large height depends on the age, the analogue of $k$, of the person.) More precisely, the definition of large depends on the number of independent terms in the sum that is $\chi^2$. There are $k$ terms in the sum, but only $k-1$ of them are independent, i.e., are free to take on any value. Once $k-1$ of the terms in the sum are known, the $k$th term is known because the terms, $O_i - E_i$, satisfy the constraint

$$\sum_{i=1}^{k} (O_i - E_i) = \sum_{i=1}^{k} O_i - \sum_{i=1}^{k} E_i = N - N = 0.$$

The degrees of freedom (df) of $\chi^2$ is the number of independent terms in the sum that defines $\chi^2$; for this problem the df = $k-1$. Given the df, we can consult the Statistical Tables for the probability distribution of $\chi^2$ with the stated df to get the 95th percentile. Recall that, earlier, we adopted the convention that a large value of a quantitative variable is any value greater than the 95th percentile. We refer to this 95th percentile as the critical value of $\chi^2$. Table 10 gives these percentiles for df = 1, ..., 8; the percentiles in this Table are derived from mathematical arguments beyond our scope.
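The first four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the handout's own worked example: the observed counts are hypothetical (the frequencies of sample (1) are not available here), $p_3 = 0.10$ is assumed so that the null probabilities sum to 1, the function name `goodness_of_fit` is invented, and the dictionary `CRITICAL_95` holds the standard 95th percentiles of chi-square rounded to two decimals.

```python
# 95th percentiles (critical values) of chi-square for df = 1..8,
# standard table values rounded to two decimals.
CRITICAL_95 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49,
               5: 11.07, 6: 12.59, 7: 14.07, 8: 15.51}


def goodness_of_fit(observed, null_probs):
    """Return (chi2, df, expected) for observed counts O_i under H0: {p_i}."""
    n = sum(observed)                                  # N = sum of the O_i
    expected = [n * p for p in null_probs]             # E_i = N * p_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # k - 1 independent terms
    return chi2, df, expected


# Hypothetical observed counts for P = 1, 2, 3 (not the class data).
observed = [30, 60, 10]
# H0 as in the text; p_3 = 0.10 is an assumption that makes the p_i sum to 1.
null_probs = [0.22, 0.68, 0.10]

chi2, df, expected = goodness_of_fit(observed, null_probs)
reject = chi2 > CRITICAL_95[df]
print(f"chi2 = {chi2:.2f}, df = {df}, "
      f"critical value = {CRITICAL_95[df]}, reject H0: {reject}")
```

With these made-up counts the expected frequencies are 22, 68 and 10, the index is about 3.85 on 2 df, and since 3.85 is below 5.99 the null hypothesis would be retained.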
Table 10. Distribution of Chi-Square for Given Probability Levels
[Table: rows are df, columns are probability levels; the entries are missing from this copy]

Fifth, if the $\chi^2$ goodness-of-fit index, as computed using the formula in Eq. (1) above, is greater than the critical value obtained from the Statistical Tables, we reject $H_0$. Otherwise, we retain $H_0$.

Please go to the accompanying Problem Set Handout for Exercises on $\chi^2$.

Test of contingency between 2 discrete variables (Chap. 14)

Is there a relationship or contingency between the gender of a student (SG) and the gender of the target person imagined by the student (TG)? (TG refers to the imaginary person in the Class Project from the first lecture.) We answer this question by stating the null hypothesis:

H0: There is no relationship or contingency between SG and TG.

Given the data, should we reject or retain $H_0$? In class we will code both variables using the values F and M, and will arrange the data in a contingency table (or bivariate frequency distribution) with 2 rows and 2 columns (i.e., a 2x2 table). Let us now use the data from a previous class.

                   TG
SG         M            F            Total
M          18 (9.9)      5 (13.1)     23
F          36 (44.1)    67 (58.9)    103
Total      54           72           126

Among 23 males in this previous class, 18 (78%) imagined a Male target and 5 imagined a Female target. Among 103 females in this class, 36 (35%) imagined a Male target and 67 imagined a Female target. The numbers in parentheses are expected frequencies, to be defined below.

There are two interesting sets of frequencies in the above contingency table.

(a) One is the set of marginal frequencies, i.e., the frequencies in the margins of the table:
(ai) the Row marginal frequencies, 23 and 103, give the (univariate) frequency distribution of the variable SG; this distribution tells us that there are about 4.5 times as many F's as M's in the class.
(aii) the Column marginal frequencies, 54 and 72, give the (univariate) frequency distribution of the variable TG; this distribution is not of primary interest, since it depends on the number of F's and M's in the class, and on their tendency to imagine female vs. male.
(b) The other set of interesting frequencies is the set of cell frequencies (also called the joint frequencies), 18, 5, 36 and 67; it is these frequencies that are most relevant to deciding if there is a contingency between SG and TG.

It is clear from this table (without doing a statistical test) that there is a relationship or contingency between SG and TG: female students tend to imagine the target as female, and male students tend to imagine the target as male. The pattern that would be consistent with no relation between SG and TG would be (i) 9.9 out of 23 males (43%) imagining Male, and (ii) 44.1 out of 103 females (43%) imagining Male. These expected frequencies are shown in parentheses in the table above. Note that the marginal frequencies derived from the expected frequencies
are the same as those based on the observed joint frequencies.

Let cell(i, j) be the cell at Row i and Column j; let $R_i$ be the total frequency in Row i, and let $C_j$ be the total frequency in Column j. Let the expected frequency in cell(i, j) be $E_{ij}$. If there is no relationship between the Row and Column variables, then the relative frequencies in any row should be the same as the relative frequencies of the Column totals; that is, for each i,

$$\frac{E_{ij}}{R_i} = \frac{C_j}{N}, \quad \text{implying that} \quad E_{ij} = \frac{R_i C_j}{N}.$$

To test the null hypothesis of no contingency, we need to assess the goodness-of-fit between the observed and expected joint frequencies. We again use the chi-square goodness-of-fit index:

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$

However, for this problem, the degrees of freedom of $\chi^2$ is now $(r-1)(c-1)$, where $r$ and $c$ are the number of rows and columns, respectively, of the contingency table. (The justification for this formula will be given in the next Handout.) In the present data set, $r = c = 2$, implying that the df = 1. The observed

$$\chi^2 = \frac{(18 - 9.9)^2}{9.9} + \frac{(5 - 13.1)^2}{13.1} + \frac{(36 - 44.1)^2}{44.1} + \frac{(67 - 58.9)^2}{58.9} = 14.4$$

when the expected frequencies are carried unrounded (the rounded values shown give about 14.2). With 1 df, the critical value of $\chi^2$ is 3.84 (from the Statistical Tables). Since 14.4 > 3.84, the observed $\chi^2$ is large, and we must reject $H_0$ and conclude that there is a relationship between SG and TG.
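The contingency test above can be checked with a short Python sketch. It uses the observed cell frequencies quoted in the text (18, 5, 36, 67) and the formula $E_{ij} = R_i C_j / N$; the function names are hypothetical, and the critical value 3.84 for 1 df is the one quoted from the Statistical Tables.

```python
def expected_frequencies(table):
    """Return the E_ij = R_i * C_j / N for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]           # R_i
    col_totals = [sum(col) for col in zip(*table)]     # C_j
    n = sum(row_totals)                                # N
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_contingency(table):
    """Return (chi2, df, expected) for the test of no contingency."""
    expected = expected_frequencies(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)        # (r - 1)(c - 1)
    return chi2, df, expected


# Rows: student gender SG (M, F); columns: target gender TG (M, F).
observed = [[18, 5],
            [36, 67]]

chi2, df, expected = chi_square_contingency(observed)
critical_95 = 3.84  # 95th percentile of chi-square with 1 df, as quoted in the text

print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi2 = {chi2:.1f}, df = {df}, reject H0: {chi2 > critical_95}")
```

The expected frequencies round to 9.9, 13.1, 44.1 and 58.9, matching the values in parentheses in the table, and the index comes to about 14.4 on 1 df, so $H_0$ is rejected.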
More informationStat 135 Fall 2013 FINAL EXAM December 18, 2013
Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed
More informationDo not copy, post, or distribute
14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible
More informationUniversity of Jordan Fall 2009/2010 Department of Mathematics
handouts Part 1 (Chapter 1 - Chapter 5) University of Jordan Fall 009/010 Department of Mathematics Chapter 1 Introduction to Introduction; Some Basic Concepts Statistics is a science related to making
More informationCHAPTER 8 INTRODUCTION TO STATISTICAL ANALYSIS
CHAPTER 8 INTRODUCTION TO STATISTICAL ANALYSIS LEARNING OBJECTIVES: After studying this chapter, a student should understand: notation used in statistics; how to represent variables in a mathematical form
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationModule 03 Lecture 14 Inferential Statistics ANOVA and TOI
Introduction of Data Analytics Prof. Nandan Sudarsanam and Prof. B Ravindran Department of Management Studies and Department of Computer Science and Engineering Indian Institute of Technology, Madras Module
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationTHE PEARSON CORRELATION COEFFICIENT
CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There
More informationReview for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling
Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included
More informationNotes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing
Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing 1. Purpose of statistical inference Statistical inference provides a means of generalizing
More information1/11/2011. Chapter 4: Variability. Overview
Chapter 4: Variability Overview In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution. In simple terms, if the scores in a distribution are all
More informationRetrieve and Open the Data
Retrieve and Open the Data 1. To download the data, click on the link on the class website for the SPSS syntax file for lab 1. 2. Open the file that you downloaded. 3. In the SPSS Syntax Editor, click
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known
More informationINTRODUCTION TO ANALYSIS OF VARIANCE
CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two
More informationAssumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals
Assumptions, Diagnostics, and Inferences for the Simple Linear Regression Model with Normal Residuals 4 December 2018 1 The Simple Linear Regression Model with Normal Residuals In previous class sessions,
More informationChapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance
Chapter 8 Student Lecture Notes 8-1 Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing
More informationInference for Categorical Data. Chi-Square Tests for Goodness of Fit and Independence
Chi-Square Tests for Goodness of Fit and Independence Chi-Square Tests In this course, we use chi-square tests in two different ways The chi-square test for goodness-of-fit is used to determine whether
More informationCHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups 10. Comparing Two Means The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Comparing Two Means Learning
More informationCHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups 10.2 Comparing Two Means The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Comparing Two Means Learning
More informationLecture 1: Descriptive Statistics
Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics
More information4 Hypothesis testing. 4.1 Types of hypothesis and types of error 4 HYPOTHESIS TESTING 49
4 HYPOTHESIS TESTING 49 4 Hypothesis testing In sections 2 and 3 we considered the problem of estimating a single parameter of interest, θ. In this section we consider the related problem of testing whether
More informationStatistics Handbook. All statistical tables were computed by the author.
Statistics Handbook Contents Page Wilcoxon rank-sum test (Mann-Whitney equivalent) Wilcoxon matched-pairs test 3 Normal Distribution 4 Z-test Related samples t-test 5 Unrelated samples t-test 6 Variance
More informationIntroduction to Basic Statistics Version 2
Introduction to Basic Statistics Version 2 Pat Hammett, Ph.D. University of Michigan 2014 Instructor Comments: This document contains a brief overview of basic statistics and core terminology/concepts
More informationChapter 2 Solutions Page 15 of 28
Chapter Solutions Page 15 of 8.50 a. The median is 55. The mean is about 105. b. The median is a more representative average" than the median here. Notice in the stem-and-leaf plot on p.3 of the text that
More informationSTP 226 EXAMPLE EXAM #3 INSTRUCTOR:
STP 226 EXAMPLE EXAM #3 INSTRUCTOR: Honor Statement: I have neither given nor received information regarding this exam, and I will not do so until all exams have been graded and returned. Signed Date PRINTED
More informationPractice problems from chapters 2 and 3
Practice problems from chapters and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,
More informationDepartment of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr.
Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing this chapter, you should be able
More informationCHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the
CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful
More informationDescriptive Statistics-I. Dr Mahmoud Alhussami
Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.
More informationHYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC
1 HYPOTHESIS TESTING: THE CHI-SQUARE STATISTIC 7 steps of Hypothesis Testing 1. State the hypotheses 2. Identify level of significant 3. Identify the critical values 4. Calculate test statistics 5. Compare
More informationComparing Measures of Central Tendency *
OpenStax-CNX module: m11011 1 Comparing Measures of Central Tendency * David Lane This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 1 Comparing Measures
More informationMarketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)
Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Remember: Z.05 = 1.645, Z.01 = 2.33 We will only cover one-sided hypothesis testing (cases 12.3, 12.4.2, 12.5.2,
More informationChapter 11. Hypothesis Testing (II)
Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let
More informationThe science of learning from data.
STATISTICS (PART 1) The science of learning from data. Numerical facts Collection of methods for planning experiments, obtaining data and organizing, analyzing, interpreting and drawing the conclusions
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationOverview. INFOWO Statistics lecture S1: Descriptive statistics. Detailed Overview of the Statistics track. Definition
Overview INFOWO Statistics lecture S1: Descriptive statistics Peter de Waal Introduction to statistics Descriptive statistics Department of Information and Computing Sciences Faculty of Science, Universiteit
More informationGlossary for the Triola Statistics Series
Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling
More informationIENG581 Design and Analysis of Experiments INTRODUCTION
Experimental Design IENG581 Design and Analysis of Experiments INTRODUCTION Experiments are performed by investigators in virtually all fields of inquiry, usually to discover something about a particular
More informationBasic Business Statistics, 10/e
Chapter 1 1-1 Basic Business Statistics 11 th Edition Chapter 1 Chi-Square Tests and Nonparametric Tests Basic Business Statistics, 11e 009 Prentice-Hall, Inc. Chap 1-1 Learning Objectives In this chapter,
More informationData Analysis and Statistical Methods Statistics 651
Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching/ Suhasini Subba Rao Review In the previous lecture we looked at the statistics of M&Ms. This example illustrates
More informationChapter 1 Descriptive Statistics
MICHIGAN STATE UNIVERSITY STT 351 SECTION 2 FALL 2008 LECTURE NOTES Chapter 1 Descriptive Statistics Nao Mimoto Contents 1 Overview 2 2 Pictorial Methods in Descriptive Statistics 3 2.1 Different Kinds
More informationStatistics Introductory Correlation
Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.
More informationØ Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.
Statistical Tools in Evaluation HPS 41 Fall 213 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific
More informationBusiness Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee
Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)
More informationFREQUENCY DISTRIBUTIONS AND PERCENTILES
FREQUENCY DISTRIBUTIONS AND PERCENTILES New Statistical Notation Frequency (f): the number of times a score occurs N: sample size Simple Frequency Distributions Raw Scores The scores that we have directly
More informationReview of One-way Tables and SAS
Stat 504, Lecture 7 1 Review of One-way Tables and SAS In-class exercises: Ex1, Ex2, and Ex3 from http://v8doc.sas.com/sashtml/proc/z0146708.htm To calculate p-value for a X 2 or G 2 in SAS: http://v8doc.sas.com/sashtml/lgref/z0245929.htmz0845409
More informationQuestion. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?
Hypothesis testing Question Very frequently: what is the possible value of μ? Sample: we know only the average! μ average. Random deviation or not? Standard error: the measure of the random deviation.
More informationMeasures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz
Measures of Central Tendency and their dispersion and applications Acknowledgement: Dr Muslima Ejaz LEARNING OBJECTIVES: Compute and distinguish between the uses of measures of central tendency: mean,
More informationDETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics
DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and
More informationWELCOME! Lecture 13 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 13 Thommy Perlinger Parametrical tests (tests for the mean) Nature and number of variables One-way vs. two-way ANOVA One-way ANOVA Y X 1 1 One dependent variable
More informationAnalytical Graphing. lets start with the best graph ever made
Analytical Graphing lets start with the best graph ever made Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian
More informationChapter 2 The Mean, Variance, Standard Deviation, and Z Scores. Instructor s Summary of Chapter
Chapter 2 The Mean, Variance, Standard Deviation, and Z Scores Instructor s Summary of Chapter Mean. The mean is the ordinary average the sum of the scores divided by the number of scores. Expressed in
More informationCIVL 7012/8012. Collection and Analysis of Information
CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real
More informationCHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC
CHI SQUARE ANALYSIS I N T R O D U C T I O N T O N O N - P A R A M E T R I C A N A L Y S E S HYPOTHESIS TESTS SO FAR We ve discussed One-sample t-test Dependent Sample t-tests Independent Samples t-tests
More information