PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 5


6. Mathematical background

Numbers and quantification offer us a very special language which enables us to express ourselves in exact terms. This language is called Mathematics. We will now learn the basic rules of Mathematics in order to communicate effectively with figures. A large part of psychological research deals with statistical analysis, so one needs an adequate mathematical background to understand statistical computations.

6.1 Pocket calculator

In both modules, you will need a scientific calculator, that is, one which has statistical functions and, preferably, a regression mode. The most cost-effective calculator for this course is the CASIO FX-82 TL (it costs about Rs 3). It will save you a tremendous amount of time in the examinations: once statistical data have been entered, statistics like the number of observations, mean, standard deviation, correlation and regression coefficients can be readily obtained by just pressing buttons.

Note: The study guide advises students to buy a programmable calculator (which, in my opinion, is not worth it for these modules).

6.2 Summation notation

The summation notation is used to summarise a series, that is, the sum of the terms of a sequence. It is denoted by the Greek capital letter sigma, Σ, as opposed to the small letter sigma, σ, which, in Statistics, stands for standard deviation. Sigma is most often seen in the following form:

$$\sum_{r=a}^{b} f(r)$$

where r is known as the index, a and b are the lower and upper limits of summation respectively, and f(r) is known as the general term. The index r, just like a counter, starts at a and increases by steps of 1 until it reaches b. Each term of the series is obtained by substituting successive values of r in the general term. The following example illustrates the mechanism.
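Before the worked example, the way the index steps from the lower limit to the upper limit can also be sketched in code. This is only an illustration: the function name `summation` and the general term r² used here are assumptions chosen for the sketch and do not come from the notes.

```python
from typing import Callable


def summation(f: Callable[[int], float], a: int, b: int) -> float:
    """Add up f(r) for r = a, a + 1, ..., b (both limits included)."""
    total = 0
    r = a                  # the index starts at the lower limit
    while r <= b:          # and stops once it has passed the upper limit
        total += f(r)      # substitute the current r in the general term
        r += 1             # the index increases by steps of 1
    return total


# Illustrative general term f(r) = r^2 with limits 1 and 4: 1 + 4 + 9 + 16
print(summation(lambda r: r ** 2, 1, 4))  # prints 30
```

The same function can be applied to any general term by passing a different `f`.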

6.2.1 Example

$$\sum_{k=2}^{6} (2k + 1) = [2(2) + 1] + [2(3) + 1] + \dots + [2(6) + 1] = 5 + 7 + 9 + 11 + 13 = 45$$

Here, the index (counter) is k. It takes on an initial value of 2 (the lower limit) and increases by steps of 1 until it reaches the upper limit 6. Every value that k assumes is substituted in the general term (2k + 1) in order to generate a term of the series. The terms are then added up, since sigma stands for summation.

In Statistics, however, we do not actually evaluate such expressions numerically but rather use the summation notation strictly for summarisation purposes. This is because the upper limit is generally non-numerical, that is, a variable. We deal mostly with expressions of the form

$$\sum_{i=1}^{n} x_i$$

If expanded, this summation cannot be evaluated since it only gives the expression $x_1 + x_2 + x_3 + \dots + x_{n-1} + x_n$. Such expressions are found in the formulae for the arithmetic mean and the standard deviation. In this module, students are simply required to recognise the summation notation and understand its meaning so that they can at least use the relevant statistical functions on calculators.

7. Presentation of data

Once information has been collected, it has to be classified and organised in such a way that it becomes easily readable, that is, converted to data. Before descriptive statistics are calculated, it is sometimes a good idea to present the data on charts, diagrams or graphs. Most people find diagrams more helpful than figures because diagrams present data more meaningfully. In this module, we will only consider the presentation of data in the form of histograms and frequency polygons (read the properties of histograms and frequency polygons in Sections 7.3 and 7.4).

7.1 Ungrouped data

This type of information occurs as individual observations, usually as a table or array of unordered values. These observations must first be arranged in some order (ascending or descending if they are numerical) or simply grouped together in the form of a frequency table before they can be properly presented on diagrams.
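The step from unordered observations to a frequency table can be sketched as follows. The short list `ages` is an illustrative sample, and the helper name `frequency_table` is an assumption made for this sketch, not something defined in the notes.

```python
from collections import Counter

# Illustrative sample of unordered observations (ages in years)
ages = [22, 19, 21, 22, 22, 22, 19, 24, 21, 22]


def frequency_table(observations):
    """Return (value, frequency) pairs sorted by value in ascending order."""
    counts = Counter(observations)   # count how often each value occurs
    return sorted(counts.items())    # arrange the distinct values in order


for age, frequency in frequency_table(ages):
    print(age, frequency)
print("Total", len(ages))
```

Because every distinct value keeps its own row, no information is lost: the frequencies add up to the number of observations.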

The following will be used as an example of ungrouped data throughout Section 7.1 of the notes.

7.1.1 Example

The following data represent the ages of students attending full-time B Sc. courses at De Chazal Du Mée Business School:

22 19 21 22 22 22 19 24 21 22 22 22 21 23 21 21 23 24 22 23
21 21 20 21 21 21 23 22 21 21 22 23 22 22 23 21 22 19 22 22
21 22 21 22 20 21 23 21 22 22 23 21 21 23 22 22 22 23 20 23
21 22 21 22 22 21 21 22 23 21 20 21 22 23 21 21 22 22 23 19
22 21 21 20 22 23 22 22 21 23 22 21 23 21 22 23 20 21 22 22
22 19 21 22 22 22 19 24 21 22 22 22 21 23 21 21 23 24 22 23
21 21 20 21 21 21 23 22 21 21 22 23 22 22 23 21 22 19 22 22
21 22 21 22 20 21 23 21 22 22 23 21 21 23 22 22 22 23 20 19
21 22 21 22 22 21 21 22 23 21 20 21 22 23 21 21 22 22 23 23
22 21 21 20 22 23 22 22 21 23 22 21 23 21 22 23 20 21 22 22
22 19 21 22 22 22 19 24 21 22 22 22 21 23 21 21 23 24 22 23
21 21 20 21 21 21 23 22 21 21 22 23 22 22 23 21 22 19 22 22
21 22 21 22 20 21 23 21 22 22 23 21 21 23 22 22 22 23 21 22
21 22 21 22 22 21 21 22 23 21 20 21 22 23 21 21 22 22 23 21
22 21 21 20 22 23 22 22 21 23 22 21 23 21 22 23 20 21 22 22
22 19 21 22 22 22 19 24 21 22 22 22 21 23 21 21 23 24 22 23
21 21 20 21 21 21 23 22 21 21 22 23 22 22 23 21 22 19 22 22
21 22 21 22 20 21 23 21 22 22 23 21 21 23 22 22 22 23 20 23
21 22 21 22 22 21 21 22 23 21 20 21 22 23 21 21 22 22 23 22
22 21 21 20 22 23 22 22 21 23 22 21 23 21 22 23 20 21 22

Table 7.1.1.1

(The above information has been collected from the list of B Sc. students held by the DCDMBS administration, so the ages are in random order.)

Once the observations are arranged in ascending order, for example, they become easier to manipulate and can be treated more efficiently. Given the relatively large number of values, 399 to be precise, a discrete frequency table (see Table 7.1.1.2 below) is a much more appropriate way of classifying them without loss of information. The identity of each value is preserved, so exact calculation of statistics remains possible (to be dealt with later).

Age     Frequency
19      14
20      23
21      134
22      149
23      71
24      8
Total   399

Table 7.1.1.2

7.1.2 Presentation of ungrouped data on a histogram

[Fig. 7.1.2: Histogram of ungrouped data. x-axis: Age of students; y-axis: Number of students (frequency).]

7.1.3 Presentation of ungrouped data on a frequency polygon

[Fig. 7.1.3: Frequency polygon for ungrouped data. x-axis: Age of students; y-axis: Number of students (frequency).]

7.2 Grouped data

When the range of values (not observations) is too wide, a discrete frequency table becomes lengthy and cumbersome. Observations are then grouped into cells or classes in order to compress the set of data for more suitable tabulation. Example 7.1.1 would not be a good illustration of this, given the little variation in the ages of students (from 19 to 24).

The main drawback of grouping data is that the identity (value) of each observation is lost, so important descriptive statistics like the mean and standard deviation can only be estimated and not exactly calculated. For example, if the age group 21-25 has frequency 5, nothing can be said about the individual values of these 5 observations. Besides, several new quantities have to be calculated for the purposes of statistical calculation and analysis, as explained in the following sections.

7.2.1 Limits and real limits (or boundaries)

A class is bounded by a lower and an upper limit: in the previous paragraph, the lower and upper limits of the age group 21-25 are 21 and 25 respectively. A real limit is obtained by making a continuity correction to a limit (explained below). In a frequency distribution, limits differ from real limits in that the upper limit of a cell can never be equal to the lower limit of the next cell, whereas the upper real limit of a cell coincides with the lower real limit of the next. Real limits are fictitious values if the values recorded are discrete. However, they are useful not only for calculations but also for the presentation of data on histograms and several other types of charts and diagrams.

For instance, if we have a frequency distribution of ages with the two neighbouring cells 21-25 and 26-30, then drawing a histogram for this distribution requires that the limits 25 and 26 be made equal, because there is no gap between any two successive rectangles of a histogram. We therefore make a continuity correction of ±0.5, the equivalent of half a gap.

Note: The gap between any pair of successive cells in a frequency distribution is equal to the degree of accuracy to which the original observations were recorded. In the above example, it is easy to deduce that age was recorded to the nearest unit, since the gap between the cells 21-25 and 26-30 is 1. The real limits of these two cells will now be 20.5-25.5 and 25.5-30.5. Note that the following relationships hold:

Lower real limit = Lower limit - continuity correction
Upper real limit = Upper limit + continuity correction

7.2.2 Mid-class values (MCV)

The mid-class value, MCV, of a cell is defined as its midpoint, that is, the average of its limits or of its real limits. Thus, the MCV of the cell 21-25 is 23. The MCV of a cell is the representative of that cell in the sense that, since the values of the observations in the cell are unknown individually, it is assumed that they are all equal to the MCV. This assumption is neither arbitrary nor unjustified. If the individual observations are unknown, the most reasonable basis for estimating statistics is to assume that they are, at least, uniformly distributed within the cell (which could be untrue, of course!). Mathematically, the sum of the observations would then be equal to the number of observations multiplied by the MCV (think about it!). The importance of the mid-class value should therefore never be underestimated, especially for the calculation of crucial statistics like the mean and standard deviation.
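As a small sketch of the two definitions above, the real limits and the mid-class value of the cell 21-25 can be computed as follows. The function names and the `gap` parameter are assumptions introduced for illustration; the gap of 1 reflects ages recorded to the nearest unit.

```python
def real_limits(lower, upper, gap=1.0):
    """Apply a continuity correction of half the gap to both limits."""
    correction = gap / 2
    return lower - correction, upper + correction


def mid_class_value(lower, upper):
    """Midpoint of a cell: the average of its two limits."""
    return (lower + upper) / 2


lower_real, upper_real = real_limits(21, 25)
print(lower_real, upper_real)                    # prints 20.5 25.5
print(mid_class_value(21, 25))                   # prints 23.0
print(mid_class_value(lower_real, upper_real))   # prints 23.0

# Estimated sum of the 5 observations in the cell, assuming each equals the MCV
print(5 * mid_class_value(21, 25))               # prints 115.0
```

The midpoint is the same whether limits or real limits are used, because the correction is subtracted at one end and added at the other.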

7.2.3 Class interval or cell width

The cell width is simply the length of the cell, that is, the difference between its upper and lower real limits.

Note: Do not make the mistake of subtracting the lower limit from the upper limit, since this will not give the exact cell width. This can be easily verified by taking the cell 21-25. Its cell width is 5 (21, 22, 23, 24 and 25), which is obtained by subtracting 20.5 from 25.5, whereas 25 - 21 gives only 4. We therefore use the following formula:

Cell width = Upper real limit - Lower real limit

7.2.4 Example

Consider the following set of data, which represents the ages of workers of a private company. The real limits and mid-class values have already been computed.

Age group   Real limits    Mid-class value   Frequency
21-25       20.5-25.5      23                5
26-30       25.5-30.5      28                12
31-35       30.5-35.5      33                23
36-40       35.5-40.5      38                39
41-45       40.5-45.5      43                32
46-50       45.5-50.5      48                21
51-55       50.5-55.5      53                9
56-60       55.5-60.5      58                2
Total                                        143

Table 7.2.4.1

The data are presented on the histogram in Fig. 7.2.4.2 and the frequency polygon in Fig. 7.2.4.3.
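The derived columns of Table 7.2.4.1 can be reproduced with a short sketch. The limits and frequencies are those of the table; the helper `describe_cell`, the dictionary keys and the constant `GAP` are hypothetical names chosen for this illustration, and the gap of 1 assumes ages recorded to the nearest year.

```python
# (lower limit, upper limit, frequency) for each age group in Table 7.2.4.1
age_groups = [
    (21, 25, 5), (26, 30, 12), (31, 35, 23), (36, 40, 39),
    (41, 45, 32), (46, 50, 21), (51, 55, 9), (56, 60, 2),
]
GAP = 1  # assumption: ages were recorded to the nearest unit


def describe_cell(lower, upper, gap=GAP):
    """Compute real limits, mid-class value and cell width of one cell."""
    lower_real = lower - gap / 2
    upper_real = upper + gap / 2
    return {
        "real_limits": (lower_real, upper_real),
        "mcv": (lower_real + upper_real) / 2,
        "width": upper_real - lower_real,   # upper real limit - lower real limit
    }


rows = [describe_cell(lower, upper) for lower, upper, _ in age_groups]
total_frequency = sum(freq for _, _, freq in age_groups)

# Estimated sum of all observations: each cell contributes frequency x MCV
estimated_sum = sum(freq * row["mcv"] for (_, _, freq), row in zip(age_groups, rows))

for (lower, upper, freq), row in zip(age_groups, rows):
    print(f"{lower}-{upper}", row["real_limits"], row["mcv"], row["width"], freq)
print("Total", total_frequency)
```

Every cell comes out with a width of 5, which is why this distribution is described as having a uniform class interval.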

7.2.4.1 Presentation of grouped data (uniform class interval) on a histogram

[Fig. 7.2.4.2: Histogram for grouped data. x-axis: Age group of workers, with cells bounded by the real limits 20.5 to 60.5; y-axis: Number of workers (frequency).]

7.2.4.2 Presentation of grouped data on a frequency polygon

[Fig. 7.2.4.3: Frequency polygon for grouped data. x-axis labelled "Age of students", with cells bounded by the real limits 20.5 to 60.5; y-axis labelled "Number of students (frequency)".]

7.3 Histograms

Of the several methods of presenting a frequency distribution graphically, the histogram is the most popular and the most widely used in practice. A histogram is a set of vertical bars whose areas are proportional to the frequencies of the classes that they represent.

When constructing a histogram, the variable is always taken on the x-axis and the frequencies on the y-axis. Each class is represented by a distance on the x-axis that is proportional to its class interval (see Section 7.2.3). If the class intervals are uniform throughout the distribution, every rectangle has the same width; if the classes have different class intervals, the widths vary accordingly. The y-axis represents the frequency of each class, which gives the height of its rectangle.

When class intervals are unequal, a correction must be made. This consists of finding the frequency density for each class, which is the ratio of the frequency to the class interval. The frequency densities then become the heights of the rectangles, since the areas of the rectangles should be proportional to the frequencies.

7.3.1 Example (unequal class intervals)

Temperatures (in degrees Fahrenheit) were recorded simultaneously in various cities of the world at a specific moment. Table 7.3.1.1 below gives the thermometer readings.

Temperature   Class interval   Frequency   Frequency density
0 - 5         5                3           0.6
5 - 10        5                6           1.2
10 - 20       10               10          1.0
20 - 30       10               15          1.5
30 - 40       10               10          1.0
40 - 50       10               5           0.5
50 - 70       20               5           0.25
Total                          54

Table 7.3.1.1

Note: 20 - 30 means from 20 to 30, including 20 but excluding 30.
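The frequency-density correction can be sketched in a few lines. The class limits below are an assumption: they read the classes of Table 7.3.1.1 as 0-5, 5-10, 10-20, 20-30, 30-40, 40-50 and 50-70 degrees Fahrenheit; the variable names are illustrative only.

```python
# (lower limit, upper limit, frequency), assuming the class limits stated above
classes = [
    (0, 5, 3), (5, 10, 6), (10, 20, 10), (20, 30, 15),
    (30, 40, 10), (40, 50, 5), (50, 70, 5),
]

densities = []
for lower, upper, frequency in classes:
    class_interval = upper - lower                 # width of the rectangle
    densities.append(frequency / class_interval)   # height of the rectangle

# Area of each rectangle = height x width, which recovers the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

print(densities)
print(sum(areas))
```

Under this reading the densities agree with the last column of the table, and the total area equals the total frequency, which is the sense in which areas are proportional to frequencies.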

7.3.1.1 Presentation of grouped data (unequal class intervals) on a histogram

[Fig. 7.3.1.2: Histogram (unequal class intervals) using frequency density. x-axis: Temperature (degrees Fahrenheit); y-axis: Frequency density.]

The histogram should be clearly distinguished from the bar chart. The most striking physical difference between the two diagrams is that, unlike in a bar chart, there are no gaps between successive rectangles of a histogram. A bar chart is one-dimensional, since only the length of each bar matters and not its width, whereas a histogram is two-dimensional, since both length and width are important. A histogram is mainly used to display data for continuous variables but can also be adjusted to present discrete data by making an appropriate continuity correction (see Section 7.2.1). Moreover, it can be quite misleading if the distribution has unequal class intervals and frequencies, rather than frequency densities, are used as heights.

7.4 Frequency polygons

A frequency polygon is a graph of a frequency distribution. There is a very effective way in which a frequency polygon may be constructed: draw a histogram of the given data and then join, by means of straight lines, the midpoint of the upper horizontal side of each rectangle to those of the adjacent rectangles. It is accepted practice to close the polygon at both ends of the distribution by extending it to the base line. When this is done, two hypothetical classes with zero frequency must be included, one at each end. This extension is made with the objective of making the area under the polygon equal to the area under the corresponding histogram.

A frequency polygon sketches an outline of the data pattern more clearly. In fact, it is a refinement of the histogram, as it does not assume that the frequencies of observations within a class are equal. The polygon becomes increasingly smooth and curve-like as we increase the number of classes in a distribution.

[Fig. 7.4.1.1: Frequency polygon and histogram. y-axis: Frequency density.]
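The construction described above can be sketched using the mid-class values and frequencies of Table 7.2.4.1, where every cell has width 5. The function names are hypothetical, and the area check relies on the class intervals being uniform, which is an assumption of this sketch rather than a general claim.

```python
def polygon_vertices(mid_values, frequencies, width):
    """Join the rectangle midpoints and close the polygon on the base line
    by adding one zero-frequency class at each end."""
    xs = [mid_values[0] - width] + list(mid_values) + [mid_values[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


def area_under_polygon(vertices):
    """Sum the trapezia between successive vertices."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


mid_values = [23, 28, 33, 38, 43, 48, 53, 58]
frequencies = [5, 12, 23, 39, 32, 21, 9, 2]
WIDTH = 5

vertices = polygon_vertices(mid_values, frequencies, WIDTH)
histogram_area = WIDTH * sum(frequencies)   # each rectangle: width x frequency

print(vertices[0], vertices[-1])            # prints (18, 0) (63, 0)
print(area_under_polygon(vertices), histogram_area)
```

With the two zero-frequency classes included, the area under the polygon matches the area of the histogram for this uniform-width distribution.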