CORRELATION AND REGRESSION


CHAPTER 18

LEARNING OBJECTIVES

After reading this chapter, students will be able to understand:

The meaning of bivariate data and techniques of preparation of bivariate distribution;
The concept of correlation between two variables and quantitative measurement of correlation, including the interpretation of positive, negative and zero correlation;
The concept of regression and its application in estimation of a variable from a known set of data.

UNIT OVERVIEW

Bivariate Data: Marginal Distribution; Bivariate Frequency Distribution; Conditional Distribution
Correlation Analysis: Types of Correlation (Positive Correlation, Negative Correlation); Measures of Correlation (Scatter Diagram, Karl Pearson's Product Moment Correlation Coefficient, Spearman's Correlation Coefficient, Coefficient of Concurrent Deviations)
Regression Analysis: Estimation of Regression Analysis (Method of Least Squares); Regression Lines (Regression equation y on x, Regression equation x on y)

JSNR_5170389_ICAI_Busness Mathematcs_Logcal Reasonng & Statstce_Text.pdf 673 / 808

18.1 INTRODUCTION

In the previous chapter, we discussed many a statistical measure relating to univariate distribution, i.e. distribution of one variable like height, weight, mark, profit, wage and so on. However, there are situations that demand study of more than one variable simultaneously. A businessman may be keen to know what amount of investment would yield a desired level of profit, or a student may want to know whether performing better in the selection test would enhance his or her chance of doing well in the final examination. With a view to answering this series of questions, we need to study more than one variable at the same time. Correlation analysis and regression analysis are the two analyses that are made from a multivariate distribution, i.e. a distribution of more than one variable. In particular, when there are two variables, say x and y, we study bivariate distribution. We restrict our discussion to bivariate distribution only.

Correlation analysis, it may be noted, helps us to find an association or the lack of it between the two variables x and y. Thus if x and y stand for profit and investment of a firm or the marks in Statistics and Mathematics for a group of students, then we may be interested to know whether x and y are associated or independent of each other. The extent or amount of correlation between x and y is provided by different measures of correlation, namely the Product Moment Correlation Coefficient, the Rank Correlation Coefficient or the Coefficient of Concurrent Deviations. In correlation analysis, we must be careful about a cause and effect relation between the variables under consideration, because there may be situations where x and y are related due to the influence of a third variable although no causal relationship exists between the two variables.

Regression analysis, on the other hand, is concerned with predicting the value of the dependent variable corresponding to a known value of the independent variable on the assumption of a mathematical relationship between the two variables and also an average relationship between them.

18.2 BIVARIATE DATA

When data are collected on two variables simultaneously, they are known as bivariate data and the corresponding frequency distribution, derived from it, is known as Bivariate Frequency Distribution. If x and y denote marks in Maths and Stats for a group of 30 students, then the corresponding bivariate data would be $(x_i, y_i)$ for $i = 1, 2, \ldots, 30$, where $(x_1, y_1)$ denotes the marks in Mathematics and Statistics for the student with serial number or Roll Number 1, $(x_2, y_2)$ that for the student with Roll Number 2 and so on, and lastly $(x_{30}, y_{30})$ denotes the pair of marks for the student bearing Roll Number 30.

As in the case of a univariate distribution, we need to construct the frequency distribution for bivariate data. Such a distribution takes into account the classification in respect of both the variables simultaneously. Usually, we make horizontal classification in respect of x and vertical classification in respect of the other variable y. Such a distribution is known as Bivariate Frequency Distribution or Joint Frequency Distribution or Two-way classification of the two variables x and y.

ILLUSTRATIONS:

Example 18.1: Prepare a Bivariate Frequency table for the following data relating to the marks in Statistics (x) and Mathematics (y):

(15, 13), (1, 3), (2, 6), (8, 3), (15, 10), (3, 9), (13, 19), (10, 11), (6, 4), (18, 14),
(10, 19), (12, 8), (11, 14), (13, 16), (17, 15), (18, 18), (11, 7), (10, 14), (14, 16), (16, 15),
(7, 11), (5, 1), (11, 15), (9, 4), (10, 15), (13, 12), (14, 17), (10, 11), (6, 9), (13, 17),
(16, 15), (6, 4), (4, 8), (8, 11), (9, 12), (14, 11), (16, 15), (9, 10), (4, 6), (5, 7),
(3, 11), (4, 16), (5, 8), (6, 9), (7, 12), (15, 6), (18, 11), (18, 19), (17, 16), (10, 14)

Take mutually exclusive classification for both the variables, the first class interval being 0-4 for both.

Solution: From the given data, we find that

Range for x = 19 − 1 = 18
Range for y = 19 − 1 = 18

We take the class intervals 0-4, 4-8, 8-12, 12-16, 16-20 for both the variables. Since the first pair of marks is (15, 13) and 15 belongs to the fourth class interval (12-16) for x and 13 belongs to the fourth class interval for y, we put a stroke in the (4, 4)-th cell. We carry on giving tally marks till the list is exhausted.
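The tallying procedure can be sketched in a few lines of Python. This is only an illustration: the pairs are transcribed from Example 18.1 and should be checked against the source, and the helper names `class_index` and `bivariate_table` are hypothetical. Classes are taken as mutually exclusive with the upper limit excluded, so a mark of 8 falls in 8-12.

```python
# Marks as (x, y) = (Statistics, Mathematics), transcribed from Example 18.1.
pairs = [
    (15, 13), (1, 3), (2, 6), (8, 3), (15, 10), (3, 9), (13, 19), (10, 11), (6, 4), (18, 14),
    (10, 19), (12, 8), (11, 14), (13, 16), (17, 15), (18, 18), (11, 7), (10, 14), (14, 16), (16, 15),
    (7, 11), (5, 1), (11, 15), (9, 4), (10, 15), (13, 12), (14, 17), (10, 11), (6, 9), (13, 17),
    (16, 15), (6, 4), (4, 8), (8, 11), (9, 12), (14, 11), (16, 15), (9, 10), (4, 6), (5, 7),
    (3, 11), (4, 16), (5, 8), (6, 9), (7, 12), (15, 6), (18, 11), (18, 19), (17, 16), (10, 14),
]


def class_index(mark, width=4):
    """Index of the class 0-4, 4-8, ... containing `mark` (upper limit excluded)."""
    return mark // width


def bivariate_table(data, n_classes=5, width=4):
    """Rows are x-classes, columns are y-classes; each cell counts the pairs falling in it."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for x, y in data:
        table[class_index(x, width)][class_index(y, width)] += 1
    return table


table = bivariate_table(pairs)
row_totals = [sum(row) for row in table]        # marginal frequencies of x
col_totals = [sum(col) for col in zip(*table)]  # marginal frequencies of y

for row, total in zip(table, row_totals):
    print(row, total)
print(col_totals, sum(col_totals))
```

The row and column totals printed at the end are the marginal frequencies of x and y.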

Table 18.1 Bivariate Frequency Distribution of Marks in Statistics and Mathematics

| Marks in Stats (x) \ Marks in Maths (y) | 0-4 | 4-8 | 8-12 | 12-16 | 16-20 | Total |
|---|---|---|---|---|---|---|
| 0-4 | I (1) | I (1) | II (2) | | | 4 |
| 4-8 | I (1) | IIII (4) | IIIII (5) | I (1) | I (1) | 12 |
| 8-12 | I (1) | II (2) | IIII (4) | IIIII I (6) | I (1) | 14 |
| 12-16 | | I (1) | III (3) | II (2) | IIIII (5) | 11 |
| 16-20 | | | I (1) | IIIII (5) | III (3) | 9 |
| Total | 3 | 8 | 15 | 14 | 10 | 50 |

We note, from the above table, that some of the cell frequencies ($f_{ij}$) are zero. Starting from the above Bivariate Frequency Distribution, we can obtain two types of univariate distributions, which are known as:

(a) Marginal distribution.
(b) Conditional distribution.

If we consider the distribution of Statistics marks along with the marginal totals presented in the last column of Table 18.1, we get the marginal distribution of marks in Statistics. Similarly, we can obtain one more marginal distribution of Mathematics marks. The following table shows the marginal distribution of marks of Statistics.

Table 18.2 Marginal Distribution of Marks in Statistics

| Marks | No. of Students |
|---|---|
| 0-4 | 4 |
| 4-8 | 12 |
| 8-12 | 14 |
| 12-16 | 11 |
| 16-20 | 9 |
| Total | 50 |

We can find the mean and standard deviation of marks in Statistics from Table 18.2. They would be known as marginal mean and marginal SD of Statistics marks. Similarly, we can obtain the marginal mean and marginal SD of Mathematics marks. Any other statistical measure in respect of x or y can be computed in a similar manner.
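As a rough illustration of the marginal mean and marginal SD, the sketch below applies the usual grouped-data formulas to the marginal distribution of Statistics marks, taking the class mid-values 2, 6, 10, 14 and 18 as representative marks. The function name `grouped_mean_sd` is hypothetical, and the SD uses the divisor N, as elsewhere in this chapter.

```python
import math


def grouped_mean_sd(mid_values, frequencies):
    """Mean and SD (divisor N) of a grouped frequency distribution."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(mid_values, frequencies)) / n
    mean_sq = sum(f * x * x for x, f in zip(mid_values, frequencies)) / n
    return mean, math.sqrt(mean_sq - mean ** 2)


# Marginal distribution of Statistics marks (classes 0-4, 4-8, 8-12, 12-16, 16-20).
mids = [2, 6, 10, 14, 18]
freqs = [4, 12, 14, 11, 9]

marginal_mean, marginal_sd = grouped_mean_sd(mids, freqs)
print(round(marginal_mean, 2), round(marginal_sd, 2))
```

The same function applies unchanged to the marginal distribution of Mathematics marks or to any conditional distribution.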

If we want to study the distribution of Statistics marks for a particular group of students, say for those students who got marks between 8 and 12 in Mathematics, we come across another univariate distribution known as conditional distribution.

Table 18.3 Conditional Distribution of Marks in Statistics for Students having Mathematics Marks between 8 and 12

| Marks | No. of Students |
|---|---|
| 0-4 | 2 |
| 4-8 | 5 |
| 8-12 | 4 |
| 12-16 | 3 |
| 16-20 | 1 |
| Total | 15 |

We may obtain the mean and SD from the above table. They would be known as conditional mean and conditional SD of marks of Statistics. The same result holds for marks in Mathematics. In particular, if there are m classifications for x and n classifications for y, then there would be altogether (m + n) conditional distributions.

18.3 CORRELATION ANALYSIS

While studying two variables at the same time, if it is found that the change in one variable is reciprocated by a corresponding change in the other variable either directly or inversely, then the two variables are known to be associated or correlated. Otherwise, the two variables are known to be dissociated or uncorrelated or independent. There are two types of correlation.

(i) Positive correlation
(ii) Negative correlation

If two variables move in the same direction, i.e. an increase (or decrease) on the part of one variable introduces an increase (or decrease) on the part of the other variable, then the two variables are known to be positively correlated. As for example, height and weight, yield and rainfall, profit and investment etc. are positively correlated.

On the other hand, if the two variables move in opposite directions, i.e. an increase (or a decrease) on the part of one variable results in a decrease (or an increase) on the part of the other variable, then the two variables are known to have a negative correlation. The price and demand of an item, the profits of an insurance company and the number of claims it has to meet etc. are examples of variables having a negative correlation.
The two variables are known to be uncorrelated if the movement on the part of one variable does not produce any movement of the other variable in a particular direction. As for example, shoe-size and intelligence are uncorrelated.

18.4 MEASURES OF CORRELATION

We consider the following measures of correlation:

(a) Scatter diagram
(b) Karl Pearson's Product moment correlation coefficient
(c) Spearman's rank correlation coefficient
(d) Coefficient of concurrent deviations

(a) SCATTER DIAGRAM

This is a simple diagrammatic method to establish correlation between a pair of variables. Unlike the product moment correlation coefficient, which can measure correlation only when the variables have a linear relationship, the scatter diagram can be applied for any type of correlation, linear as well as non-linear, i.e. curvilinear. The scatter diagram can distinguish between different types of correlation although it fails to measure the extent of relationship between the variables.

Each data point, which in this case is a pair of values $(x_i, y_i)$, is represented by a point in the rectangular axes of coordinates. The totality of all the plotted points forms the scatter diagram. The pattern of the plotted points reveals the nature of correlation. In case of a positive correlation, the plotted points lie from the lower left corner to the upper right corner; in case of a negative correlation, the plotted points concentrate from upper left to lower right; and in case of zero correlation, the plotted points would be equally distributed without depicting any particular pattern. The following figures show different types of correlation and the one to one correspondence between scatter diagram and product moment correlation coefficient.

[Figure 18.1: Showing Positive Correlation ($0 < r < 1$)]
[Figure 18.2: Showing Perfect Correlation ($r = 1$)]

[Figure 18.3: Showing Negative Correlation ($-1 < r < 0$)]
[Figure 18.4: Showing Perfect Negative Correlation ($r = -1$)]
[Figure 18.5: Showing No Correlation ($r = 0$)]
[Figure 18.6: Showing Curvilinear Correlation ($r = 0$)]

(b) KARL PEARSON'S PRODUCT MOMENT CORRELATION COEFFICIENT

This is by far the best method for finding correlation between two variables, provided the relationship between the two variables is linear. Pearson's correlation coefficient may be defined as the ratio of the covariance between the two variables to the product of the standard deviations of the two variables. If the two variables are denoted by x and y and if the corresponding bivariate data are $(x_i, y_i)$ for $i = 1, 2, 3, \ldots, n$, then the coefficient of correlation between x and y, due to Karl Pearson, is given by:

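$$r = r_{xy} = \frac{\mathrm{Cov}(x, y)}{S_x \times S_y} \qquad \ldots(18.1)$$

where

$$\mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} \qquad \ldots(18.2)$$

$$S_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} \qquad \ldots(18.3)$$

and

$$S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n}} = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} \qquad \ldots(18.4)$$

A single formula for computing the correlation coefficient is given by

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\ \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \qquad \ldots(18.5)$$

In case of a bivariate frequency distribution, we have

$$\mathrm{Cov}(x, y) = \frac{\sum_{i,j} x_i y_j f_{ij}}{N} - \bar{x}\,\bar{y} \qquad \ldots(18.6)$$

$$S_x = \sqrt{\frac{\sum_i f_{io}\, x_i^2}{N} - \bar{x}^2} \qquad \ldots(18.7)$$

and

$$S_y = \sqrt{\frac{\sum_j f_{oj}\, y_j^2}{N} - \bar{y}^2} \qquad \ldots(18.8)$$

where $x_i$ = Mid-value of the i-th class interval of x.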

$y_j$ = Mid-value of the j-th class interval of y
$f_{io}$ = Marginal frequency of x
$f_{oj}$ = Marginal frequency of y
$f_{ij}$ = frequency of the (i, j)-th cell

$$N = \sum_{i,j} f_{ij} = \sum_i f_{io} = \sum_j f_{oj} = \text{Total frequency} \qquad \ldots(18.9)$$

PROPERTIES OF CORRELATION COEFFICIENT

(i) The coefficient of correlation is a unit-free measure.

This means that if x denotes height of a group of students expressed in cm and y denotes their weight expressed in kg, then the correlation coefficient between height and weight would be free from any unit.

(ii) The coefficient of correlation remains invariant under a change of origin and/or scale of the variables under consideration, depending on the sign of the scale factors.

This property states that if the original pair of variables x and y is changed to a new pair of variables u and v by effecting a change of origin and scale for both x and y, i.e.

$$u = \frac{x - a}{b} \quad \text{and} \quad v = \frac{y - c}{d}$$

where a and c are the origins of x and y and b and d are the respective scales, then we have

$$r_{xy} = \frac{bd}{|b|\,|d|}\, r_{uv} \qquad \ldots(18.10)$$

$r_{xy}$ and $r_{uv}$ being the coefficients of correlation between x and y and between u and v respectively. (18.10) establishes that, numerically, the two correlation coefficients remain equal, and they would have opposite signs only when b and d, the two scales, differ in sign.

(iii) The coefficient of correlation always lies between −1 and 1, including both the limiting values, i.e.

$$-1 \le r \le 1 \qquad \ldots(18.11)$$

Example 18.2: Compute the correlation coefficient between x and y from the following data: $n = 10$, $\sum xy = 220$, $\sum x^2 = 200$, $\sum y^2 = 262$, $\sum x = 40$ and $\sum y = 50$.

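Solution: From the given data, we have by applying (18.5),

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - \left(\sum x\right)^2}\ \sqrt{n \sum y^2 - \left(\sum y\right)^2}}$$

$$= \frac{10 \times 220 - 40 \times 50}{\sqrt{10 \times 200 - (40)^2} \times \sqrt{10 \times 262 - (50)^2}}$$

$$= \frac{2200 - 2000}{\sqrt{2000 - 1600} \times \sqrt{2620 - 2500}} = \frac{200}{20 \times 10.9545} = 0.91$$

Thus there is a good amount of positive correlation between the two variables x and y.

Alternately, as given,

$$\bar{x} = \frac{\sum x}{n} = \frac{40}{10} = 4, \qquad \bar{y} = \frac{\sum y}{n} = \frac{50}{10} = 5$$

$$\mathrm{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{220}{10} - 4 \times 5 = 2$$

$$S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{200}{10} - 4^2} = 2$$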

$$S_y = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} = \sqrt{\frac{262}{10} - 5^2} = \sqrt{26.20 - 25} = 1.0954$$

Thus applying formula (18.1), we get

$$r = \frac{\mathrm{cov}(x, y)}{S_x \times S_y} = \frac{2}{2 \times 1.0954} = 0.91$$

As before, we draw the same conclusion.

Example 18.3: Find the product moment correlation coefficient from the following information:

x : 2 3 5 5 6 8
y : 9 8 8 6 5 3

Solution: In order to find the covariance and the two standard deviations, we prepare the following table:

Table 18.3 Computation of Correlation Coefficient

| $x_i$ (1) | $y_i$ (2) | $x_i y_i$ (3) = (1) × (2) | $x_i^2$ (4) = (1)² | $y_i^2$ (5) = (2)² |
|---|---|---|---|---|
| 2 | 9 | 18 | 4 | 81 |
| 3 | 8 | 24 | 9 | 64 |
| 5 | 8 | 40 | 25 | 64 |
| 5 | 6 | 30 | 25 | 36 |
| 6 | 5 | 30 | 36 | 25 |
| 8 | 3 | 24 | 64 | 9 |
| 29 | 39 | 166 | 163 | 279 |

We have

$$\bar{x} = \frac{29}{6} = 4.8333, \qquad \bar{y} = \frac{39}{6} = 6.50$$

$$\mathrm{cov}(x, y) = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} = \frac{166}{6} - 4.8333 \times 6.50 = -3.7498$$

$$S_x = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} = \sqrt{\frac{163}{6} - (4.8333)^2} = \sqrt{27.1667 - 23.3608} = 1.95$$

$$S_y = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} = \sqrt{\frac{279}{6} - (6.50)^2} = \sqrt{46.50 - 42.25} = 2.0616$$

Thus the correlation coefficient between x and y is given by

$$r = \frac{\mathrm{cov}(x, y)}{S_x \times S_y} = \frac{-3.7498}{1.9509 \times 2.0616} = -0.93$$

We find a high degree of negative correlation between x and y. Also, we could have applied formula (18.5), as we have done for the first problem of computing the correlation coefficient.

Sometimes, a change of origin reduces the computational labour to a great extent. This we are going to do in the next problem.
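The two routes to r used above, formula (18.5) and the covariance form (18.1), can be compared numerically. The sketch below is a minimal illustration using the data of Examples 18.2 and 18.3; the function names `r_from_sums` and `r_from_cov` are hypothetical.

```python
import math


def r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Single formula (18.5), using only the raw sums."""
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return num / den


def r_from_cov(xs, ys):
    """Formula (18.1): covariance divided by the product of the SDs (divisor n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
    sx = math.sqrt(sum(x * x for x in xs) / n - mx ** 2)
    sy = math.sqrt(sum(y * y for y in ys) / n - my ** 2)
    return cov / (sx * sy)


# Example 18.2: only the sums are given.
r_182 = r_from_sums(10, 220, 40, 50, 200, 262)

# Example 18.3: raw data.
xs = [2, 3, 5, 5, 6, 8]
ys = [9, 8, 8, 6, 5, 3]
r_183_sums = r_from_sums(len(xs), sum(x * y for x, y in zip(xs, ys)),
                         sum(xs), sum(ys),
                         sum(x * x for x in xs), sum(y * y for y in ys))
r_183_cov = r_from_cov(xs, ys)

print(round(r_182, 2), round(r_183_sums, 2), round(r_183_cov, 2))
```

Both functions are algebraically the same quantity, so any difference between them is floating-point rounding only.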

Example 18.4: The following data relate to the test scores obtained by eight salesmen in an aptitude test and their daily sales in thousands of rupees:

Salesman : 1 2 3 4 5 6 7 8
Scores : 60 55 62 56 62 64 70 54
Sales : 31 28 26 24 30 35 28 24

Solution: Let the scores and sales be denoted by x and y respectively. We take a, the origin of x, as the average of the two extreme values, i.e. 54 and 70. Hence a = 62. Similarly, the origin of y is taken as

$$b = \frac{24 + 35}{2} \cong 30$$

Table 18.4 Computation of Correlation Coefficient Between Test Scores and Sales

| Scores ($x_i$) (1) | Sales in ₹ 1000 ($y_i$) (2) | $u_i = x_i - 62$ (3) | $v_i = y_i - 30$ (4) | $u_i v_i$ (5) = (3) × (4) | $u_i^2$ (6) = (3)² | $v_i^2$ (7) = (4)² |
|---|---|---|---|---|---|---|
| 60 | 31 | −2 | 1 | −2 | 4 | 1 |
| 55 | 28 | −7 | −2 | 14 | 49 | 4 |
| 62 | 26 | 0 | −4 | 0 | 0 | 16 |
| 56 | 24 | −6 | −6 | 36 | 36 | 36 |
| 62 | 30 | 0 | 0 | 0 | 0 | 0 |
| 64 | 35 | 2 | 5 | 10 | 4 | 25 |
| 70 | 28 | 8 | −2 | −16 | 64 | 4 |
| 54 | 24 | −8 | −6 | 48 | 64 | 36 |
| Total | | −13 | −14 | 90 | 221 | 122 |

Since the correlation coefficient remains unchanged due to a change of origin, we have

$$r = r_{xy} = r_{uv} = \frac{n \sum u_i v_i - \sum u_i \sum v_i}{\sqrt{n \sum u_i^2 - \left(\sum u_i\right)^2}\ \sqrt{n \sum v_i^2 - \left(\sum v_i\right)^2}}$$

$$= \frac{8 \times 90 - (-13) \times (-14)}{\sqrt{8 \times 221 - (-13)^2} \times \sqrt{8 \times 122 - (-14)^2}} = \frac{538}{\sqrt{1768 - 169} \times \sqrt{976 - 196}} = 0.48$$
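The invariance used in this example can be checked directly. The following sketch computes r on the raw scores and sales and again on the shifted values u and v; as an extra illustration that is not part of the example, it also rescales one variable by a negative factor to show the sign flip described by property (ii). The variable `w` and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient by the single formula (18.5)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


scores = [60, 55, 62, 56, 62, 64, 70, 54]
sales = [31, 28, 26, 24, 30, 35, 28, 24]

u = [x - 62 for x in scores]       # change of origin for x
v = [y - 30 for y in sales]        # change of origin for y
w = [(y - 30) / -2 for y in sales]  # illustrative negative scale factor d = -2

r_xy = pearson_r(scores, sales)
r_uv = pearson_r(u, v)
r_uw = pearson_r(u, w)

print(round(r_xy, 2), round(r_uv, 2), round(r_uw, 2))
```

The first two values agree because only the origin changed; the third has the same magnitude and the opposite sign because exactly one scale factor is negative.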

In some cases, there may be some confusion about selecting the pair of variables for which correlation is wanted. This is explained in the following problem.

Example 18.5: Examine whether there is any correlation between age and blindness on the basis of the following data:

Age in years : 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80
No. of Persons (in thousands) : 90 120 140 100 80 60 40 20
No. of blind Persons : 10 15 18 20 15 12 10 06

Solution: Let us denote the mid-value of age in years as x and the number of blind persons per lakh as y. Then as before, we compute the correlation coefficient between x and y.

Table 18.5 Computation of correlation between age and blindness

| Age in years (1) | Mid-value x (2) | No. of Persons ('000) P (3) | No. of blind B (4) | No. of blind per lakh y = B/P × 1 lakh (5) | xy (6) = (2) × (5) | x² (7) = (2)² | y² (8) = (5)² |
|---|---|---|---|---|---|---|---|
| 0-10 | 5 | 90 | 10 | 11 | 55 | 25 | 121 |
| 10-20 | 15 | 120 | 15 | 12 | 180 | 225 | 144 |
| 20-30 | 25 | 140 | 18 | 13 | 325 | 625 | 169 |
| 30-40 | 35 | 100 | 20 | 20 | 700 | 1225 | 400 |
| 40-50 | 45 | 80 | 15 | 19 | 855 | 2025 | 361 |
| 50-60 | 55 | 60 | 12 | 20 | 1100 | 3025 | 400 |
| 60-70 | 65 | 40 | 10 | 25 | 1625 | 4225 | 625 |
| 70-80 | 75 | 20 | 6 | 30 | 2250 | 5625 | 900 |
| Total | 320 | | | 150 | 7090 | 17000 | 3120 |
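The point of this example is that the raw count of blind persons must first be converted to a rate per lakh of population before correlating it with age. The sketch below rebuilds that column and the column totals, then applies formula (18.5). It assumes rates are rounded to whole numbers as in the table; Python's `round` sends 12.5 to the even integer 12, which happens to agree with the tabulated value. The name `per_lakh` is hypothetical.

```python
import math

mid_age = [5, 15, 25, 35, 45, 55, 65, 75]
persons_thousands = [90, 120, 140, 100, 80, 60, 40, 20]
blind = [10, 15, 18, 20, 15, 12, 10, 6]

# Blind persons per lakh (100,000): B / (P * 1000) * 100000 = 100 * B / P.
per_lakh = [round(100 * b / p) for b, p in zip(blind, persons_thousands)]

n = len(mid_age)
sum_x, sum_y = sum(mid_age), sum(per_lakh)
sum_xy = sum(x * y for x, y in zip(mid_age, per_lakh))
sum_x2 = sum(x * x for x in mid_age)
sum_y2 = sum(y * y for y in per_lakh)

r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

print(per_lakh)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, round(r, 2))
```

Correlating age with the raw counts in `blind` instead would mix the effect of age with the very different population sizes of the age groups.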

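The correlation coefficient between age and blindness is given by

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - \left(\sum x\right)^2}\ \sqrt{n \sum y^2 - \left(\sum y\right)^2}} = \frac{8 \times 7090 - 320 \times 150}{\sqrt{8 \times 17000 - (320)^2} \times \sqrt{8 \times 3120 - (150)^2}} = \frac{8720}{183.3030 \times 49.5984} = 0.96$$

which exhibits a very high degree of positive correlation between age and blindness.

Example 18.6: The coefficient of correlation between x and y for 20 items is 0.4. The AMs and SDs of x and y are known to be 12 and 15, and 3 and 4, respectively. Later on, it was found that the pair (20, 15) was wrongly taken as (15, 20). Find the correct value of the correlation coefficient.

Solution: We are given that n = 20 and the original r = 0.4, $\bar{x} = 12$, $\bar{y} = 15$, $S_x = 3$ and $S_y = 4$.

$$r = \frac{\mathrm{cov}(x, y)}{S_x \times S_y} \;\Rightarrow\; 0.4 = \frac{\mathrm{cov}(x, y)}{3 \times 4} \;\Rightarrow\; \mathrm{Cov}(x, y) = 4.8$$

$$\Rightarrow\; \frac{\sum xy}{n} - \bar{x}\,\bar{y} = 4.8 \;\Rightarrow\; \frac{\sum xy}{20} - 12 \times 15 = 4.8 \;\Rightarrow\; \sum xy = 3696$$

Hence, corrected $\sum xy = 3696 - 20 \times 15 + 15 \times 20 = 3696$.

Also, $S_x^2 = 9$, so

$$\frac{\sum x^2}{20} - 12^2 = 9 \;\Rightarrow\; \sum x^2 = 3060$$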

Similarly, $S_y^2 = 16$, so

$$\frac{\sum y^2}{20} - 15^2 = 16 \;\Rightarrow\; \sum y^2 = 4820$$

Thus corrected $\sum x = n\bar{x}$ − wrong x value + correct x value = 20 × 12 − 15 + 20 = 245.

Similarly, corrected $\sum y$ = 20 × 15 − 20 + 15 = 295.

Corrected $\sum x^2$ = 3060 − 15² + 20² = 3235.

Corrected $\sum y^2$ = 4820 − 20² + 15² = 4645.

Thus the corrected value of the correlation coefficient, by applying formula (18.5), is

$$\frac{20 \times 3696 - 245 \times 295}{\sqrt{20 \times 3235 - (245)^2} \times \sqrt{20 \times 4645 - (295)^2}} = \frac{73920 - 72275}{68.3740 \times 76.6480} = 0.31$$

Example 18.7: Compute the coefficient of correlation between marks in Statistics and Mathematics for the bivariate frequency distribution shown in Table 18.6.

Solution: For the sake of computational advantage, we effect a change of origin and scale for both the variables x and y. Define

$$u_i = \frac{x_i - a}{b} = \frac{x_i - 10}{4} \quad \text{and} \quad v_j = \frac{y_j - c}{d} = \frac{y_j - 10}{4}$$

where $x_i$ and $y_j$ denote respectively the mid-values of the x-class interval and the y-class interval. The following table shows the necessary calculation. On the right top corner of each cell, the product of the cell frequency, the corresponding u value and the respective v value has been shown. They add up in a particular row or column to provide the value of $\sum f_{ij} u_i v_j$ for that particular row or column.

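Table 18.6 Computation of Correlation Coefficient Between Marks of Mathematics and Statistics

Each cell shows the frequency $f_{ij}$ followed, in square brackets, by the product $f_{ij} u_i v_j$. Rows are the class intervals of x with mid-value $x_i$ and coded value $u_i$; columns are the class intervals of y with mid-value $y_j$ and coded value $v_j$.

| Class interval (x) | Mid-value | $u_i$ | 0-4 (mid 2, $v_j = -2$) | 4-8 (mid 6, $v_j = -1$) | 8-12 (mid 10, $v_j = 0$) | 12-16 (mid 14, $v_j = 1$) | 16-20 (mid 18, $v_j = 2$) | $f_{io}$ | $f_{io} u_i$ | $f_{io} u_i^2$ | $\sum_j f_{ij} u_i v_j$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0-4 | 2 | −2 | 1 [4] | 1 [2] | 2 [0] | | | 4 | −8 | 16 | 6 |
| 4-8 | 6 | −1 | 2 [4] | 4 [4] | 5 [0] | 1 [−1] | 1 [−2] | 13 | −13 | 13 | 5 |
| 8-12 | 10 | 0 | | 2 [0] | 4 [0] | 6 [0] | 1 [0] | 13 | 0 | 0 | 0 |
| 12-16 | 14 | 1 | | 1 [−1] | 3 [0] | 2 [2] | 5 [10] | 11 | 11 | 11 | 11 |
| 16-20 | 18 | 2 | | | 1 [0] | 5 [10] | 3 [12] | 9 | 18 | 36 | 22 |
| $f_{oj}$ | | | 3 | 8 | 15 | 14 | 10 | 50 | 8 | 76 | 44 |
| $f_{oj} v_j$ | | | −6 | −8 | 0 | 14 | 20 | 20 | | | |
| $f_{oj} v_j^2$ | | | 12 | 8 | 0 | 14 | 40 | 74 | | | |
| $\sum_i f_{ij} u_i v_j$ | | | 8 | 5 | 0 | 11 | 20 | 44 | | | CHECK |

A single formula for computing the correlation coefficient from a bivariate frequency distribution is given by

$$r = \frac{N \sum_{i,j} f_{ij} u_i v_j - \sum_i f_{io} u_i \times \sum_j f_{oj} v_j}{\sqrt{N \sum_i f_{io} u_i^2 - \left(\sum_i f_{io} u_i\right)^2}\ \sqrt{N \sum_j f_{oj} v_j^2 - \left(\sum_j f_{oj} v_j\right)^2}} \qquad \ldots(18.10)$$

$$= \frac{50 \times 44 - 8 \times 20}{\sqrt{50 \times 76 - 8^2} \times \sqrt{50 \times 74 - 20^2}} = \frac{2040}{61.1228 \times 57.4456} = 0.58$$

The value of r shows a good amount of positive correlation between the marks in Statistics and Mathematics on the basis of the given data.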
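The grouped-data calculation can be reproduced with a short script. This is a sketch under the assumption that the cell frequencies are those of Table 18.6 as laid out above (rows are x-classes, columns are y-classes); the function name `grouped_r` is hypothetical.

```python
import math

# Cell frequencies f_ij of Table 18.6: rows = x-classes, columns = y-classes.
freq = [
    [1, 1, 2, 0, 0],
    [2, 4, 5, 1, 1],
    [0, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]
u = [-2, -1, 0, 1, 2]  # u_i = (x_i - 10) / 4 for mid-values 2, 6, 10, 14, 18
v = [-2, -1, 0, 1, 2]  # v_j = (y_j - 10) / 4


def grouped_r(f, us, vs):
    """Correlation coefficient from a bivariate frequency table using coded values."""
    n = sum(sum(row) for row in f)
    row_tot = [sum(row) for row in f]        # f_io
    col_tot = [sum(col) for col in zip(*f)]  # f_oj
    s_u = sum(fi * ui for fi, ui in zip(row_tot, us))
    s_v = sum(fj * vj for fj, vj in zip(col_tot, vs))
    s_uu = sum(fi * ui * ui for fi, ui in zip(row_tot, us))
    s_vv = sum(fj * vj * vj for fj, vj in zip(col_tot, vs))
    s_uv = sum(f[i][j] * us[i] * vs[j] for i in range(len(us)) for j in range(len(vs)))
    r = (n * s_uv - s_u * s_v) / (math.sqrt(n * s_uu - s_u ** 2) * math.sqrt(n * s_vv - s_v ** 2))
    return r, (n, s_u, s_v, s_uu, s_vv, s_uv)


r, sums = grouped_r(freq, u, v)
print(sums, round(r, 2))
```

Because r is unaffected by a change of origin and positive scale, passing the mid-values themselves in place of `u` and `v` gives the same result.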

Example 18.8: Given that the correlation coefficient between x and y is 0.8, write down the correlation coefficient between u and v where

(i) 2u + 3x + 4 = 0 and 4v + 16y + 11 = 0
(ii) 2u − 3x + 4 = 0 and 4v + 16y + 11 = 0
(iii) 2u − 3x + 4 = 0 and 4v − 16y + 11 = 0
(iv) 2u + 3x + 4 = 0 and 4v − 16y + 11 = 0

Solution: Using (18.10), we find that

$$r_{xy} = \frac{bd}{|b|\,|d|}\, r_{uv}$$

i.e. $r_{xy} = r_{uv}$ if b and d are of the same sign and $r_{uv} = -r_{xy}$ when b and d are of opposite signs, b and d being the scales of x and y respectively.

In (i), u = (−2) + (−3/2)x and v = (−11/4) + (−4)y. Since b = −3/2 and d = −4 are of the same sign, the correlation coefficient between u and v would be the same as that between x and y, i.e. $r_{xy} = 0.8 = r_{uv}$.

In (ii), u = (−2) + (3/2)x and v = (−11/4) + (−4)y. Hence b = 3/2 and d = −4 are of opposite signs and we have $r_{uv} = -r_{xy} = -0.8$.

Proceeding in a similar manner, we have $r_{uv} = 0.8$ and $-0.8$ in (iii) and (iv).

(c) SPEARMAN'S RANK CORRELATION COEFFICIENT

When we need to find the correlation between two qualitative characteristics, say, beauty and intelligence, we take recourse to the rank correlation coefficient. Rank correlation can also be applied to find the level of agreement (or disagreement) between two judges so far as assessing a qualitative characteristic is concerned. As compared to the product moment correlation coefficient, the rank correlation coefficient is easier to compute; it can also be advocated to get a first-hand impression about the correlation between a pair of variables.

Spearman's rank correlation coefficient is given by

$$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \qquad \ldots(18.11)$$

where $r_R$ denotes the rank correlation coefficient and it lies between −1 and 1, inclusive of these two values. $d_i = x_i - y_i$ represents the difference in ranks for the i-th individual and n denotes the number of individuals.

In case u individuals receive the same rank, we describe it as a tied rank of length u. In case of a tied rank, formula (18.11) is changed to

$$r_R = 1 - \frac{6\left[\sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12}\right]}{n(n^2 - 1)} \qquad \ldots(18.12)$$

In this formula, $t_j$ represents the j-th tie length and the summation $\sum_j (t_j^3 - t_j)$ extends over the lengths of all the ties for both the series.

Example 18.9: Compute the coefficient of rank correlation between sales and advertisement expressed in thousands of rupees from the following data:

Sales : 90 85 68 75 82 80 95 70
Advertisement : 7 6 2 3 4 5 8 1

Solution: Let the rank given to sales be denoted by x and the rank of advertisement be denoted by y. We note that since the highest sales as given in the data is 95, it is to be given rank 1, the second highest sales 90 is to be given rank 2 and finally rank 8 goes to the lowest sales, namely 68. We have given rank to the other variable, advertisement, in a similar manner. Since there are no ties, we apply formula (18.11).

Table 18.7 Computation of Rank Correlation between Sales and Advertisement

| Sales | Advertisement | Rank for Sales ($x_i$) | Rank for Advertisement ($y_i$) | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|---|
| 90 | 7 | 2 | 2 | 0 | 0 |
| 85 | 6 | 3 | 3 | 0 | 0 |
| 68 | 2 | 8 | 7 | 1 | 1 |
| 75 | 3 | 6 | 6 | 0 | 0 |
| 82 | 4 | 4 | 5 | −1 | 1 |
| 80 | 5 | 5 | 4 | 1 | 1 |
| 95 | 8 | 1 | 1 | 0 | 0 |
| 70 | 1 | 7 | 8 | −1 | 1 |
| Total | | | | 0 | 4 |
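The ranking and the $\sum d_i^2$ column of Table 18.7 can be reproduced with a short script. This sketch assumes there are no ties, so that formula (18.11) applies directly; the function names `ranks_descending` and `spearman_no_ties` are hypothetical.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; valid only when all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Rank correlation by formula (18.11); returns (r_R, sum of squared rank differences)."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


sales = [90, 85, 68, 75, 82, 80, 95, 70]
advertisement = [7, 6, 2, 3, 4, 5, 8, 1]

r_rank, sum_d2 = spearman_no_ties(sales, advertisement)
print(ranks_descending(sales), ranks_descending(advertisement), sum_d2, round(r_rank, 2))
```

When ranks are already supplied, as for judges' rankings, the ranking step is skipped and only the rank differences are needed.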

Since n = 8 and $\sum d_i^2 = 4$, applying formula (18.11), we get

$$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{8(8^2 - 1)} = 1 - 0.0476 = 0.95$$

The high positive value of the rank correlation coefficient indicates that there is a very good amount of agreement between sales and advertisement.

Example 18.10: Compute rank correlation from the following data relating to ranks given by two judges in a contest:

Serial No. of Candidate : 1 2 3 4 5 6 7 8 9 10
Rank by Judge A : 10 5 6 1 2 3 4 7 8 9
Rank by Judge B : 5 6 9 2 8 7 3 4 10 1

Solution: We directly apply formula (18.11), as ranks are already given.

Table 18.8 Computation of Rank Correlation Coefficient between the ranks given by the two Judges

| Serial No. | Rank by A ($x_i$) | Rank by B ($y_i$) | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 10 | 5 | 5 | 25 |
| 2 | 5 | 6 | −1 | 1 |
| 3 | 6 | 9 | −3 | 9 |
| 4 | 1 | 2 | −1 | 1 |
| 5 | 2 | 8 | −6 | 36 |
| 6 | 3 | 7 | −4 | 16 |
| 7 | 4 | 3 | 1 | 1 |
| 8 | 7 | 4 | 3 | 9 |
| 9 | 8 | 10 | −2 | 4 |
| 10 | 9 | 1 | 8 | 64 |
| Total | | | 0 | 166 |

The rank correlation coefficient is given by

$$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 166}{10(10^2 - 1)} = -0.006$$

The very low value (almost 0) indicates that there is hardly any agreement between the ranks given by the two judges in the contest.

Example 18.11: Compute the coefficient of rank correlation between Eco. marks and Stats. marks as given below:

Eco Marks : 80 56 50 48 50 62 60
Stats Marks : 90 75 75 65 65 50 65

Solution: This is a case of tied ranks, as more than one student share the same mark both for Economics and Statistics. For Eco., the student receiving 80 marks gets rank 1, the one getting 62 marks receives rank 2, the student with 60 receives rank 3, the student with 56 marks gets rank 4 and, since there are two students each getting 50 marks, each would be receiving a common rank, the average of the next two ranks 5 and 6, i.e. (5 + 6)/2, i.e. 5.50, and lastly the last rank, i.e. 7, goes to the student getting the lowest Eco marks. In a similar manner, we award ranks to the students with Stats marks.

Table 18.9 Computation of Rank Correlation Between Eco Marks and Stats Marks with Tied Marks

| Eco Mark | Stats Mark | Rank for Eco ($x_i$) | Rank for Stats ($y_i$) | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|---|
| 80 | 90 | 1 | 1 | 0 | 0 |
| 56 | 75 | 4 | 2.50 | 1.50 | 2.25 |
| 50 | 75 | 5.50 | 2.50 | 3 | 9 |
| 48 | 65 | 7 | 5 | 2 | 4 |
| 50 | 65 | 5.50 | 5 | 0.50 | 0.25 |
| 62 | 50 | 2 | 7 | −5 | 25 |
| 60 | 65 | 3 | 5 | −2 | 4 |
| Total | | | | 0 | 44.50 |
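Average ranks for tied values and the tie correction of formula (18.12) can be sketched as follows. The function names `average_ranks`, `tie_correction` and `spearman_with_ties` are hypothetical; the data are the Eco and Stats marks of Example 18.11.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 for the largest value; tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]


def tie_correction(values):
    """Sum of (t^3 - t) / 12 over the tie lengths t in one series."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


def spearman_with_ties(xs, ys):
    """Rank correlation by formula (18.12); returns (r_R, sum d^2, total tie correction)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1)), sum_d2, correction


eco = [80, 56, 50, 48, 50, 62, 60]
stats = [90, 75, 75, 65, 65, 50, 65]

r_tied, sum_d2, correction = spearman_with_ties(eco, stats)
print(average_ranks(eco), average_ranks(stats), sum_d2, correction, round(r_tied, 2))
```

Untied values have tie length 1 and contribute $1^3 - 1 = 0$, so the same function also covers data without ties.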

For Economics marks there is one tie of length 2 and for Stats marks there are two ties of lengths 2 and 3 respectively.

Thus

$$\sum_j \frac{t_j^3 - t_j}{12} = \frac{(2^3 - 2) + (2^3 - 2) + (3^3 - 3)}{12} = 3$$

Thus

$$r_R = 1 - \frac{6\left[\sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12}\right]}{n(n^2 - 1)} = 1 - \frac{6 \times (44.50 + 3)}{7(7^2 - 1)} = 0.15$$

Example 18.12: For a group of 8 students, the sum of squares of differences in ranks for Mathematics and Statistics marks was found to be 50. What is the value of the rank correlation coefficient?

Solution: As given, n = 8 and $\sum d_i^2 = 50$. Hence the rank correlation coefficient between marks in Mathematics and Statistics is given by

$$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 50}{8(8^2 - 1)} = 0.40$$

Example 18.13: For a number of towns, the coefficient of rank correlation between the people living below the poverty line and the increase of population is 0.50. If the sum of squares of the differences in ranks awarded to these factors is 82.50, find the number of towns.

Solution: As given, $r_R = 0.50$ and $\sum d_i^2 = 82.50$.

Thus

$$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

$$\Rightarrow 0.50 = 1 - \frac{6 \times 82.50}{n(n^2-1)} \;\Rightarrow\; n(n^2-1) = 990 = 10(10^2-1)$$

$\therefore n = 10$, as n must be a positive integer.

Example 18.14: While computing the rank correlation coefficient between profits and investment for 10 years of a firm, the difference in rank for a year was taken as 7 instead of 5 by mistake and the value of the rank correlation coefficient was computed as 0.80. What would be the correct value of the rank correlation coefficient after rectifying the mistake?

Solution: We are given that $n = 10$, $r_R = 0.80$, and the wrong $d_i = 7$ should be replaced by 5.

$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2-1)} \;\Rightarrow\; 0.80 = 1 - \frac{6\sum d_i^2}{10(10^2-1)} \;\Rightarrow\; \sum d_i^2 = 33$$

Corrected $\sum d_i^2 = 33 - 7^2 + 5^2 = 9$.

Hence the rectified value of the rank correlation coefficient

$$= 1 - \frac{6 \times 9}{10(10^2-1)} = 0.95$$

(d) COEFFICIENT OF CONCURRENT DEVIATIONS

A very simple and casual method of finding correlation, when we are not serious about the magnitude of the two variables, is the application of concurrent deviations. This method involves attaching a positive sign to an x-value (except the first) if this value is more than the previous value, and a negative sign if this value is less than the previous value. This is done for the y-series as well. The deviation in the x-value and the corresponding y-value is known to be concurrent if both the deviations have the same sign.

Denoting the number of concurrent deviations by c and the total number of deviations by m (which must be one less than the number of pairs of x and y values), the coefficient of concurrent deviations is given by

$$r_C = \pm\sqrt{\pm\frac{(2c-m)}{m}} \qquad (18.13)$$

If $(2c - m) > 0$, we take the positive sign both inside and outside the radical sign, and if $(2c - m) < 0$, we take the negative sign both inside and outside the radical sign.

Like Pearson's correlation coefficient and Spearman's rank correlation coefficient, the coefficient of concurrent deviations also lies between -1 and 1, both inclusive.

Example 18.15: Find the coefficient of concurrent deviations from the following data.

Year : 1990 1991 1992 1993 1994 1995 1996 1997
Price : 25 28 30 23 35 38 39 42
Demand : 35 34 35 30 29 28 26 23

Solution:

Table 18.10 Computation of Coefficient of Concurrent Deviations

| Year | Price | Sign of deviation from the previous figure (a) | Demand | Sign of deviation from the previous figure (b) | Product of deviation (ab) |
|---|---|---|---|---|---|
| 1990 | 25 | | 35 | | |
| 1991 | 28 | + | 34 | - | - |
| 1992 | 30 | + | 35 | + | + |
| 1993 | 23 | - | 30 | - | + |
| 1994 | 35 | + | 29 | - | - |
| 1995 | 38 | + | 28 | - | - |
| 1996 | 39 | + | 26 | - | - |
| 1997 | 42 | + | 23 | - | - |

In this case, m = number of pairs of deviations = 7, and c = number of positive signs in the product of deviation column = number of concurrent deviations = 2.

Thus

$$r_C = \pm\sqrt{\pm\frac{(2c-m)}{m}} = \pm\sqrt{\pm\frac{(4-7)}{7}} = \pm\sqrt{\pm\frac{(-3)}{7}} = -\sqrt{\frac{3}{7}} = -0.65$$

(Since $\frac{2c-m}{m} = -\frac{3}{7}$, we take the negative sign both inside and outside the radical sign.)

Thus there is a negative correlation between price and demand.

18.5 REGRESSION ANALYSIS

In regression analysis, we are concerned with the estimation of one variable for a given value of another variable (or for a given set of values of a number of variables) on the basis of an average mathematical relationship between the two variables (or a number of variables). Regression analysis plays a very important role in every field of human activity. A businessman may be keen to know what his estimated profit would be for a given level of investment on the basis of past records. Similarly, an outgoing student may like to know her chance of getting a first class in the final University Examination on the basis of her performance in the college selection test.

When there are two variables x and y and y is influenced by x, i.e. y depends on x, we get a simple linear regression or simple regression. y is known as the dependent variable, regressand or explained variable, and x is known as the independent variable, predictor or explanator. In the previous examples, since profit depends on investment and performance in the University Examination depends on performance in the college selection test, profit or performance in the University Examination is the dependent variable, and investment or performance in the selection test is the independent variable.

In the case of a simple regression model, if y depends on x, then the regression line of y on x is given by

$$y = a + bx \qquad (18.14)$$

Here a and b are two constants, also known as regression parameters. Furthermore, b is known as the regression coefficient of y on x and is also denoted by $b_{yx}$. We may define the regression line of y on x as the line of best fit obtained by the method of least squares and used for estimating the value of the dependent variable y for a known value of the independent variable x.

The method of least squares involves minimizing

$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2 \qquad (18.15)$$

where $y_i$ denotes the actual or observed value, $\hat{y}_i = a + bx_i$ is the estimated value of $y_i$ for a given value of $x_i$, and $e_i$ is the difference between the observed value and the estimated value; $e_i$ is technically known as the error or residue. The summation extends over the n pairs of observations $(x_i, y_i)$. The line of regression of y on x and the errors of estimation are shown in the following figure.

FIGURE 18.7 SHOWING REGRESSION LINE OF y ON x AND ERRORS OF ESTIMATION

Minimisation of (18.15) yields the following equations, known as Normal Equations:

$$\sum y_i = na + b\sum x_i \qquad (18.16)$$

$$\sum x_i y_i = a\sum x_i + b\sum x_i^2 \qquad (18.17)$$

Solving these two equations for b and a, we have the least squares estimates of b and a as

$$b = \frac{\mathrm{Cov}(x, y)}{S_x^2} = \frac{r\,S_x\,S_y}{S_x^2}$$

$$= \frac{r\,S_y}{S_x} \qquad (18.18)$$

After estimating b, the estimate of a is given by

$$a = \bar{y} - b\bar{x} \qquad (18.19)$$

Substituting the estimates of b and a in (18.14), we get

$$\frac{y - \bar{y}}{S_y} = r\,\frac{x - \bar{x}}{S_x} \qquad (18.20)$$

There may be cases when the variable x depends on y, and we may take the regression line of x on y as

$$x = \hat{a} + \hat{b}y$$

Unlike the minimization of vertical distances in the scatter diagram, as shown in Figure 18.7, for obtaining the estimates of a and b, in this case we minimize the horizontal distances and get the following normal equations in $\hat{a}$ and $\hat{b}$, the two regression parameters:

$$\sum x_i = n\hat{a} + \hat{b}\sum y_i \qquad (18.21)$$

$$\sum x_i y_i = \hat{a}\sum y_i + \hat{b}\sum y_i^2 \qquad (18.22)$$

On solving these equations, we get

$$\hat{b} = b_{xy} = \frac{\mathrm{cov}(x, y)}{S_y^2} = \frac{r\,S_x}{S_y} \qquad (18.23)$$

and

$$\hat{a} = \bar{x} - \hat{b}\bar{y} \qquad (18.24)$$

A single formula for estimating b is given by

$$b = b_{yx} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad (18.25)$$

Similarly,

$$\hat{b} = b_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum y_i^2 - \left(\sum y_i\right)^2} \qquad (18.26)$$

The standardized form of the regression equation of x on y, as in (18.20), is given by

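$$\frac{x - \bar{x}}{S_x} = r\,\frac{y - \bar{y}}{S_y} \qquad (18.27)$$

Example 16.15: Find the two regression equations from the following data:

x : 2 4 5 5 8 10
y : 6 7 9 10 12 12

Hence estimate y when x is 13 and also estimate x when y is 15.

Solution:

Table 18.11 Computation of Regression Equations

| $x_i$ | $y_i$ | $x_i y_i$ | $x_i^2$ | $y_i^2$ |
|---|---|---|---|---|
| 2 | 6 | 12 | 4 | 36 |
| 4 | 7 | 28 | 16 | 49 |
| 5 | 9 | 45 | 25 | 81 |
| 5 | 10 | 50 | 25 | 100 |
| 8 | 12 | 96 | 64 | 144 |
| 10 | 12 | 120 | 100 | 144 |
| 34 | 56 | 351 | 234 | 554 |

On the basis of the above table, we have

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34}{6} = 5.6667, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{56}{6} = 9.3333$$

$$\mathrm{cov}(x, y) = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} = \frac{351}{6} - 5.6667 \times 9.3333 = 58.50 - 52.8890 = 5.6110$$

$$S_x^2 = \frac{\sum x_i^2}{n} - \bar{x}^2$$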

$$= \frac{234}{6} - (5.6667)^2 = 39 - 32.1115 = 6.8885$$

$$S_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2 = \frac{554}{6} - (9.3333)^2 = 92.3333 - 87.1105 = 5.2228$$

The regression line of y on x is given by $y = a + bx$, where

$$b = \frac{\mathrm{cov}(x, y)}{S_x^2} = \frac{5.6110}{6.8885} = 0.8145$$

and

$$a = \bar{y} - b\bar{x} = 9.3333 - 0.8145 \times 5.6667 = 4.7178$$

Thus the estimated regression equation of y on x is

$$y = 4.7178 + 0.8145x$$

When x = 13, the estimated value of y is given by

$$\hat{y} = 4.7178 + 0.8145 \times 13 = 15.3063$$

The regression line of x on y is given by $x = \hat{a} + \hat{b}y$, where

$$\hat{b} = \frac{\mathrm{cov}(x, y)}{S_y^2} = \frac{5.6110}{5.2228}$$

$$= 1.0743$$

and

$$\hat{a} = \bar{x} - \hat{b}\bar{y} = 5.6667 - 1.0743 \times 9.3333 = -4.3601$$

Thus the estimated regression line of x on y is

$$x = -4.3601 + 1.0743y$$

When y = 15, the estimated value of x is given by

$$\hat{x} = -4.3601 + 1.0743 \times 15 = 11.75$$

Example 18.16: Marks of 8 students in Mathematics and Statistics are given as:

Mathematics : 80 75 76 69 70 85 72 68
Statistics : 85 65 72 68 67 88 80 70

Find the regression lines. When the marks of a student in Mathematics are 90, what are his most likely marks in Statistics?

Solution: We denote the marks in Mathematics and Statistics by x and y respectively. We are to find the regression equation of y on x and also of x on y. Lastly, we are to estimate y when x = 90. For computational advantage, we shift the origins of both x and y.

Table 18.12 Computation of regression lines

| Maths mark ($x_i$) | Stats mark ($y_i$) | $u_i = x_i - 74$ | $v_i = y_i - 76$ | $u_i v_i$ | $u_i^2$ | $v_i^2$ |
|---|---|---|---|---|---|---|
| 80 | 85 | 6 | 9 | 54 | 36 | 81 |
| 75 | 65 | 1 | -11 | -11 | 1 | 121 |
| 76 | 72 | 2 | -4 | -8 | 4 | 16 |
| 69 | 68 | -5 | -8 | 40 | 25 | 64 |
| 70 | 67 | -4 | -9 | 36 | 16 | 81 |
| 85 | 88 | 11 | 12 | 132 | 121 | 144 |
| 72 | 80 | -2 | 4 | -8 | 4 | 16 |
| 68 | 70 | -6 | -6 | 36 | 36 | 36 |
| 595 | 595 | 3 | -13 | 271 | 243 | 559 |

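The regression coefficients b (or $b_{yx}$) and $\hat{b}$ (or $b_{xy}$) remain unchanged due to a shift of origin. Applying (18.25) and (18.26), we get

$$b = b_{yx} = b_{vu} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum u_i^2 - \left(\sum u_i\right)^2} = \frac{8 \times 271 - 3 \times (-13)}{8 \times 243 - 3^2} = \frac{2168 + 39}{1944 - 9} = 1.1406$$

and

$$\hat{b} = b_{xy} = b_{uv} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum v_i^2 - \left(\sum v_i\right)^2} = \frac{8 \times 271 - 3 \times (-13)}{8 \times 559 - (-13)^2} = \frac{2168 + 39}{4472 - 169} = 0.5129$$

Also

$$a = \bar{y} - b\bar{x} = \frac{595}{8} - 1.1406 \times \frac{595}{8} = 74.375 - 1.1406 \times 74.375 = -10.4571$$

and

$$\hat{a} = \bar{x} - \hat{b}\bar{y} = 74.375 - 0.5129 \times 74.375 = 36.2281$$

The regression line of y on x is

$$y = -10.4571 + 1.1406x$$

and the regression line of x on y is

$$x = 36.2281 + 0.5129y$$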
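As a cross-check on the arithmetic above, formulas (18.25), (18.26), (18.19) and (18.24) can be applied directly to the raw marks of Example 18.16. The following Python sketch is illustrative only; the function name `regression_lines` is hypothetical and not from the text. Because it keeps full precision instead of rounding b to four decimals before computing a, its intercepts may differ from the hand computation in the third decimal place.

```python
def regression_lines(x, y):
    """Return (a, b_yx, a_hat, b_xy) for the lines y = a + b_yx*x and x = a_hat + b_xy*y."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)

    numerator = n * sum_xy - sum_x * sum_y          # common numerator of (18.25) and (18.26)
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)    # (18.25)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)    # (18.26)
    a = sum_y / n - b_yx * sum_x / n                # (18.19): a = y_bar - b * x_bar
    a_hat = sum_x / n - b_xy * sum_y / n            # (18.24): a_hat = x_bar - b_hat * y_bar
    return a, b_yx, a_hat, b_xy


maths = [80, 75, 76, 69, 70, 85, 72, 68]   # x, from Example 18.16
stats = [85, 65, 72, 68, 67, 88, 80, 70]   # y, from Example 18.16

a, b_yx, a_hat, b_xy = regression_lines(maths, stats)
print(round(b_yx, 4), round(b_xy, 4))
print(round(a, 4), round(a_hat, 4))
print(round(a + b_yx * 90))   # estimated Statistics marks for 90 in Mathematics
```

Working from the raw marks rather than the shifted values u and v gives the same slopes, which illustrates the statement that the regression coefficients are unaffected by a shift of origin.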

For x = 90, the most likely value of y is

$$\hat{y} = -10.4571 + 1.1406 \times 90 = 92.1969 \approx 92$$

Example 18.17: The following data relate to the mean and SD of the prices of two shares in a Stock Exchange:

| Share | Mean (in ₹) | SD (in ₹) |
|---|---|---|
| Company A | 44 | 5.60 |
| Company B | 58 | 6.30 |

Coefficient of correlation between the share prices = 0.48

Find the most likely price of share A corresponding to a price of ₹ 60 of share B, and also the most likely price of share B for a price of ₹ 50 of share A.

Solution: Denoting the share prices of Company A and Company B by x and y respectively, we are given that

$\bar{x}$ = ₹ 44, $\bar{y}$ = ₹ 58, $S_x$ = ₹ 5.60, $S_y$ = ₹ 6.30 and r = 0.48.

The regression line of y on x is given by $y = a + bx$, where

$$b = r\,\frac{S_y}{S_x} = 0.48 \times \frac{6.30}{5.60} = 0.54$$

$$a = \bar{y} - b\bar{x} = 58 - 0.54 \times 44 = 34.24$$

Thus the regression line of y on x, i.e. the regression line of the price of share B on that of share A, is given by

$$y = 34.24 + 0.54x \quad \text{(in ₹)}$$

When x = ₹ 50,

$$\hat{y} = 34.24 + 0.54 \times 50$$

$$= 61.24$$

The estimated price of share B for a price of ₹ 50 of share A is ₹ 61.24.

Again, the regression line of x on y is given by $x = \hat{a} + \hat{b}y$, where

$$\hat{b} = r\,\frac{S_x}{S_y} = 0.48 \times \frac{5.60}{6.30} = 0.4267$$

$$\hat{a} = \bar{x} - \hat{b}\bar{y} = 44 - 0.4267 \times 58 = 19.25$$

Hence the regression line of x on y, i.e. the regression line of the price of share A on that of share B, is given by

$$x = 19.25 + 0.4267y \quad \text{(in ₹)}$$

When y = ₹ 60,

$$\hat{x} = 19.25 + 0.4267 \times 60 = 44.85$$

i.e. the most likely price of share A is ₹ 44.85.

Example 18.18: The following data relate the expenditure on advertisement in thousands of rupees and the corresponding sales in lakhs of rupees.

Expenditure on Ad : 8 10 10 12 15
Sales : 18 20 22 25 28

Find an appropriate regression equation.

Solution: Since sales (y) depend on advertisement (x), the appropriate regression equation is that of y on x, i.e. of sales on advertisement. We have, on the basis of the given data,

n = 5
$\sum x = 8 + 10 + 10 + 12 + 15 = 55$
$\sum y = 18 + 20 + 22 + 25 + 28 = 113$
$\sum xy = 8 \times 18 + 10 \times 20 + 10 \times 22 + 12 \times 25 + 15 \times 28 = 1284$
$\sum x^2 = 8^2 + 10^2 + 10^2 + 12^2 + 15^2 = 633$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$= \frac{5 \times 1284 - 55 \times 113}{5 \times 633 - 55^2} = \frac{205}{140} = 1.4643$$

$$a = \bar{y} - b\bar{x} = \frac{113}{5} - 1.4643 \times \frac{55}{5} = 22.60 - 16.1073 = 6.4927$$

Thus, the regression line of y on x, i.e. the regression line of sales on advertisement, is given by

$$y = 6.4927 + 1.4643x$$

18.6 PROPERTIES OF REGRESSION LINES

We consider the following important properties of regression lines:

(i) The regression coefficients remain unchanged due to a shift of origin but change due to a shift of scale.

This property states that if the original pair of variables is (x, y) and if they are changed to the pair (u, v) where

$$u = \frac{x - a}{p} \quad \text{and} \quad v = \frac{y - c}{q}$$

then

$$b_{yx} = \frac{q}{p} \times b_{vu} \qquad (18.28)$$

and

$$b_{xy} = \frac{p}{q} \times b_{uv} \qquad (18.29)$$

(ii) The two lines of regression intersect at the point $(\bar{x}, \bar{y})$, where x and y are the variables under consideration.

According to this property, the point of intersection of the regression line of y on x and the regression line of x on y is $(\bar{x}, \bar{y})$, i.e. the solution of the simultaneous equations in x and y.

(iii) The coefficient of correlation between two variables x and y is the simple geometric mean of the two regression coefficients. The sign of the correlation coefficient is the common sign of the two regression coefficients.

This property says that if the two regression coefficients are denoted by $b_{yx}$ (= b) and $b_{xy}$ (= $\hat{b}$), then the coefficient of correlation is given by

$$r = \pm\sqrt{b_{yx} \times b_{xy}} \qquad (18.30)$$

If both the regression coefficients are negative, r is negative, and if both are positive, r assumes a positive value.

Example 18.19: If the relationship between two variables x and u is u + 3x = 10 and between two other variables y and v is 2y + 5v = 25, and the regression coefficient of y on x is known to be 0.80, what would be the regression coefficient of v on u?

Solution:

$$u + 3x = 10 \;\Rightarrow\; u = \frac{x - 10/3}{-1/3}$$

and

$$2y + 5v = 25 \;\Rightarrow\; v = \frac{y - 25/2}{-5/2}$$

so that $p = -1/3$ and $q = -5/2$. From (18.28), we have

$$b_{yx} = \frac{q}{p} \times b_{vu}$$

or,

$$0.80 = \frac{-5/2}{-1/3} \times b_{vu} = \frac{15}{2} \times b_{vu}$$

$$\Rightarrow\; b_{vu} = \frac{2}{15} \times 0.80 = \frac{8}{75}$$

Example 18.20: For the variables x and y, the regression equations are given as 7x - 3y - 18 = 0 and 4x - y - 11 = 0.

(i) Find the arithmetic means of x and y.
(ii) Identify the regression equation of y on x.

(iii) Compute the correlation coefficient between x and y.
(iv) Given that the variance of x is 9, find the SD of y.

Solution:

(i) Since the two lines of regression intersect at the point $(\bar{x}, \bar{y})$, replacing x and y by $\bar{x}$ and $\bar{y}$ respectively in the given regression equations, we get

$$7\bar{x} - 3\bar{y} - 18 = 0 \quad \text{and} \quad 4\bar{x} - \bar{y} - 11 = 0$$

Solving these two equations, we get $\bar{x} = 3$ and $\bar{y} = 1$. Thus the arithmetic means of x and y are 3 and 1 respectively.

(ii) Let us assume that 7x - 3y - 18 = 0 represents the regression line of y on x and 4x - y - 11 = 0 represents the regression line of x on y.

Now

$$7x - 3y - 18 = 0 \;\Rightarrow\; y = -6 + \frac{7}{3}x \;\Rightarrow\; b_{yx} = \frac{7}{3}$$

Again

$$4x - y - 11 = 0 \;\Rightarrow\; x = \frac{11}{4} + \frac{1}{4}y \;\Rightarrow\; b_{xy} = \frac{1}{4}$$

Thus

$$r^2 = b_{yx} \times b_{xy} = \frac{7}{3} \times \frac{1}{4} = \frac{7}{12} < 1$$

Since $|r| \le 1 \Rightarrow r^2 \le 1$, our assumptions are correct. Thus, 7x - 3y - 18 = 0 truly represents the regression line of y on x.

(iii) Since $r^2 = \frac{7}{12}$

$$\therefore r = \sqrt{\frac{7}{12}} = 0.7638$$

(We take the sign of r as positive since both the regression coefficients are positive.)

(iv)

$$b_{yx} = r \times \frac{S_y}{S_x} \;\Rightarrow\; \frac{7}{3} = 0.7638 \times \frac{S_y}{3} \quad (\because S_x^2 = 9 \text{ as given})$$

$$\Rightarrow\; S_y = \frac{7}{0.7638} = 9.1647$$

18.7 PROBABLE ERROR

The correlation coefficient is calculated from a sample of n pairs of values drawn from a large population. From the knowledge of the sample correlation coefficient, it is possible to determine the limits within which the correlation coefficient of the population will lie. Probable Error is a method of obtaining these limits. It is defined as:

$$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$$

where r = correlation coefficient from N pairs of sample observations.

$$\text{P.E.} = \frac{2}{3}\,\text{S.E.}$$

where S.E. = Standard Error of the correlation coefficient,

$$\text{S.E.} = \frac{1 - r^2}{\sqrt{N}}$$

The limits of the correlation coefficient are given by

$$\rho = r \pm \text{P.E.}$$

where $\rho$ = correlation coefficient of the population.

The following rules are used when judging significance with the probable error:

(i) If r < P.E., there is no evidence of correlation.
(ii) If the value of r is more than 6 times the probable error, the presence of correlation is regarded as certain, i.e. r is significant.
(iii) Since r lies between -1 and +1 ($-1 \le r \le 1$), the probable error is never negative.
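The probable error formula and the two decision rules above translate directly into a short calculation. The sketch below is illustrative only: the function names are hypothetical, the inputs r = 0.8 and N = 25 are sample values chosen for the demonstration, and comparing the magnitude of r (rather than r itself) with the probable error is an assumption made here so that negative correlations are handled sensibly.

```python
import math


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def population_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population correlation coefficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe


def interpret(r, n):
    """Apply rules (i) and (ii); |r| is used so that negative r is covered (an assumption)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "correlation regarded as significant"
    return "inconclusive"


pe = probable_error(0.8, 25)
lower, upper = population_limits(0.8, 25)
print(round(pe, 4))
print(round(lower, 4), round(upper, 4))
print(interpret(0.8, 25))
```

Note that the text gives no rule for the case where r lies between P.E. and 6 × P.E.; the label "inconclusive" for that band is a placeholder, not a result from the source.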

Note: The formula for P.E. is valid only if (1) the sample chosen to find r is a simple random sample, and (2) the population is normal.

Example 18.21: Compute the Probable Error assuming a correlation coefficient of 0.8 from a sample of 25 pairs of items.

Solution: r = 0.8, n = 25

$$\text{P.E.} = 0.6745 \times \frac{1 - (0.8)^2}{\sqrt{25}} = 0.6745 \times 0.072 = 0.0486$$

Example 18.22: If r = 0.7 and n = 64, find the probable error of the coefficient of correlation and determine the limits for the population correlation coefficient.

Solution: r = 0.7; n = 64

$$\text{Probable Error (P.E.)} = 0.6745 \times \frac{1 - (0.7)^2}{\sqrt{64}} = (0.6745) \times (0.06375) = 0.043$$

Limits for the population correlation coefficient = (0.7 ± 0.043), i.e. (0.743, 0.657).

18.8 REVIEW OF ANALYSIS

So far we have discussed the different measures of correlation and also how to fit regression lines by applying the method of Least Squares. We take recourse to correlation analysis when we are keen to know whether two variables under study are associated or correlated and, if correlated, what the strength of the correlation is. The best measure of correlation is provided by Pearson's correlation coefficient. However, one severe limitation of this correlation coefficient, as we have already discussed, is that it is applicable only in the case of a linear relationship between the two variables.

If two variables x and y are independent or uncorrelated, then the correlation coefficient between x and y is zero. However, the converse of this statement is not necessarily true, i.e. if Pearson's correlation coefficient between two variables comes out to be zero, we cannot conclude that the two variables are independent. All that we can conclude is that no linear relationship exists between the two variables. This, however, does not rule out the existence of some non-linear relationship between the two variables. For example, if we consider the following pairs of values on two variables x and y:

(-2, 4), (-1, 1), (0, 0), (1, 1) and (2, 4),

then

$$\mathrm{cov}(x, y) = \frac{(-2)(4) + (-1)(1) + (0)(0) + (1)(1) + (2)(4)}{5} - \bar{x}\,\bar{y} = 0, \quad \text{as } \bar{x} = 0$$

Thus $r_{xy} = 0$.

This does not mean that x and y are independent. In fact the relationship between x and y is $y = x^2$. Thus it is always wiser to draw a scatter diagram before reaching a conclusion about the existence of correlation between a pair of variables.

There are some cases when we may find a correlation between two variables although the two variables are not causally related. This is due to the existence of a third variable which is related to both the variables under consideration. Such a correlation is known as spurious correlation or non-sense correlation. As an example, there could be a positive correlation between production of rice and that of iron in India for the last twenty years due to the effect of a third variable, time, on both these variables. It is necessary to eliminate the influence of the third variable before computing correlation between the two original variables.

The correlation coefficient, measuring a linear relationship between the two variables, indicates the amount of variation of one variable accounted for by the other variable. A better measure for this purpose is provided by the square of the correlation coefficient, known as the coefficient of determination. This can be interpreted as the ratio of the explained variance to the total variance, i.e.

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Thus a value of 0.6 for r indicates that $(0.6)^2 \times 100\%$ or 36 per cent of the variation has been accounted for by the factor under consideration and the remaining 64 per cent variation is due to other factors. The coefficient of non-determination is given by $(1 - r^2)$ and can be interpreted as the ratio of unexplained variance to the total variance.

Coefficient of non-determination = $(1 - r^2)$

Regression analysis, as we have already seen, is concerned with establishing a functional relationship between two variables and using this relationship for making future projections. This can be applied, unlike correlation, for any type of relationship, linear as well as curvilinear. The two lines of regression coincide, i.e. become identical, when r = -1 or 1, or in other words, when there is a perfect negative or positive correlation between the two variables under discussion. If r = 0, the regression lines are perpendicular to each other.

SUMMARY

• If the change in one variable is reciprocated by a corresponding change in the other variable, either directly or inversely, then the two variables are known to be associated or correlated. There are two types of correlation: (i) Positive correlation (ii) Negative correlation.

• We consider the following measures of correlation:

(a) Scatter diagram: This is a simple diagrammatic method to establish correlation between a pair of variables.

(b) Karl Pearson's Product moment correlation coefficient:

$$r = r_{xy} = \frac{\mathrm{Cov}(x, y)}{S_x \times S_y}$$

A single formula for computing the correlation coefficient is given by

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

(i) The Coefficient of Correlation is a unit-free measure.
(ii) The coefficient of correlation remains invariant under a change of origin. Under a change of scale its magnitude is unchanged, and its sign changes only if the two scale factors have opposite signs.
(iii) The coefficient of correlation always lies between -1 and 1, including both the limiting values, i.e. $-1 \le r \le +1$.

(c) Spearman's rank correlation coefficient: Spearman's rank correlation coefficient is given by

$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where $r_R$ denotes the rank correlation coefficient, which lies between -1 and 1 inclusive of these two values, $d_i = x_i - y_i$ represents the difference in ranks for the i-th individual, and n denotes the number of individuals.

In case u individuals receive the same rank, we describe it as a tied rank of length u. In the case of tied ranks,

$$r_R = 1 - \frac{6\left[\sum d_i^2 + \sum_j \frac{t_j^3 - t_j}{12}\right]}{n(n^2-1)}$$

In this formula, $t_j$ represents the j-th tie length and the summation extends over the lengths of all the ties for both the series.

(d) Coefficient of concurrent deviations: The coefficient of concurrent deviations is given by

$$r_C = \pm\sqrt{\pm\frac{(2c-m)}{m}}$$

If $(2c - m) > 0$, we take the positive sign both inside and outside the radical sign, and if $(2c - m) < 0$, we take the negative sign both inside and outside the radical sign.
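Measures (c) and (d) are simple enough to compute mechanically. The Python sketch below is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the sample inputs are taken from the worked examples of this chapter (a sum of squared rank differences of 44.5 with ties of lengths 2, 2 and 3 among 7 students; the price and demand series for 1990 to 1997), and a pair in which either series shows no change is counted as non-concurrent, a case the text does not address.

```python
import math


def spearman_from_d2(sum_d2, n, tie_lengths=()):
    """Rank correlation from the sum of squared rank differences, with optional tie correction."""
    correction = sum((t ** 3 - t) / 12 for t in tie_lengths)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


def concurrent_deviation_coefficient(x, y):
    """r_C = +/- sqrt(+/- (2c - m) / m), with the sign taken from (2c - m)."""
    def signs(series):
        # +1 if the value rose from the previous one, -1 if it fell, 0 if unchanged.
        return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

    sign_x, sign_y = signs(x), signs(y)
    m = len(sign_x)                      # one less than the number of pairs
    # Assumption: an unchanged value (sign 0) never counts as a concurrent deviation.
    c = sum(1 for sx, sy in zip(sign_x, sign_y) if sx == sy and sx != 0)
    ratio = (2 * c - m) / m
    return math.copysign(math.sqrt(abs(ratio)), ratio)


price = [25, 28, 30, 23, 35, 38, 39, 42]
demand = [35, 34, 35, 30, 29, 28, 26, 23]

print(round(spearman_from_d2(44.5, 7, tie_lengths=[2, 2, 3]), 2))
print(round(concurrent_deviation_coefficient(price, demand), 2))
```

Using `math.copysign` applies the same sign inside and outside the radical in one step, which is exactly the sign convention stated for $r_C$.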

• In regression analysis, we are concerned with the estimation of one variable for a given value of another variable (or for a given set of values of a number of variables) on the basis of an average mathematical relationship between the two variables (or a number of variables).

• In the case of a simple regression model, if y depends on x, then the regression line of y on x is given by y = a + bx. Here a and b are two constants, also known as regression parameters. Furthermore, b is known as the regression coefficient of y on x and is also denoted by $b_{yx}$.

• The regression parameters are estimated by the method of least squares, which leads to the normal equations

$$\sum y_i = na + b\sum x_i$$

$$\sum x_i y_i = a\sum x_i + b\sum x_i^2$$

Solving the normal equations,

$$b_{yx} = \frac{\mathrm{cov}(x, y)}{S_x^2} = \frac{r\,S_y}{S_x}$$

• The regression coefficients remain unchanged due to a shift of origin but change due to a shift of scale. This property states that if the original pair of variables is (x, y) and if they are changed to the pair (u, v) where

$$u = \frac{x - a}{p} \quad \text{and} \quad v = \frac{y - c}{q}$$

then

$$b_{yx} = \frac{q}{p} \times b_{vu} \quad \text{and} \quad b_{xy} = \frac{p}{q} \times b_{uv}$$

• The two lines of regression intersect at the point $(\bar{x}, \bar{y})$, where x and y are the variables under consideration. According to this property, the point of intersection of the regression line of y on x and the regression line of x on y is $(\bar{x}, \bar{y})$, i.e. the solution of the simultaneous equations in x and y.

• The coefficient of correlation between two variables x and y is the simple geometric mean of the two regression coefficients. The sign of the correlation coefficient is the common sign of the two regression coefficients.

$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$

• The correlation coefficient, measuring a linear relationship between the two variables, indicates the amount of variation of one variable accounted for by the other variable. A better measure for this purpose is provided by the square of the correlation coefficient, known as the coefficient of determination. This can be interpreted as the ratio of the explained variance to the total variance, i.e.

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

• The coefficient of non-determination is given by $(1 - r^2)$ and can be interpreted as the ratio of unexplained variance to the total variance.

• The two lines of regression coincide, i.e. become identical, when r = -1 or 1, or in other words, when there is a perfect negative or positive correlation between the two variables under discussion. If r = 0, the regression lines are perpendicular to each other.

EXERCISE

Set A

Write the correct answers. Each question carries 1 mark.

1. Bivariate Data are the data collected for
(a) Two variables (b) More than two variables (c) Two variables at the same point of time (d) Two variables at different points of time.

2. For a bivariate frequency table having (p + q) classification the total number of cells is
(a) p (b) p + q (c) q (d) pq

3. Some of the cell frequencies in a bivariate frequency table may be
(a) Negative (b) Zero (c) a or b (d) None of these

4. For a p × q bivariate frequency table, the maximum number of marginal distributions is
(a) p (b) p + q (c) 1 (d) 2

5. For a p × q classification of bivariate data, the maximum number of conditional distributions is
(a) p (b) p + q (c) pq (d) p or q

6. Correlation analysis aims at
(a) Predicting one variable for a given value of the other variable (b) Establishing relation between two variables