Correlation and Regression
Lecturer, Department of Agronomy
Sher-e-Bangla Agricultural University

Correlation

When there is a relationship between quantitative measures of two sets of phenomena, the appropriate statistical tool for discovering and measuring the relationship, and for expressing it in a precise way, is known as correlation. In an experiment, if changes in one variable are accompanied by changes in the other variable, the variables are said to be correlated, e.g. yield and tiller number, length of panicles and grains/panicle, etc.

If the movements of the variables are in the same direction, the variables are said to be directly correlated or positively correlated, e.g. test grain weight and grain size, leaf area and leaf size. If the movements of the variables are in opposite directions, the variables are said to be negatively correlated, e.g. yield and pest incidence, germination percentage and ageing of seeds. To find the direction of movement of the variables, the scatter diagram method is used.

When two variables are involved, the correlation is termed simple correlation. If more than two variables are involved, the correlation is known as multiple correlation.

To measure the degree of relationship between the correlated variables x and y, the following formula is used:

$$r_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{SP(xy)}{\sqrt{SS(x)\,SS(y)}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$
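As a rough illustration, the deviation form and the computational form of the formula above can be compared in Python. This is only a sketch: the function names and the five tiller/yield pairs are hypothetical and are not taken from these notes.

```python
from math import sqrt


def correlation_deviation_form(x, y):
    """r from the sum of products and sums of squares of deviations from the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp_xy / sqrt(ss_x * ss_y)


def correlation_computational_form(x, y):
    """r from raw sums (sum of x, sum of y, sum of x^2, sum of y^2, sum of xy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Hypothetical illustration data (not from the notes): tillers per hill and yield.
tillers = [8, 10, 12, 14, 16]
yield_t_ha = [2.1, 2.6, 2.9, 3.5, 3.8]

r_dev = correlation_deviation_form(tillers, yield_t_ha)
r_comp = correlation_computational_form(tillers, yield_t_ha)
print(round(r_dev, 4), round(r_comp, 4))
```

Both functions should return the same value up to rounding error. The computational form is the one that is convenient for hand calculation from a table of sums.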
Example: Data on the number of siliquae per plant and the seed yield of mustard in a field experiment are given below. Here the number of siliquae is the independent variable, denoted by x, and seed yield is the dependent variable, denoted by y. The correlation coefficient can be calculated manually in the following way:

No. of siliquae plant⁻¹ (x)   Seed yield (t ha⁻¹) (y)   x²       y²      xy
 30                           0.5                          900    0.25     15
 40                           0.7                         1600    0.49     28
 50                           0.9                         2500    0.81     45
 60                           1.1                         3600    1.21     66
 70                           1.3                         4900    1.69     91
 80                           1.5                         6400    2.25    120
 90                           1.7                         8100    2.89    153
100                           1.9                        10000    3.61    190
110                           2.1                        12100    4.41    231
120                           2.5                        14400    6.25    300
Σx = 750                      Σy = 14.2                  Σx² = 64500   Σy² = 23.86   Σxy = 1239

Calculation:

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} = \frac{1239 - \dfrac{750 \times 14.2}{10}}{\sqrt{\left(64500 - \dfrac{750^2}{10}\right)\left(23.86 - \dfrac{14.2^2}{10}\right)}}$$

$$= \frac{1239 - 1065}{\sqrt{(64500 - 56250)(23.86 - 20.164)}} = \frac{174}{\sqrt{8250 \times 3.696}} = \frac{174}{\sqrt{30492}} = \frac{174}{174.62} = 0.996$$

Test of significance of correlation coefficient

Suppose that r is the correlation coefficient from a sample of size n drawn from a bivariate normal population. We are to test the null hypothesis that the population correlation coefficient is zero, i.e. $H_0: \rho = 0$. The required test statistic is

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}},$$

which is distributed as t with (n − 2) d.f.

If the calculated value of t with (n − 2) d.f. is smaller than the tabulated value of t with the same d.f. at the 5% level of significance, then the calculated value of t is insignificant and the hypothesis may be accepted. On the other hand, if the calculated value of t is greater than the tabulated value of t, then the hypothesis is rejected and the calculated value of t is significant.

Regression

Regression can be defined as a method that estimates the value of one variable when that of the other variable is known, provided the variables are correlated. Regression is a simple and more useful approach to the study of simultaneous variation of two (or more) characters. In field experimentation, we may examine the relationship between levels of nitrogen and crop yield. The levels of N may be fixed arbitrarily and may not have any distribution. Under such conditions, the method of regression is more appropriate than correlation.

Of the two variables, one is known as the independent variable, denoted by x. The other is called the dependent variable, denoted by y. The underlying relation between x and y in a bivariate population can be expressed as a function. Such a functional relationship between two variables is termed regression.

(i) Regression line of y on x is y = a + bx
(ii) Regression line of x on y is x = a + by

Regression coefficient,

$$b_{yx} = \frac{SP(xy)}{SS(x)} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
Regression coefficient,

$$b_{xy} = \frac{SP(xy)}{SS(y)} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$

In the case of two sets of data,

$$r = \sqrt{b_{xy} \times b_{yx}},$$

with r taking the common sign of the two regression coefficients.

Necessity of regression

Regression is used for the following purposes:
(i) To predict the average relationship between the dependent variable and the independent variables.
(ii) To determine the contribution of each independent variable to the dependent variable.
(iii) To estimate the value of the dependent variable for a given value of the independent variables.

An agriculturist may be interested in studying the dependence of the yield of mustard on irrigation, plant spacing, fertilizers, etc. Such an analysis may enable the average yield of mustard to be estimated on the basis of information about the independent variables.

Coefficient of determination

The coefficient of determination is the ratio of explained variation to total variation:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}$$

or, the proportion of total variation explained by regression.

R-squared value

A number from 0 to 1 that reveals how closely the estimated values for the trendline correspond to the actual data. A trendline is most reliable when its R-squared value is at or near 1. Also known as the coefficient of determination.
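The regression coefficients and the coefficient of determination can be sketched in Python for the mustard example (x = siliquae per plant, y = seed yield). Two things here are assumptions rather than content from these notes: the variable names, and the intercept formula a = ȳ − b·x̄. That formula is the standard least-squares result that the fitted line passes through (x̄, ȳ).

```python
from math import sqrt

# Data from the worked mustard example: siliquae per plant (x), seed yield in t/ha (y).
x = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
y = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1, 2.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sum of products and sums of squares of deviations from the means.
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)

# Regression coefficients of y on x and of x on y.
b_yx = sp_xy / ss_x
b_xy = sp_xy / ss_y

# Intercept of the line of y on x (assumed standard result: a = y_bar - b * x_bar).
a_yx = y_bar - b_yx * x_bar

# r as the square root of the product of the two coefficients
# (positive root, because both coefficients are positive for these data).
r = sqrt(b_xy * b_yx)

# Coefficient of determination: explained variation / total variation.
y_hat = [a_yx + b_yx * xi for xi in x]
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
r_squared = explained / ss_y

print(f"y = {b_yx:.4f}x {a_yx:+.4f}")
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

For these data the sketch should give a line of roughly y = 0.0211x − 0.1618, with r ≈ 0.9965 and R² ≈ 0.9929. R² equals r² here because only one independent variable is involved.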
[Figure: seed yield (t ha⁻¹) plotted against no. of siliquae plant⁻¹, with fitted line y = 0.0211x − 0.1618, R² = 0.9929**]

Fig. Relationship between siliquae plant⁻¹ and seed yield of mustard

Utility of coefficient of determination

The coefficient of determination is useful in regression analysis for inspecting the degree of linear correlation (r) between variables, whether the variables are dependent or independent.

Range of coefficient of determination

The coefficient of determination lies between 0 and +1, symbolically $0 \le R^2 \le 1$.
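Returning to the mustard example, the test of significance of the correlation coefficient described earlier can be sketched numerically with r = 0.996 and n = 10. The function name is hypothetical, and the tabulated t values for 8 d.f. are assumptions taken from a standard two-tailed t table, not from these notes.

```python
from math import sqrt


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with (n - 2) degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.996   # correlation coefficient from the worked example
n = 10      # number of (x, y) pairs in the example
df = n - 2  # degrees of freedom

t_calc = t_statistic(r, n)

# Assumed tabulated values (standard two-tailed t table, 8 d.f.).
T_TAB_5PCT_8DF = 2.306
T_TAB_1PCT_8DF = 3.355

significant_at_5pct = abs(t_calc) > T_TAB_5PCT_8DF
significant_at_1pct = abs(t_calc) > T_TAB_1PCT_8DF

print(f"t = {t_calc:.2f} with {df} d.f.")
print("Reject H0 at 5% level:", significant_at_5pct)
print("Reject H0 at 1% level:", significant_at_1pct)
```

The calculated t is far larger than either tabulated value, so under these assumptions the null hypothesis ρ = 0 would be rejected and the correlation judged significant.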