Correlation and Regression

Lecturer, Department of Agronomy, Sher-e-Bangla Agricultural University

Correlation

When there is a relationship between quantitative measures of two sets of phenomena, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a precise way is known as correlation. In an experiment, if changes in one variable affect changes in the other variable, the variables are said to be correlated, e.g. yield and tiller number, or length of panicles and grains per panicle.

If the variables move in the same direction, they are said to be directly or positively correlated, e.g. test grain weight and grain size, or leaf area and leaf size. If the variables move in opposite directions, they are said to be negatively correlated, e.g. yield and pest incidence, or germination percentage and ageing of seeds. To find the direction of movement of the variables, the scatter diagram method is used.

When two variables are involved, the correlation is termed simple correlation. If more than two variables are involved, the correlation is known as multiple correlation.

To measure the degree of relationship between the correlated variables x and y, the following formula is used:

$$
r_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}}
= \frac{SP(xy)}{\sqrt{SS(x)\,SS(y)}}
= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}
= \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}
$$
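The computational form of the formula above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `pearson_r` and the variable names are hypothetical, and the data are the ten (x, y) pairs of the mustard example that follows, as reconstructed from its column totals (Σx = 750, Σy = 14.2).

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient from the sums Σx, Σy, Σx², Σy², Σxy."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sp_xy = sum_xy - sum_x * sum_y / n   # sum of products, SP(xy)
    ss_x = sum_x2 - sum_x ** 2 / n       # sum of squares, SS(x)
    ss_y = sum_y2 - sum_y ** 2 / n       # sum of squares, SS(y)
    return sp_xy / sqrt(ss_x * ss_y)


# Data assumed from the mustard example: siliquae per plant (x), seed yield in t/ha (y)
siliquae = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
seed_yield = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1, 2.5]

r = pearson_r(siliquae, seed_yield)
print(f"r = {r:.3f}")  # prints "r = 0.996"
```

The sums-based form mirrors the hand calculation; for data with very large values, the deviation form Σ(x − x̄)(y − ȳ) is usually the numerically safer choice.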

Example: Data on the number of siliquae per plant and the seed yield of mustard in a field experiment are given below. Here the number of siliquae is the independent variable, denoted by x, and seed yield is the dependent variable, denoted by y. The correlation coefficient can be calculated manually as follows.

No. of siliquae plant⁻¹ (x)   Seed yield (t ha⁻¹) (y)      x²       y²      xy
            30                         0.5                  900     0.25      15
            40                         0.7                 1600     0.49      28
            50                         0.9                 2500     0.81      45
            60                         1.1                 3600     1.21      66
            70                         1.3                 4900     1.69      91
            80                         1.5                 6400     2.25     120
            90                         1.7                 8100     2.89     153
           100                         1.9                10000     3.61     190
           110                         2.1                12100     4.41     231
           120                         2.5                14400     6.25     300
      Σx = 750                   Σy = 14.2         Σx² = 64500   Σy² = 23.86   Σxy = 1239

Calculation:

$$
r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}
= \frac{1239 - \frac{750 \times 14.2}{10}}{\sqrt{\left(64500 - \frac{750^2}{10}\right)\left(23.86 - \frac{14.2^2}{10}\right)}}
$$

$$
= \frac{1239 - 1065}{\sqrt{(64500 - 56250)(23.86 - 20.164)}}
= \frac{174}{\sqrt{8250 \times 3.696}}
= \frac{174}{\sqrt{30492}}
= \frac{174}{174.62}
= 0.996
$$

Test of significance of correlation coefficient

Suppose that r is the correlation coefficient from a sample of size n drawn from a bivariate normal population. We wish to test the null hypothesis that the population correlation coefficient is zero, i.e. $H_0: \rho = 0$. The required test statistic is

$$
t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}},
$$

which is distributed as t with (n − 2) d.f.

If the calculated value of |t| with (n − 2) d.f. is smaller than the tabulated value of t with the same d.f. at the 5% level of significance, the calculated value of t is insignificant and the hypothesis may be accepted. On the other hand, if the calculated value of |t| is greater than the tabulated value of t, the hypothesis is rejected and the calculated value of t is significant.

Regression

Regression can be defined as a method that estimates the value of one variable when that of the other variable is known, provided the variables are correlated. Regression is a simple and more useful approach to the study of simultaneous variation of two (or more) characters. In field experimentation, we may examine the relationship between levels of nitrogen and crop yield. The levels of N may be fixed arbitrarily and may not follow any distribution. Under such conditions, the method of regression is more appropriate than correlation.

Of the two variables, one is known as the independent variable, denoted by x. The other is called the dependent variable, denoted by y. The underlying relation between x and y in a bivariate population can be expressed as a function. Such a functional relationship between two variables is termed regression.

(i) Regression line of y on x is y = a + bx
(ii) Regression line of x on y is x = a + by

Regression coefficient of y on x:

$$
b_{yx} = \frac{SP(xy)}{SS(x)} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
$$

Regression coefficient of x on y:

$$
b_{xy} = \frac{SP(xy)}{SS(y)} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}
$$

In the case of two sets of data, the correlation coefficient carries the common sign of the two regression coefficients and has magnitude

$$
|r| = \sqrt{b_{xy} \cdot b_{yx}}
$$

Necessity of regression

Regression is used for the following purposes:

(i) To predict the average relationship between the dependent variable and the independent variables.
(ii) To determine the contribution of each independent variable to the dependent variable.
(iii) To estimate the value of the dependent variable for a given value of the independent variables.

An agriculturist may be interested in studying the dependence of the yield of mustard on irrigation, plant spacing, fertilizers, etc. Such an analysis may enable the agriculturist to estimate the average yield of mustard on the basis of information about the independent variables.

Coefficient of determination

The coefficient of determination is the ratio of explained variation to total variation:

$$
R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}
$$

or, the proportion of total variation explained by regression.

R-squared value

A number from 0 to 1 that reveals how closely the estimated values for the trendline correspond to the actual data. A trendline is most reliable when its R-squared value is at or near 1. Also known as the coefficient of determination.
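The quantities defined above (the t statistic, the two regression coefficients and R²) can be checked numerically with a short Python sketch. It is an illustration under stated assumptions rather than part of the original notes: the variable names are hypothetical, the intercept uses the standard least-squares result a = ȳ − b·x̄ (which the notes do not state), and the tabulated value 2.306 is the two-tailed 5% point of t with 8 d.f. from a standard t table.

```python
from math import sqrt

# Mustard example data: siliquae per plant (x) and seed yield in t/ha (y)
x = [30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
y = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 1.9, 2.1, 2.5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # SP(xy)
ss_x = sum((xi - mean_x) ** 2 for xi in x)                          # SS(x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)                          # SS(y)

# Correlation coefficient and its t statistic with (n - 2) d.f.
r = sp_xy / sqrt(ss_x * ss_y)
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
T_TABLE_5PCT_8DF = 2.306  # assumed from a standard t table (two-tailed, 8 d.f.)
significant = abs(t) > T_TABLE_5PCT_8DF

# Regression coefficients and the fitted line of y on x
b_yx = sp_xy / ss_x
b_xy = sp_xy / ss_y
a = mean_y - b_yx * mean_x  # standard least-squares intercept (assumption)

# Coefficient of determination: explained variation / total variation
y_hat = [a + b_yx * xi for xi in x]
r_squared = sum((yh - mean_y) ** 2 for yh in y_hat) / ss_y

print(f"t = {t:.2f}, significant at 5%: {significant}")
print(f"y = {a:.4f} + {b_yx:.4f}x, R^2 = {r_squared:.4f}")
```

For simple linear regression the R² computed from the fitted values equals r², and the square root of b_xy · b_yx reproduces r, which gives a quick consistency check on a hand calculation.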

[Figure: scatter plot of seed yield (t ha⁻¹) against number of siliquae plant⁻¹, with the fitted line y = 0.0211x − 0.1618, R² = 0.9929**]

Fig. Relationship between siliquae plant⁻¹ and seed yield of mustard

Utility of coefficient of determination

The coefficient of determination is useful in regression analysis to inspect the degree of linear correlation (r) between variables, whether the variables are dependent or independent.

Range of coefficient of determination

The coefficient of determination lies between 0 and +1; symbolically, $0 \le R^2 \le 1$.
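Why the range is 0 to 1 can be seen from a short derivation. It assumes the usual least-squares decomposition of the total sum of squares for a fitted line with an intercept, a standard result that these notes do not derive; the labels SST, SSR and SSE are introduced here only for illustration.

```latex
% Total variation = explained variation + unexplained (residual) variation
\underbrace{\sum (y_i - \bar{y})^2}_{SST}
  = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{SSR}
  + \underbrace{\sum (y_i - \hat{y}_i)^2}_{SSE}

% Hence
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},
\qquad 0 \le SSE \le SST \;\Longrightarrow\; 0 \le R^2 \le 1

% For simple linear regression, R^2 = r^2.
% In the mustard example: r^2 = 174^2 / 30492 = 30276 / 30492 \approx 0.9929
```

R² equals 1 only when every observation lies on the fitted line (SSE = 0), and equals 0 when the regression explains none of the variation (SSR = 0).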