SALES AND MARKETING Department MATHEMATICS. 2nd Semester. Bivariate statistics LESSONS


Online document: http://jff-dut-tc.weebly.com, section DUT Maths S2.
IUT de Saint-Etienne, Département TC, J.F. Ferraris, Math S2 StatVar Lessons, Rev08

TABLE OF CONTENTS

LESSONS
1 Introduction, vocabulary
  1.1 Aims
  1.2 Formatting
  1.3 Scatter plot
2 Chi-square independence testing
3 Fitting: Mayer's method and moving means
  3.1 Moving means
  3.2 Purpose of linear fitting
  3.3 Mayer's method
4 Linear fitting: least square method
  4.1 Parameters of a bivariate series
  4.2 Least square method
  4.3 Linear correlation coefficient
5 Non-linear fitting: variable change
6 Statistical prediction
  6.1 Point estimate
  6.2 Confidence interval

LESSONS

1 Introduction, vocabulary

1.1 Aims
Two characters will be studied simultaneously on each individual of a population of size n, creating two variables (lists of values) X and Y.
Aims:
* study the relationship between both characters: their correlation;
* model this correlation by a mathematical function: regression;
* use this model to perform a prediction, with an associated confidence level;
* test the hypothesis that X and Y are not related.

1.2 Formatting
For one individual (# i), an observation is written down as an ordered pair of values $(x_i ; y_i)$. There are two possible ways to display the data series, depending on the situation:

* data series given in lists
e.g.: relationship between the quantity of spread fertilizer and the harvested production

plot #    X: fertilizer (kg.ha^-1)    Y: harvest (q.ha^-1)
1         50                          46
2         80                          37
3         0                           46
4         0                           5
5         00                          43

example of a time series: annual advertising expense of a company

X: year      2006  2007  2008  2009  2010  2011  2012  2013  2014  2015  2016  2017
Y: expense   4     60    55    66    87    6     90    95    8     0     5     8

* series with frequencies: contingency table
e.g.: relationship between age and visual acuity, data collected from 00 people
X : age 0 40 50 60 Y : acuity 3/0 5 0 0 6/0 8 5 8 9/0 55 6 4 6

1.3 Scatter plot
Every statistical data series can be displayed on a graph as a cloud of points, each variable being placed on its own axis.
* series in lists: a pair $(x_i ; y_i)$ corresponds to one individual and to one point.
[Figure: scatter plot of the time series example of section 1.2 (expense against year, from 2006)]
* series with contingency: a pair $(x_i ; y_i)$ mostly corresponds to more than one individual, and is drawn as an object whose size is an increasing function of the associated frequency.
[Figure: scatter plot of the contingency example of section 1.2 (acuity against age)]

2 Chi-square independence testing

The goal of a statistical test is to decide whether or not we can afford to reject a given hypothesis made about a population, by analysing a sample. First, this hypothesis is worded and is named the "null hypothesis", $H_0$. If the conclusion/decision of the test is a rejection of $H_0$, then there is automatically a risk of being wrong, whose probability is named the "significance level" of the test, denoted $\alpha$.

The special case of Chi-square independence testing:
Here, a survey crosses two variables (e.g. in the next tutorial: gender and behaviour towards tobacco), whose level of dependence has to be estimated in a population, only by analysing the distribution of the people in a sample. In case of independence, the ideal distribution consists of a proportional contingency table: people's answers are supposed to be distributed in proportion to the existing subtotals (marginal frequencies).
The test aims to compare the observed frequencies (obs) to these theoretical proportional frequencies (th) associated with perfect independence, to extract a value, "$\chi^2$" (pronounced Chi-square), quantifying a "distance between observations and perfect independence" found in the studied sample, and finally to decide whether this gap is large or not.

Methodology:
n observations are conducted: n individuals are evaluated on two variables X and Y. The variable X shows r different values, and Y shows k different values.
The null hypothesis $H_0$ is by convention: the variables are independent.
The test compares reality to what perfect independence would have shown. We can reject $H_0$ if the set of observations is "too far" from the theoretical distribution.

1. Calculation of the observed $\chi^2$
* table of observations on n individuals

          Y_1        Y_2        ...   Y_k        total
X_1       obs_11     obs_12     ...   obs_1k     total X_1
X_2       obs_21     obs_22     ...   obs_2k     total X_2
...
X_r       obs_r1     obs_r2     ...   obs_rk     total X_r
total     total Y_1  total Y_2  ...   total Y_k  n

* table of the theoretical distribution (independence)
This second table is built from the first, keeping every subtotal, then calculating each frequency in proportion to these subtotals and to the general total: th = (row total × column total) / n.
* calculation of $\chi^2_{calc}$ (global difference between obs and th):
$\chi^2_{calc} = \sum_{table} \frac{(obs - th)^2}{th}$

2. Rejection area
The $\chi^2$ variable expresses the infinity of possible values that could be obtained from any possible sample compared to independence. This variable is distributed in probability by a law of the same name, set by its number of degrees of freedom (dof):
$dof = (r - 1)(k - 1)$
To each possible $\chi^2$ value (in $[0 ; +\infty[$) corresponds a probability $\alpha$ that a sample extracted from an ideal independent population would exceed it. In an exercise, when $\alpha$ is given, we then read the value of $\chi^2_{lim}$.

3. Comparison and decision
If $\chi^2_{calc} > \chi^2_{lim}$, then we are allowed to reject $H_0$ (the independence), with a risk $\alpha$ of being wrong.
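The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 × 2 table is hypothetical (it does not come from the lesson), the function name `chi_square_independence` is invented for the example, and the limit value 3.841 is the usual table value for dof = 1 and α = 0.05.

```python
def chi_square_independence(observed):
    """Return (chi2, dof, theoretical) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Theoretical frequency under independence: row total * column total / n
    theoretical = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Global difference between observed and theoretical frequencies
    chi2 = sum(
        (obs - th) ** 2 / th
        for obs_row, th_row in zip(observed, theoretical)
        for obs, th in zip(obs_row, th_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, theoretical


# Hypothetical 2 x 2 survey (assumption, not data from the lesson)
observed = [[30, 20], [20, 30]]
chi2_calc, dof, theoretical = chi_square_independence(observed)
chi2_lim = 3.841  # table value for dof = 1 and alpha = 0.05
reject_h0 = chi2_calc > chi2_lim
print(chi2_calc, dof, reject_h0)
```

With this table every theoretical frequency equals 25, so the decision only depends on how far the observed 30s and 20s are from 25.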

3 Fitting: Mayer's method and moving means

3.1 Moving means
Moving means are mostly used with time series, the variable X representing time and the variable Y a value that evolves over time. When the Y values show large oscillations through time, a global upward or downward trend is hard to detect. Moving means help to reveal it by smoothing these oscillations.
Methodology:
* Group successive Y values into sets, always of the same size (e.g. two by two, three by three, four by four, etc.);
* Each next set consists of the previous one, from which the first value of Y is removed and to which the next one is added (sliding sets);
* The average value of Y is calculated in each set (providing a list of moving means), and likewise the average value of X (providing an average location in time for each set);
* The corresponding points may be plotted.
e.g.:
X (quarters): 1 2 3 4 5 6 7 8
Y (thousands of tourists): 58 3 36 60 9 4 33
Let's create the list of the 4-by-4 moving means:
X: 2.5 3.5 4.5 5.5 6.5
Y: 3.5 3.75 3.5 3.5 3.75
This new list of values (together with its graph) suggests a very slight downward trend.
note:
* the first moving mean is the mean of the values # 1, 2, 3 and 4: (1+2+3+4)/4 = 2.5 for X, and the mean of the first four Y values for Y;
* the second moving mean is the mean of the values # 2, 3, 4 and 5: (2+3+4+5)/4 = 3.5 for X, and the mean of the corresponding four Y values for Y;
* and so on.
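As an illustration of the sliding sets described above, here is a minimal Python sketch. The Y values are hypothetical (chosen only so that the arithmetic is easy to follow) and the helper name `moving_means` is an assumption of this example.

```python
def moving_means(values, window):
    """Means of every run of `window` successive values (sliding sets)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly series (assumption, not the lesson's data)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [50, 20, 10, 40, 54, 16, 14, 36]

x_means = moving_means(x, 4)  # average location in time of each set
y_means = moving_means(y, 4)  # smoothed values
print(x_means)
print(y_means)
```

Eight values grouped four by four give five moving means; the large seasonal swings of `y` disappear from `y_means`.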

3.2 Purpose of linear fitting
A point cloud may show a link between both variables if its points do not appear to be scattered at random. In some cases, the shape of this cloud may be elongated and quite thin, with a fairly straight "directional axis" showing a tendency. Can we find an axis, a straight line which "follows" the whole cloud as well as possible?
Let's say this line has already been drawn, (D): $y = ax + b$. To a given value $x_i$ are associated the value $y_i$ (ordinate of the point $M_i$ in the cloud) and the value $\hat{y}_i = a x_i + b$ (on the line).
definition: we name residue the number $e_i = y_i - \hat{y}_i$
The residue of a point $M_i$ is then positive if this point is above the line and negative in the opposite situation.
Hence, we aim to find the line that best minimises the residues, the line that passes through the cloud as close as possible to the points. This way, we perform a linear fitting, or linear regression. Once done, this object is called the fitting line, trend line or regression line of the series.

3.3 Mayer's method
Some residues are positive, some negative. Mayer's assumption is that the "best" line is the one that leads to a zero sum of residues (the negative residues offset the positive ones).
definition: we name Mayer's principle the goal $\sum e_i = 0$
mathematical analysis: $e_i = y_i - (a x_i + b) = y_i - a x_i - b$
This sum is zero iff $\sum y_i - a \sum x_i - n b = 0$ iff $\bar{y} - a \bar{x} - b = 0$
That is to say: to obtain a cancellation of the sum of residues, it is necessary and sufficient that the straight line contain the midpoint of the cloud, $G(\bar{x}, \bar{y})$. This property isn't sufficient in itself to make Mayer's line unique, since the only obligation is to pass through one given point. There is an infinite number of straight lines giving a zero sum of residues!
Mayer's method:
* Divide the cloud into two subclouds. Both subclouds must contain the same number of points: $n/2$ if n is even, or $(n+1)/2$ on one side and $(n-1)/2$ on the other side if n is odd. The abscissas $x_i$ in the first subcloud must all be less than the abscissas $x_i$ in the second one;
* Calculate the coordinates of $G_1$ and $G_2$, mean points (midpoints) of both subclouds;
* Determine (if asked) the expression of the line $(G_1 G_2)$, which is the Mayer's line that will be chosen; draw it.
note: It has been proved that the mean point of the whole cloud, G, belongs to the line $(G_1 G_2)$ in any case, and hence that the latter meets Mayer's principle.
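Mayer's method can be sketched as follows. The six points are hypothetical, the function names are invented for this example, and the sketch assumes that all abscissas are distinct and that, when n is odd, the extra point goes to the first subcloud.

```python
def mean_point(points):
    """Mean point (midpoint) of a list of (x, y) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mayer_line(points):
    """Return (a, b) of the line through the mean points of the two subclouds."""
    pts = sorted(points)        # order the cloud by abscissa
    half = (len(pts) + 1) // 2  # first subcloud takes the extra point if n is odd
    g1 = mean_point(pts[:half])
    g2 = mean_point(pts[half:])
    a = (g2[1] - g1[1]) / (g2[0] - g1[0])
    b = g1[1] - a * g1[0]
    return a, b


# Hypothetical cloud (assumption, not data from the lesson)
cloud = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (6, 8)]
a, b = mayer_line(cloud)
residue_sum = sum(y - (a * x + b) for x, y in cloud)
print(a, b, residue_sum)
```

Here the subclouds have mean points (2, 10/3) and (5, 6), and the sum of residues comes out as zero up to rounding, as Mayer's principle requires.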

4 Linear fitting: least square method

4.1 Parameters of a bivariate series

4.1.1 The means of X and Y are:
$\bar{x} = \frac{1}{n} \sum_i x_i$ and $\bar{y} = \frac{1}{n} \sum_i y_i$ without contingency (data series in lists, see the first two examples of section 1.2);
$\bar{x} = \frac{1}{n} \sum_{i=1}^{r} n_{i\cdot}\, x_i$ and $\bar{y} = \frac{1}{n} \sum_{j=1}^{k} n_{\cdot j}\, y_j$ with contingency (frequencies gathered into a crossed table, see the third example of section 1.2), where $n_{i\cdot}$ and $n_{\cdot j}$ are the marginal frequencies.
The special point $G(\bar{x}, \bar{y})$ is named the mean point or midpoint of the cloud.

4.1.2 The variance of X and that of Y are easily accessible (manual calculations) by Koenig's theorem:
$V(X) = \frac{1}{n} \sum_i x_i^2 - \bar{x}^2$ and $V(Y) = \frac{1}{n} \sum_i y_i^2 - \bar{y}^2$ without contingency;
$V(X) = \frac{1}{n} \sum_{i=1}^{r} n_{i\cdot}\, x_i^2 - \bar{x}^2$ and $V(Y) = \frac{1}{n} \sum_{j=1}^{k} n_{\cdot j}\, y_j^2 - \bar{y}^2$ with contingency.
The standard deviations are still the square roots of the variances.

4.1.3 We name covariance of the pair (X, Y) the number:
$Cov(X, Y) = \frac{1}{n} \sum_i (x_i - \bar{x})(y_i - \bar{y})$
This is a "common variance" between both variables, which is necessary to analyse their correlation. Koenig's theorem gives an easier way to calculate the covariance:
$Cov(X, Y) = \frac{1}{n} \sum_i x_i y_i - \bar{x}\,\bar{y}$ (without contingency) and $Cov(X, Y) = \frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{k} n_{ij}\, x_i y_j - \bar{x}\,\bar{y}$ (with contingency).

4.1.4 Using the calculator:
The means and standard deviations are given directly, in Stat mode. Unfortunately, the calculator gives neither the variances nor the covariance.
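Since these parameters have to be worked out by hand, a short Python sketch of the Koenig formulas (case without contingency) may help to check manual calculations. The five pairs are hypothetical and the function name `bivariate_parameters` is an assumption of this example.

```python
from math import sqrt


def bivariate_parameters(xs, ys):
    """Means, variances, standard deviations and covariance of paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Koenig's theorem: mean of the squares minus square of the mean
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    # Same idea for the covariance: mean of the products minus product of the means
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "var_x": var_x, "var_y": var_y,
        "sigma_x": sqrt(var_x), "sigma_y": sqrt(var_y),
        "cov_xy": cov_xy,
    }


# Hypothetical series (assumption, not data from the lesson)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
params = bivariate_parameters(xs, ys)
print(params)
```

The mean point of this cloud is G(3, 4); the variances and the covariance divide by n, as in the formulas of the lesson.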

4.2 Least square method
The idea of this method is to square each residue, then to add these squares, and finally to say that the "best" line is the one that minimises this sum (the one giving the smallest possible sum among the infinite number of possible lines).
definition: We name least square principle the one that consists in finding a line such that $\sum e_i^2$ is minimum within the cloud (Gauss).
mathematical analysis: we set $P(a, b) = \sum_i (y_i - a x_i - b)^2$: bivariate polynomial.
There are two different ways to expand it:
(1) $P(a, b) = \sum_i \big((y_i - a x_i) - b\big)^2 = n b^2 - 2b \sum_i (y_i - a x_i) + \sum_i (y_i - a x_i)^2$: 2nd degree trinomial with respect to b;
(2) $P(a, b) = \sum_i \big((y_i - b) - a x_i\big)^2 = a^2 \sum_i x_i^2 - 2a \Big(\sum_i x_i y_i - b \sum_i x_i\Big) + \sum_i (y_i - b)^2$: 2nd degree trinomial with respect to a.
In this context, we can continue like this:
* consider a as a constant and b as a variable. P(a, b) in form (1) is minimum when its derivative with respect to b is zero (its leading coefficient, n, is positive), which leads to $b = \bar{y} - a \bar{x}$;
* consider this latest value of b, and a as a variable. P(a, b) in form (2) is minimum when its derivative with respect to a is zero, which leads to
$a = \dfrac{\frac{1}{n} \sum_i x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n} \sum_i x_i^2 - \bar{x}^2} = \dfrac{Cov(X, Y)}{V(X)}$
Calculus enthusiasts may try to recover these results!
note: such a value of b implies that the regression line passes through the mean point of the cloud, G. This method leads to a unique line, which is why it is the most widely used.
least square method:
* Calculate the coefficients $a = \dfrac{Cov(X, Y)}{V(X)}$ and $b = \bar{y} - a \bar{x}$ (you can get them on your calculator!)
* Write the expression of the Y on X regression line $D_{Y/X}$: $y = ax + b$
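The two formulas of the least square method translate directly into code. The sketch below uses hypothetical data and invented function names; it also compares the sum of squared residues of the fitted line with that of a slightly different line, as a quick sanity check rather than a proof.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the Y on X regression line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    a = cov_xy / var_x      # a = Cov(X, Y) / V(X)
    b = mean_y - a * mean_x  # the line passes through the mean point G
    return a, b


def sum_of_squared_residues(xs, ys, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical series (assumption, not data from the lesson)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
best = sum_of_squared_residues(xs, ys, a, b)
# Another line (slope changed by 0.1) gives a larger sum of squares
worse = sum_of_squared_residues(xs, ys, a + 0.1, b)
print(a, b, best, worse)
```

For this series the regression line is y = 0.6x + 2.2, and changing the slope increases the sum of squared residues.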

4.3 Linear correlation coefficient
A scatter plot shows a more or less strong link between two variables X and Y, sometimes displaying an elongated and almost straight cloud: in this case, a linear model is relevant.
The purpose of the linear correlation coefficient is to evaluate the strength of a linear link, by a number.
linear correlation coefficient between X and Y: $r = \dfrac{Cov(X, Y)}{\sigma(X)\,\sigma(Y)}$
It has been proved that, whatever the data series, $-1 \le r \le 1$ (the capital R or the Greek letter ρ, "rho", are sometimes used for this coefficient).
On the calculator: a calculator generally writes it r, if it mentions it at all (it depends on the model). Therefore, we will calculate it ourselves (which implies calculating the covariance first).
Interpretation of its value:
The stronger the linear correlation is (cloud looking like a straight line), the closer $|r|$ is to 1.
"positive correlation": r is positive when Y overall increases with X
"negative correlation": r is negative when Y overall decreases as X increases
* $0 \le |r| \le 0.5$: weak linear correlation, inappropriate linear model.
* $0.5 \le |r| \le 0.75$: medium linear correlation, linear model not appropriate.
* $0.75 \le |r| \le 0.95$: tolerable linear correlation, the linear model may not be the best one.
* $0.95 \le |r| \le 1$: strong linear correlation, the linear model is one of the most appropriate.
Comments:
* are X and Y really linked? If r is close to 1 (or -1), the points are close to being collinear. Nevertheless, that doesn't always mean that X and Y are actually related. E.g.: in France, from 1974 to the early 1980s, the wedding rate decreased and in the meantime the GDP (French: PIB) increased, so that the scatter plot using both data sets is quasi-linear (fourth graph below). The linear correlation is mathematically very strong, but facts and studies show there is no cause-and-effect relationship between both variables! (after that period, the following points are not at all collinear with the previous ones any more).
* the linear correlation coefficient r only shows a linear link. A correlation between X and Y may be very strong, but not in a linear way (curved). In that case, r is far from 1 and -1, and the study has to be expanded (see section 5).
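Because the coefficient usually has to be calculated by hand, here is a small Python sketch of the formula together with the interpretation scale above. The data are hypothetical, the function names are invented, and the way the boundary values 0.5, 0.75 and 0.95 are assigned to a class is an assumption of this example (the lesson's ranges overlap at their endpoints).

```python
from math import sqrt


def linear_correlation(xs, ys):
    """r = Cov(X, Y) / (sigma(X) * sigma(Y)), using the Koenig formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    sigma_x = sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sigma_y = sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return cov_xy / (sigma_x * sigma_y)


def interpret(r):
    """Class of the lesson's scale; boundary handling is an assumption."""
    strength = abs(r)
    if strength < 0.5:
        return "weak"
    if strength < 0.75:
        return "medium"
    if strength < 0.95:
        return "tolerable"
    return "strong"


# Hypothetical series (assumption, not data from the lesson)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = linear_correlation(xs, ys)
print(round(r, 4), interpret(r))
```

The sign of `r` gives the direction of the link, while `interpret` only looks at its absolute value.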

Examples:
[Figure: four scatter plots]
* income vs. duration in a company: r = 0.8449
* success rate vs. % of disadvantaged SPC: r = -0.7457
* unit margin (per unit) vs. quantity (thousands): r = 0.6438
* wedding rate through time: r = -0.9875

5 Non-linear fitting: the variable change

A variable change may be performed if the points seem to follow a particular curve. The function to consider will always be specified in the instructions of an exercise. It may be:
* a logarithm or exponential function
* a polynomial function
* a trigonometric function

* One of the variables X or Y (or both!) has to be replaced by a new one, denoted T for instance, following a given formula that allows it to be calculated from the former.
e.g.:
X: 2 3 5 8
Y: 9 13 28 70
As Y seems to vary as X squared, plus 5, we can define the variable change T = X². We have to build the following table, in which T replaces X:
T: 4 9 25 64
Y: 9 13 28 70
* We perform a linear regression of the pair (T, Y), respecting their order.
e.g.: Here, the question is to determine the expression of their fitting line, y = at + b. If we are told to use the least square method, the coefficients a and b will be given by the calculator: y = 1.02526 t + 3.856
* Finally, we can deduce the expression of a curve fitting the non-linear relationship between X and Y, just by writing the variable change again; we may draw this curve, if we are told to.
e.g.: Since y = 1.02526 t + 3.856, we get: y = 1.02526 x² + 3.856 (expression of a parabola)
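The variable change can be reproduced with a few lines of Python. This sketch assumes that the example's data read X = 2, 3, 5, 8 and Y = 9, 13, 28, 70 (a reading consistent with T = X² and with the fitted coefficients quoted in the example); the function names are invented for the illustration.

```python
def least_squares_line(ts, ys):
    """Return (a, b) of the regression line y = a*t + b."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov_ty = sum(t * y for t, y in zip(ts, ys)) / n - mean_t * mean_y
    var_t = sum(t * t for t in ts) / n - mean_t ** 2
    a = cov_ty / var_t
    return a, mean_y - a * mean_t


# Assumed reading of the example's data
x = [2, 3, 5, 8]
y = [9, 13, 28, 70]

t = [xi ** 2 for xi in x]        # variable change T = X^2
a, b = least_squares_line(t, y)  # linear regression of the pair (T, Y)


def fitted_curve(x_value):
    """Parabola obtained by writing the variable change again."""
    return a * x_value ** 2 + b


print(t, round(a, 5), round(b, 3))
```

The regression is linear in T, but the resulting curve `fitted_curve` is a parabola in X.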

6 Statistical prediction

6.1 Point estimate
The fitting straight line (obtained with or without a variable change) makes it possible, through its expression, to estimate a value of the variable Y by choosing an unexplored value of the variable X (generally greater than those collected in the original series). In this case, if X represents time, it is possible to make a forecast for the future.
e.g.: let's take the expression of a fitting line: y = 0.85x + 22.
a. Point estimate of y with $x_0 = 10$: $y_0 = 0.85 \times 10 + 22 = 30.5$.
b. Point estimate of x with $y_0 = 39$: $x_0 = (39 - 22)/0.85 = 20$.

6.2 Confidence interval
We ought to step back when considering the point estimate: depending on the noise (dispersion) of the point cloud, it is more or less trustworthy and gives us a more or less precise prediction.
Here, the new idea is to give an estimate as a range (interval) around the point estimate, rather than a single value, and to be able to associate a probability (confidence level) that the unknown reality lies inside such a range.
Rates method (uses a linear model, estimates y from x):
1. For each value $x_i$ of the initial data set:
* calculate the values $y'_i$ following the expression of the regression line
* calculate the rates $z_i = y_i / y'_i$
* calculate the mean $\bar{z}$ and standard deviation $\sigma_Z$ of the variable Z.
2. Z is considered as distributed by a normal law. Consequently:
95 % of Z values lie inside the interval $[\bar{z} - 1.96\,\sigma_Z ; \bar{z} + 1.96\,\sigma_Z]$
99 % of Z values lie inside the interval $[\bar{z} - 2.58\,\sigma_Z ; \bar{z} + 2.58\,\sigma_Z]$
3. Calculate the point estimate $y'_0$, associated with the new given value $x_0$, thanks to the fitting line.
Now, we can predict the unexplored possible values $y_0$ by an interval, as follows:
There is a 95% chance that $y_0$ is in $[y'_0 (\bar{z} - 1.96\,\sigma_Z) ; y'_0 (\bar{z} + 1.96\,\sigma_Z)]$
There is a 99% chance that $y_0$ is in $[y'_0 (\bar{z} - 2.58\,\sigma_Z) ; y'_0 (\bar{z} + 2.58\,\sigma_Z)]$
comments:
* this method is efficient only for r > 0 (positive correlation)
* the probability (95%, 99%, etc.) is named confidence level of the prediction. Its complement (5%, 1%, etc.) is named significance level.
* The size of such an interval is related to the uncertainty of the answer. It increases when:
- the confidence level increases,
- r decreases,
- the distance between $x_0$ and the abscissas $x_i$ of the point cloud increases.
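The rates method can be sketched in Python as follows. The data set, the line y = 0.6x + 2.2 (the least square line of that hypothetical data set) and the function name `rates_interval` are assumptions of this example; the standard deviation divides by n, as in the variance formulas of section 4.1.

```python
from math import sqrt


def rates_interval(xs, ys, a, b, x0, factor=1.96):
    """Point estimate at x0 and its interval by the rates method.

    factor = 1.96 for a 95 % confidence level, 2.58 for 99 %.
    """
    fitted = [a * x + b for x in xs]            # values y'_i on the line
    rates = [y / f for y, f in zip(ys, fitted)]  # z_i = y_i / y'_i
    n = len(rates)
    mean_z = sum(rates) / n
    var_z = max(0.0, sum(z * z for z in rates) / n - mean_z ** 2)
    sigma_z = sqrt(var_z)
    y0 = a * x0 + b                              # point estimate y'_0
    low = y0 * (mean_z - factor * sigma_z)
    high = y0 * (mean_z + factor * sigma_z)
    return y0, (low, high)


# Hypothetical series and its least square line (assumptions)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 0.6, 2.2

y0, interval_95 = rates_interval(xs, ys, a, b, x0=6)
_, interval_99 = rates_interval(xs, ys, a, b, x0=6, factor=2.58)
print(y0, interval_95, interval_99)
```

As expected from the comments above, the 99 % interval is wider than the 95 % one for the same point estimate.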

IUT TC MATHEMATICS FORM FOR BIVARIATE STATISTICS