Chapter 1 ASPECTS OF MULTIVARIATE ANALYSIS

1.1 Introduction

Definition (Wikipedia): Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time.

The objectives of scientific investigations to which multivariate methods most naturally lend themselves include the following:

1. Data reduction or structural simplification
2. Sorting and grouping
3. Investigation of the dependence among variables
4. Prediction
5. Hypothesis construction and testing

Examples: In the real world, most data collection schemes or designed experiments that provide data are multivariate in nature. Some examples of such situations are given below.

During a survey of households, several measurements on each household are taken. These measurements, being taken on the same household, will be dependent. For example, the education level of the head of the household and the annual income of the family are related.

During a production process, a number of different measurements, such as the tensile strength, brittleness, diameter, etc., are taken on the same unit. Collectively, such data are viewed as multivariate data.

The price of a car depends on several factors, say year, mileage, warranty, HP and model, among many. Here year, mileage and warranty are correlated.

Body fitness depends on age, height, weight, amount of exercise, food habits, etc. Here height and weight are related.

A new drug is to be compared with a control for its effectiveness. Two different groups of patients are assigned to the two treatments, and they are observed weekly for the next two months. The periodic measurements on the same patient will exhibit dependence, and thus the basic problem is multivariate in nature.

1.2 Applications of Multivariate Techniques

Some applications, among many, are described below:

1. Data reduction or structural simplification
2. Sorting and grouping
3. Investigation of the dependence among variables
4. Prediction
5. Hypothesis construction and testing

Read pages 3 and 4 for applications in each of the above categories.

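1.3 The Organization of Data

Arrays

Multivariate data arise whenever an investigator, seeking to understand a social or physical phenomenon, selects a number p of variables or characters to record. The values of these variables are all recorded for each distinct item, individual, or experimental unit.

We will use the notation $x_{jk}$ to indicate the particular value of the kth variable that is observed on the jth item or trial. That is,

$x_{jk}$ = measurement of the kth variable on the jth item.

Now, n measurements on p variables can be displayed as follows:

$$
\begin{array}{lcccccc}
 & \text{Variable 1} & \text{Variable 2} & \cdots & \text{Variable } k & \cdots & \text{Variable } p \\
\text{Item 1:} & x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
\text{Item 2:} & x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } j\text{:} & x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n\text{:} & x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{array}
$$

These data can be displayed as a rectangular array, called X, of n rows and p columns. The array X contains all of the observations on all of the variables.

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
$$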
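As a minimal sketch of the layout just described, the array X can be held as a two-dimensional NumPy array with one row per item and one column per variable. The numbers below are made up purely for illustration, and the helper name `x` is hypothetical. Note that the text indexes items and variables from 1, while NumPy indexes from 0.

```python
import numpy as np

# Hypothetical data: n = 3 items (rows), p = 2 variables (columns).
X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
])

n, p = X.shape  # number of items, number of variables


def x(j, k):
    """Return x_jk, the value of variable k on item j (1-based, as in the text)."""
    return X[j - 1, k - 1]


print(n, p)     # 3 2
print(x(2, 1))  # 2.0  (variable 1 measured on item 2)
print(x(3, 2))  # 30.0 (variable 2 measured on item 3)
```

Keeping items in rows and variables in columns means that column-wise operations (for example `X.mean(axis=0)`) produce one summary value per variable.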

Example 1.1 (page 6): Number of books and dollar sales

A selection of four receipts from a university bookstore was obtained in order to investigate the nature of book sales. Each receipt provided, among other things, the total amount of each sale and the number of books sold. The data are given below:

Variable 1 (dollar sales):     42  52  48  58
Variable 2 (number of books):   4   5   4   3

Then the data array X, with n = 4 rows and p = 2 columns, is

$$
X = \begin{bmatrix} 42 & 4 \\ 52 & 5 \\ 48 & 4 \\ 58 & 3 \end{bmatrix}
$$

Here $x_{11} = 42$, $x_{21} = 52$, $x_{42} = 3$.

Descriptive Statistics

Descriptive statistics describe the data. For example, the mean, variance, standard deviation, correlation, skewness and kurtosis are descriptive statistics. We will mostly discuss descriptive statistics that measure location, variation, and linear association. The formal definitions of these quantities are given below.

Let $x_{11}, x_{21}, \ldots, x_{n1}$ be n measurements on variable 1. Then the sample mean of these measurements is

$$
\bar{x}_1 = \frac{1}{n} \sum_{j=1}^{n} x_{j1}
$$

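The second sample mean:

$$
\bar{x}_2 = \frac{1}{n} \sum_{j=1}^{n} x_{j2}
$$

The p sample means:

$$
\bar{x}_k = \frac{1}{n} \sum_{j=1}^{n} x_{jk}, \qquad k = 1, 2, \ldots, p
$$

The sample variance, which measures the variability of the data (also called the dispersion or spread), of the n measurements on variable 1 is

$$
s_1^2 = s_{11} = \frac{1}{n} \sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2
$$

The sample variances of the n measurements on the p variables:

$$
s_k^2 = s_{kk} = \frac{1}{n} \sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2, \qquad k = 1, 2, \ldots, p
$$

Note that $s_k^2 = s_{kk}$, and the square root of the sample variance, $\sqrt{s_{kk}}$, is known as the sample standard deviation (SD).

Note: We will mostly use the SD to measure variability, as it has the same units of measurement as the mean or median.

Sample Covariance: Consider n pairs of measurements on variables 1 and 2, $(x_{j1}, x_{j2})$, $j = 1, \ldots, n$.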

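A measure of linear association between the measurements of variables 1 and 2 is provided by the sample covariance. The sample covariance between variables 1 and 2 is denoted by $s_{12}$ and defined as

$$
s_{12} = \frac{1}{n} \sum_{j=1}^{n} (x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2)
$$

The sample covariance between the ith and kth variables is denoted by $s_{ik}$ and defined as

$$
s_{ik} = \frac{1}{n} \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \qquad i = 1, \ldots, p, \quad k = 1, \ldots, p
$$

This is the average product of the deviations from their respective means.

Sample correlation coefficient (also known as Pearson's product-moment correlation coefficient)

The sample correlation coefficient between the ith and kth variables is denoted by $r_{ik}$ and defined as

$$
r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
= \frac{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}{\sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}}, \qquad i = 1, \ldots, p, \quad k = 1, \ldots, p
$$

The sample correlation coefficient r has the following properties:

1. The value of r lies between -1 and +1, inclusive.
2. r measures the strength of linear association. Thus r = 0 implies a lack of linear association between the two variables.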

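3. r = ±1 implies a perfect linear association.
4. r > 0 implies a tendency for one value of the pair to be large when the other value is large, and also for both values to be small together.
5. r < 0 implies a tendency for one value in the pair to be larger than its average when the other value is smaller than its average.
6. The value of $r_{ik}$ remains unchanged if the measurements of the ith variable are changed to $y_{ji} = a x_{ji} + b$ and the values of the kth variable are changed to $y_{jk} = c x_{jk} + d$, $j = 1, \ldots, n$, provided that the constants a and c have the same sign. That means r is invariant to changes in both the location and the scale of measurement.

Arrays of Basic Descriptive Statistics

The descriptive statistics computed from n measurements on p variables can be organized into arrays.

Sample means:

$$
\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}
$$

Sample variances and covariances:

$$
S = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{bmatrix}
$$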

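Sample correlations:

$$
R = \begin{bmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{bmatrix}
$$

Example 1.2 (page 10): The arrays $\bar{\mathbf{x}}$, S and R for the bivariate data in Example 1.1 are given below.

$$
\bar{\mathbf{x}} = \begin{bmatrix} 50 \\ 4 \end{bmatrix}, \qquad
S = \begin{bmatrix} 34 & -1.5 \\ -1.5 & 0.5 \end{bmatrix}, \qquad
R = \begin{bmatrix} 1 & -0.36 \\ -0.36 & 1 \end{bmatrix}
$$

$r_{12} = -0.36$ indicates a weak negative linear relationship between the two variables $X_1$ and $X_2$.

Graphical Techniques

Scatter plot: Using SPSS, we obtain the following scatter plot between variables 1 and 2.

Variable 1 ($x_1$):  3    4    2    6    8    2    5
Variable 2 ($x_2$):  5    5.5  4    7    10   5    7.5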
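The arrays in Example 1.2, and the correlation of the seven pairs listed for the scatter plot, can be reproduced with a short NumPy sketch. The function name `summary_arrays` is hypothetical and not part of the original notes. The data values are as read from Example 1.1 and from the scatter-plot listing, and the divisor n (not n - 1) is used, which is what the reported S requires.

```python
import numpy as np


def summary_arrays(X):
    """Return (mean vector, covariance matrix S, correlation matrix R).

    Variances and covariances use the divisor n, matching the definitions
    of s_ik and r_ik in these notes.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    D = X - xbar                  # deviations from the column means
    S = D.T @ D / n               # s_ik = (1/n) * sum_j d_ji * d_jk
    sd = np.sqrt(np.diag(S))      # sample standard deviations
    R = S / np.outer(sd, sd)      # r_ik = s_ik / (sqrt(s_ii) * sqrt(s_kk))
    return xbar, S, R


# Example 1.1 receipts: columns are dollar sales and number of books.
receipts = np.array([[42, 4], [52, 5], [48, 4], [58, 3]])
xbar, S, R = summary_arrays(receipts)
print(xbar)                       # sample means: 50 and 4
print(S)                          # s11 = 34, s12 = s21 = -1.5, s22 = 0.5
print(round(float(R[0, 1]), 2))   # -0.36

# The seven (x1, x2) pairs listed for the scatter plot.
pairs = np.array([[3, 5], [4, 5.5], [2, 4], [6, 7], [8, 10], [2, 5], [5, 7.5]])
_, _, R_pairs = summary_arrays(pairs)
print(round(float(R_pairs[0, 1]), 2))  # 0.96
```

Library routines such as `numpy.cov` default to the divisor n - 1, so they would give a different S unless told otherwise (for example with `bias=True`). The correlation matrix is unaffected by the choice of divisor.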

Figure 1.1: A scatter plot between variables 1 and 2

Using SPSS, we obtain $r_{12} = 0.96$, a strong correlation between variables 1 and 2. The scatter diagram (Figure 1.1) gives the same impression of a strong linear relationship between variables 1 and 2.

Example 1.4: A scatter plot for baseball data

Table 1.1: 1977 Salary and Final Record for the National League East

Team            Player payroll    Won-lost percentage
Philadelphia    3497900           0.62
Pittsburgh      2485475           0.59
St. Louis       1782875           0.51
Chicago         1725450           0.50
Montreal        1645575           0.46
New York        1469800           0.40

The scatter plot using SPSS:

Figure 1.4: Salaries and won-lost percentage (Table 1.1, page 14)

Example 1.6 (page 17): A zoologist obtained measurements on n = 25 lizards. The weight, or mass, is given in grams, while the snout-vent length (SVL) and hind limb span (HLS) are given in millimeters. The data are displayed in Table 1.3.

Table 1.3: Lizard Size Data

Lizard Mass SVL HLS
5.5 59 3.5 0.4 75 4 3 9. 69 4 4 9 67.5 5 5 7. 6 9.5 6 6.6 6 3 7.3 74 40 8.4 47 97 9 5.5 86.5 6 0 9 69 6.5 8. 70.5 36 6.6 64.5 6 3 7.6 67.5 35

4 0. 73 36.5 5 0 73 35.5 6 0. 77 39 7 7.6 6.5 8 8 7.73 66.5 33.5 9 79.5 50 0 0 74 37 5. 59.5 6 9. 68 3 3. 75 4 4 7 66.5 7 4 6.9 63 7

Using Minitab:

[Figure: 3D Scatterplot of Mass vs HLS vs SVL]

From Figures 1.6 and 1.7, we can see that most of the variation is scatter about a one-dimensional straight line.

1.4 Data Displays and Pictorial Representations

Consider the data in Example 1.6 and draw a matrix plot, which links multiple two-dimensional plots.

Correlation Matrix

Correlations

                               Mass      SVL       HLS
Mass   Pearson Correlation     1         .96**     .96**
       Sig. (2-tailed)                   .000      .000
       N                       25        25        25
SVL    Pearson Correlation     .96**     1         .938**
       Sig. (2-tailed)         .000                .000
       N                       25        25        25
HLS    Pearson Correlation     .96**     .938**    1
       Sig. (2-tailed)         .000      .000
       N                       25        25        25

**. Correlation is significant at the 0.01 level (2-tailed).

1.5 Distance

If we consider the point $P = (x_1, x_2)$ in the plane, the straight-line distance $d(O, P)$ from P to the origin $O = (0, 0)$ is, according to the Pythagorean theorem, given by

$$
d(O, P) = \sqrt{x_1^2 + x_2^2}
$$

The situation is illustrated in Figure 1.19 (page 30).

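In general, if the point P has p coordinates, so that $P = (x_1, x_2, \ldots, x_p)$, the straight-line distance from P to the origin $O = (0, 0, \ldots, 0)$ is

$$
d(O, P) = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2} = \sqrt{\mathbf{x}'\mathbf{x}}, \qquad \mathbf{x} = (x_1, x_2, \ldots, x_p)'
$$

All points $(x_1, x_2, \ldots, x_p)$ that lie a constant squared distance, such as $c^2$, from the origin satisfy the equation

$$
d^2(O, P) = x_1^2 + x_2^2 + \cdots + x_p^2 = c^2
$$

The straight-line distance between two arbitrary points P and Q with coordinates $P = (x_1, x_2, \ldots, x_p)$ and $Q = (y_1, y_2, \ldots, y_p)$ is given by

$$
d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}
$$

Standardized Distance: With the standardized coordinates $x_1^* = x_1 / \sqrt{s_{11}}$ and $x_2^* = x_2 / \sqrt{s_{22}}$,

$$
d(O, P) = \sqrt{(x_1^*)^2 + (x_2^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}}} \qquad (1\text{-}13)
$$

Using (1-13), we see that all points which have coordinates $(x_1, x_2)$ and are a constant squared distance $c^2$ from the origin must satisfy

$$
\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2 \qquad (1\text{-}14)
$$

Equation (1-14) is the equation of an ellipse centered at the origin whose major and minor axes coincide with the coordinate axes. This general case is shown in Figure 1.21 (page 32).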

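Example 1.14 (page 32): Calculating a statistical distance

Here $s_{11} = 4$ and $s_{22} = 1$, so

$$
d^2(O, P) = \frac{x_1^2}{4} + \frac{x_2^2}{1}
$$

All points $(x_1, x_2)$ that are a constant distance 1 from the origin satisfy the equation

$$
\frac{x_1^2}{4} + \frac{x_2^2}{1} = 1
$$

The coordinates of some points a unit distance from the origin are presented in the following table:

$$
\begin{array}{cc}
\text{Coordinates: } (x_1, x_2) & \text{Distance: } \dfrac{x_1^2}{4} + \dfrac{x_2^2}{1} = 1 \\[6pt]
(0, 1) & \dfrac{0^2}{4} + \dfrac{1^2}{1} = 1 \\[6pt]
(0, -1) & \dfrac{0^2}{4} + \dfrac{(-1)^2}{1} = 1 \\[6pt]
(2, 0) & \dfrac{2^2}{4} + \dfrac{0^2}{1} = 1 \\[6pt]
(1, \sqrt{3}/2) & \dfrac{1^2}{4} + \dfrac{(\sqrt{3}/2)^2}{1} = 1
\end{array}
$$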
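The table entries can be checked with a few lines of Python. This is only a sketch, taking $s_{11} = 4$ and $s_{22} = 1$ as in the example; the function name `squared_statistical_distance` is hypothetical.

```python
import math

S11, S22 = 4.0, 1.0  # sample variances assumed in the example


def squared_statistical_distance(x1, x2, s11=S11, s22=S22):
    """Squared statistical distance from the origin to the point (x1, x2)."""
    return x1 ** 2 / s11 + x2 ** 2 / s22


# The four points listed in the table.
points = [(0, 1), (0, -1), (2, 0), (1, math.sqrt(3) / 2)]
distances = [squared_statistical_distance(x1, x2) for x1, x2 in points]

for point, d2 in zip(points, distances):
    # Each squared distance should equal 1 up to floating-point rounding.
    print(point, round(d2, 10))
```

The point (2, 0) lies at Euclidean distance 2 from the origin but at statistical distance 1, because the first coordinate is scaled by its larger variance.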

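A plot of the equation $\dfrac{x_1^2}{4} + \dfrac{x_2^2}{1} = 1$ is given below.

[Figure: ellipse of unit distance, $x_1^2/4 + x_2^2/1 = 1$]

The expression in (1-13) can be generalized to accommodate the calculation of statistical distance from an arbitrary point $P = (x_1, x_2)$ to any fixed point $Q = (y_1, y_2)$. If we assume that the coordinate variables vary independently of one another, the distance from P to Q is given by

$$
d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}}}
$$

Let the points P and Q have p coordinates such that $P = (x_1, x_2, \ldots, x_p)$ and $Q = (y_1, y_2, \ldots, y_p)$. Suppose Q is a fixed point [it could be $O = (0, 0, \ldots, 0)$] and the coordinate variables vary independently of one another. Let $s_{11}, s_{22}, \ldots, s_{pp}$ be sample variances constructed from n measurements on $x_1, x_2, \ldots, x_p$, respectively. Then the statistical distance from P to Q is

$$
d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}
$$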
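The p-coordinate formula translates directly into code. The following is a minimal sketch under the same assumption of independently varying coordinates. The function name `statistical_distance` and the three-dimensional example values are hypothetical and chosen only for illustration.

```python
import numpy as np


def statistical_distance(P, Q, variances):
    """Statistical distance between P and Q for independently varying coordinates.

    `variances` holds s_11, ..., s_pp, one sample variance per coordinate.
    """
    P, Q, variances = (np.asarray(a, dtype=float) for a in (P, Q, variances))
    if not (P.shape == Q.shape == variances.shape):
        raise ValueError("P, Q and variances must have the same length")
    if np.any(variances <= 0):
        raise ValueError("variances must be positive")
    # Each squared coordinate difference is weighted by 1 / s_kk.
    return float(np.sqrt(np.sum((P - Q) ** 2 / variances)))


# Hypothetical three-dimensional example with Q at the origin.
P = [2.0, 1.0, 3.0]
Q = [0.0, 0.0, 0.0]
d_stat = statistical_distance(P, Q, [4.0, 1.0, 9.0])  # sqrt(1 + 1 + 1)
d_eucl = statistical_distance(P, Q, [1.0, 1.0, 1.0])  # sqrt(4 + 1 + 9)
print(d_stat, d_eucl)
```

Setting every variance to 1 recovers the ordinary straight-line distance, so the Euclidean formula is the special case $s_{11} = s_{22} = \cdots = s_{pp} = 1$.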