Associations between variables I: Correlation
- Nigel Parsons
1 Associations between variables I: Correlation Part 2 will describe linear regression.
2 Variable Associations Previously, we asked questions about whether samples might be from populations with the same mean for a specific variable. Now, we are interested in relationships among variables within a sample.
3 Data Structure: variables linked to the same individual
> X = read.table("army.rdata")
> X
SEX AGE RACE HEAD_LENGTH HEAD_BREADTH EAR_LENGTH
4 Why might variable values be coupled? Shared genes influence overall LENGTH, coupling Leg Length and Arm Length.
5 Why might variable values be coupled? Shared genes and developmental nutrition influence overall LENGTH, coupling Leg Length and Arm Length.
6 Independence Almost everything one can measure on a subject has some degree of association; when there is none, the values are said to be independent. E.g., Handedness (R/L) and Ear Length (I have no idea whether these are independent, but it seems a reasonable guess).
7 Independence: examples. Hours of sleep versus outside humidity. Scholarly aptitude versus bone density.
8 Dependence: examples. Average number of cigarette packs smoked per day versus number of years lived (for smokers). Class attendance versus amount of rain...
9 Correlation Correlation measures the linear dependence between two variables: as one increases, the other tends to increase (or decrease) as well. Pearson's Product-Moment Correlation (PPMC), r, is probably the most familiar measure of this association; it measures the strength of linear dependence between two variables.
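As a quick sketch (added here, not from the original slides), r can be computed directly from the covariance and the standard deviations, then checked against R's built-in cor():

```r
# Sketch: Pearson's r from its definition, r = cov(x, y) / (sd(x) * sd(y))
x = c(1, 2, 3, 4, 5)
y = c(2, 1, 4, 3, 5)
r.manual = cov(x, y) / (sd(x) * sd(y))
r.builtin = cor(x, y)
# r.manual and r.builtin agree
```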
10 Correlation A lack of independence means there is some association (or relationship) between values for different variables. One type of such association is called correlation, which is quantified by a number of different statistical measures. NOTE! There are forms of association other than that measured by a particular correlation coefficient. Independence implies zero correlation (for variables with finite variance), but zero correlation does not imply independence: non-correlated does not imply independent.
11 Example y1 = x, for x in the range [-1, 1]; y2 = x*x. y1 and y2 are not correlated, yet y1 and y2 are clearly dependent: if I know y1, I can calculate y2 with 100% certainty.
12 Correlation/Causation Higher scores are correlated with higher fluency on FCAT passages. Higher fluency is not necessarily the cause of higher scores, although it is certainly possible. Can we quantify this correlation?
13 Correlation/Causation (scatterplot of dfr$fluency versus dfr$score)
> cor(dfr$score, dfr$fluency)
cor(x, y) = cov(x, y) / (sd(x) sd(y))
14 Variance/Covariance cov(x, y) = E((x − µx)(y − µy)); var(x) = E((x − µ)²) = σ². The above formulas apply to populations.
15 Correlation between two random variables cor(x, y) = E((x − µx)(y − µy)) / (σ(x) σ(y)). If x and y are identical, cor(x, x) = E((x − µ)²) / σ(x)² = 1. E(x) is calculated by choosing n values of x from the population, computing the average, and letting n go to ∞ (infinity).
16 Correlation between two samples s1 and s2. Correlation: cor(s1,s2). Covariance: cov(s1,s2). n1 = (s1-mean(s1)) / sd(s1); n2 = (s2-mean(s2)) / sd(s2). Correlation: r = cov(n1,n2). n1 and n2 are normalized to have zero mean and standard deviation of 1.
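The recipe above can be sketched in R (an added illustration, not from the slides): after normalizing both samples, the covariance of the normalized samples equals the correlation of the originals.

```r
# Sketch: cov() of normalized samples = cor() of the raw samples
s1 = rnorm(50)
s2 = rnorm(50)
n1 = (s1 - mean(s1)) / sd(s1)   # zero mean, unit sd
n2 = (s2 - mean(s2)) / sd(s2)
cov(n1, n2)   # same value as cor(s1, s2)
cor(s1, s2)
```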
17 Independence vs correlation X ~ N(0,1); Y = X*X. X and Y are clearly not independent, since given X the value of Y is known exactly. Yet, mathematically, X and Y have zero correlation.
> x = rnorm(1000)
> y = x*x
> cor(x,y)
[1]
18 r If two values are perfectly, linearly correlated, then r = 1.0. If two values are perfectly, negatively, linearly correlated, r = -1.0. If two variables are independent (no relationship), r = 0.0. Remember, though, uncorrelated is not the same as independent: two values can be dependent, yet have zero correlation!!!
19 r
20 Y1, Y2: axes are not normalized
21 Y1-y1bar, Y2, where y1bar = mean(Y1)
22 Y1-Y1bar, Y2-Y2bar
23 (Y1-Y1bar)/sd(Y1), (Y2-Y2bar)/sd(Y2): variables are now normalized
24 Y1,Y2 versus (Y1-Y1bar)/sd(Y1), (Y2-Y2bar)/sd(Y2): axis ranges are not consistent in the former, consistent in the latter
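The four-panel sequence on slides 20-24 can be reproduced with a short sketch (added; the variable names and distribution parameters are illustrative):

```r
# Sketch: raw -> centered -> fully normalized scatterplots
y1 = rnorm(100, mean = 5, sd = 2)
y2 = rnorm(100, mean = 5, sd = 2)
par(mfrow = c(2, 2))
plot(y1, y2, main = "raw (axes not normalized)")
plot(y1 - mean(y1), y2, main = "y1 centered")
plot(y1 - mean(y1), y2 - mean(y2), main = "both centered")
plot((y1 - mean(y1)) / sd(y1), (y2 - mean(y2)) / sd(y2), main = "normalized")
```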
25 ?cor.test cor.test {stats} Test for Association/Correlation Between Paired Samples
Description: Test for association between paired samples, using one of Pearson's product-moment correlation coefficient, Kendall's tau or Spearman's rho.
Usage:
cor.test(x, ...)
## Default S3 method:
cor.test(x, y, alternative = c("two.sided", "less", "greater"), method = c("pearson", "kendall", "spearman"), exact = NULL, conf.level = 0.95, continuity = FALSE, ...)
## S3 method for class 'formula':
cor.test(formula, data, subset, na.action, ...)
Arguments: x, y numeric vectors of data values. x and y must have the same length. ... Details... Value...
26 cor.test(y1,y2)
> cor.test(y1,y2)
Pearson's product-moment correlation
data: y1 and y2
t = , df = 98, p-value = 2.247e-13
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
sample estimates:
cor
The correlation is significantly different from zero: there is sufficient evidence to reject the null hypothesis of zero correlation.
27 Play Time... Generate your own pair of variables with a given correlation:
> r = 0.3
> y1 = rnorm(100)
> y.temp = rnorm(100)
> y2 = (y1*r) + y.temp*(sqrt(1-r*r))
> cor.test(y1,y2)
Pearson's product-moment correlation
data: y1 and y2
t = , df = 98, p-value = 4.366e-05
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
sample estimates:
cor
28 Play Time...
> r = 0.3
> y1 = rnorm(100)
> y.temp = rnorm(100)
> y2 = (y1*r) + y.temp*(sqrt(1-r*r))
Generate and plot and compute r, ...
y1,y2 ~ N(0,1); r = 1.0, -1.0, 0, 0.3, -0.7
y1,y2 ~ N(0,0.2), N(0,2.3); r = 1.0, 0, -0.7
y1,y2 ~ N(1,0.7), N(100,12.5); r = 0.3
y1', y2': subtract off the means and divide by the sd
29 User R functions I wish to repeat a sequence of commands many times with different parameters. For example: generate n random numbers; plot a histogram of these n random numbers (create 6 plots per page). Let n take the values 100, 500, 5000, 10000.
30 Method 1: simply type everything out. Not difficult, since there are very few commands:
n = 100
rand = rnorm(n)
hist(rand)
par(mfrow=c(2,3))
n=100
hist(rnorm(n))
n=500
hist(rnorm(n))
n=5000
hist(rnorm(n))
...
31 Method 2: scripts. Put the following command in a script called histo.r:
hist(rnorm(n))
Then, in R:
par(mfrow=c(2,3))
n=100
source("histo.r")
n=500
source("histo.r")
...
Note that running the script histo.r without a value for n will give an error (undefined variable).
32 Method 3: functions. In a script called histof.r, add the following:
histof = function(n=100) { hist(rnorm(n)) }
In R,
> par(mfrow=c(2,3))
> source("histof.r")
> histof()
> histof(500)
> histof(n=5000)
> ...
Most efficient and easy to use.
33 Use apply() to run all cases (script: histof.r)
histof = function(n=100) {
  title = paste("normal histogram: ", n, " points")
  hist(rnorm(n), main=title)
}
# par(mfrow=c(2,3))
par(ask=TRUE)
histof()
histof(n=500)
histof(n=2000)
histof(n=5000)
# par(mfrow=c(2,3))
apply(as.matrix(c(100,200,300,400,500,5000)), 1, histof)
histof(n) is a user-defined function. Use the function like any R function. Use apply() to simplify your code.
34 For loops
histof = function(n=100) {
  title = paste("normal histogram: ", n, " points")
  hist(rnorm(n), main=title)
}
# par(mfrow=c(2,3))
par(ask=TRUE)
counts = c(100,200,300,400,500,5000)
for (i in counts) { histof(i) }
35 Return to: Play Time...
> r = 0.3
> y1 = rnorm(100)
> y.temp = rnorm(100)
> y2 = (y1*r) + y.temp*(sqrt(1-r*r))
Generate and plot and compute r, ...
y1,y2 ~ N(0,1); r = 1.0, -1.0, 0, 0.3, -0.7
y1,y2 ~ N(0,0.2), N(0,2.3); r = 1.0, 0, -0.7
y1,y2 ~ N(1,0.7), N(100,12.5); r = 0.3
y1', y2': subtract off the means and divide by the sd
36 Functions Create a function to encapsulate:
> r = 0.3
> y1 = rnorm(100)
> y.temp = rnorm(100)
> y2 = (y1*r) + y.temp*(sqrt(1-r*r))
37 play = function(r=0.3, n=100, mean=0, sd=1) {
  y1 = rnorm(n, mean=mean, sd=sd)
  y.temp = rnorm(n, mean=mean, sd=sd)
  y2 = (y1*r) + y.temp*(sqrt(1-r*r))
  correlation = cor(y1,y2)
  cat("r=", r, " correlation= ", correlation, "\n")
  return(correlation)
}
38 Using the function play(...)
print(" r = c(.1,.5,.8) ")
corre = as.matrix(c(.1,.5,.8))
apply(corre, 1, play, 100, 0, 1)
print(" r = seq(-1,1,.1) ")
corre = as.matrix(seq(-1,1,.1))
apply(corre, 1, play, 100, 0, 1)
Function arguments: r = corre (a matrix column or row), n = 100, mean = 0, sd = 1.
39 Criminals What is the correlation between middle finger length and height? Is it statistically significant? Are the mean finger length and mean height significantly different? (t.test)
40 Criminals Standardize (y -> y') (subtract the mean, divide by the standard deviation). Plot. Compute r directly (manual calculation). Compare with cor().
41 Criminals
> X = read.table("criminal_cambridge.rdata")
> dim(X)
[1]
> X1 = subset(X, source=="criminal")
> dim(X1)
[1]
> head(X1,3)
source height.cm middle.finger.cm
1 criminal
2 criminal
3 criminal
Is there a correlation between h and mfl???
42 Criminals Extract h and mfl for criminals (see above). Plot h vs. mfl. Compute r for (h and mfl) and for (mfl and h). Test r for h, mfl. Note: plot(X), where X = the full data set. Compare cor() and cov().
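One possible solution sketch for the exercise (assuming the file and the column names shown on slide 41; not the instructor's official answer):

```r
# Sketch: correlation between height (h) and middle finger length (mfl)
X = read.table("criminal_cambridge.rdata")
X1 = subset(X, source == "criminal")
h = X1$height.cm
mfl = X1$middle.finger.cm
plot(h, mfl)
cor(h, mfl)        # cor is symmetric: cor(mfl, h) gives the same value
cor.test(h, mfl)   # is r significantly different from zero?
t.test(h, mfl)     # are the mean height and mean finger length different?
```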
43 Assumptions Significance testing of r assumes that the sample pairs are independent and identically distributed and follow a bivariate normal distribution. What if they are not? Transformations? Non-parametric tests, usually based on ranks. Randomization tests.
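A randomization test for correlation can be sketched as follows (an added illustration: shuffle one variable to destroy any real association, then compare the observed r to the distribution of r under shuffling):

```r
# Sketch: permutation (randomization) test for Pearson's r
perm.cor.test = function(x, y, nperm = 1000) {
  r.obs = cor(x, y)
  r.perm = replicate(nperm, cor(x, sample(y)))  # r under random pairing
  mean(abs(r.perm) >= abs(r.obs))               # two-sided permutation p-value
}
```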
44 Ranks A rank is the position of an observation in a list of observations sorted by magnitude. E.g., see ?rank and ?sort.
> y1 = rnorm(5)
> y2 = rnorm(5)
> y1
[1]
> y2
[1]
> rank(y1)
[1]
> rank(y2)
[1]
* Ties require special handling.
45 Spearman's Correlation Spearman's rank correlation coefficient, or Spearman's rho, is a non-parametric measure of statistical dependence between two variables. It is computed as the product-moment correlation of the ranks of the two variables.
46 Spearman's rho
> y1 = rnorm(5); y2 = rnorm(5)
> y1
[1]
> y2
[1]
> rank(y1)
[1]
> rank(y2)
[1]
> cor(y1,y2)
[1]
> cor(y1,y2,method="sp") # Spearman correlation (rank-based); "sp" partially matches "spearman"
[1] -0.1
> cor(rank(y1),rank(y2))
[1] -0.1
47 Kendall's tau Kendall rank correlation coefficient, commonly referred to as Kendall's tau (τ) coefficient, is a statistic used to measure the association between two measured quantities. A tau test is a nonparametric hypothesis test for statistical dependence based on the tau coefficient. Specifically, it is a measure of rank correlation, i.e., the similarity of the orderings of the data when ranked by each of the quantities.
48 Kendall's tau (formula: part one, part two). Comparison between tau and Spearman.
49 Both the Spearman and Kendall tau methods can be selected in cor(). See ?cor:
cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
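A brief added sketch comparing the three methods on the same data (Spearman's rho is just Pearson's r applied to the ranks):

```r
# Sketch: Pearson, Spearman, and Kendall on one data set
x = rnorm(100)
y = x + rnorm(100)
cor(x, y, method = "pearson")
cor(x, y, method = "spearman")
cor(x, y, method = "kendall")
cor(rank(x), rank(y))   # matches the Spearman value
```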
More informationLOOKING FOR RELATIONSHIPS
LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an
More informationτ xy N c N d n(n 1) / 2 n +1 y = n 1 12 v n n(n 1) ρ xy Brad Thiessen
Introduction: Effect of Correlation Types on Nonmetric Multidimensional Scaling of ITED Subscore Data Brad Thiessen In a discussion of potential data types that can be used as proximity measures, Coxon
More informationNonparametric Statistics
Nonparametric Statistics Nonparametric or Distribution-free statistics: used when data are ordinal (i.e., rankings) used when ratio/interval data are not normally distributed (data are converted to ranks)
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationBINF 702 SPRING Chapter 8 Hypothesis Testing: Two-Sample Inference. BINF702 SPRING 2014 Chapter 8 Hypothesis Testing: Two- Sample Inference 1
BINF 702 SPRING 2014 Chapter 8 Hypothesis Testing: Two-Sample Inference Two- Sample Inference 1 A Poster Child for two-sample hypothesis testing Ex 8.1 Obstetrics In the birthweight data in Example 7.2,
More informationT- test recap. Week 7. One- sample t- test. One- sample t- test 5/13/12. t = x " µ s x. One- sample t- test Paired t- test Independent samples t- test
T- test recap Week 7 One- sample t- test Paired t- test Independent samples t- test T- test review Addi5onal tests of significance: correla5ons, qualita5ve data In each case, we re looking to see whether
More informationClass 11 Maths Chapter 15. Statistics
1 P a g e Class 11 Maths Chapter 15. Statistics Statistics is the Science of collection, organization, presentation, analysis and interpretation of the numerical data. Useful Terms 1. Limit of the Class
More informationSimple Linear Regression for the Climate Data
Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO
More informationIntroduction to bivariate analysis
Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationSPSS Guide For MMI 409
SPSS Guide For MMI 409 by John Wong March 2012 Preface Hopefully, this document can provide some guidance to MMI 409 students on how to use SPSS to solve many of the problems covered in the D Agostino
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationIntroduction to bivariate analysis
Introduction to bivariate analysis When one measurement is made on each observation, univariate analysis is applied. If more than one measurement is made on each observation, multivariate analysis is applied.
More informationCan you tell the relationship between students SAT scores and their college grades?
Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower
More informationDr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46
BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics
More informationBusiness Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee
Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)
More informationRegression. Marc H. Mehlman University of New Haven
Regression Marc H. Mehlman marcmehlman@yahoo.com University of New Haven the statistician knows that in nature there never was a normal distribution, there never was a straight line, yet with normal and
More information