Learning Objectives. Math Chapter 3. Chapter 3. Association. Response and Explanatory Variables

Transcription:

ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION
Chapter 3

Learning Objectives: 3.1 The Association between Two Categorical Variables
1. Identify variable type: response or explanatory
2. Define association
3. Contingency tables
4. Calculate proportions and conditional proportions

Response and Explanatory Variables
Response variable (dependent, y): the outcome variable
Explanatory variable (independent, x): defines the groups
Response/Explanatory examples:
1. Blood alcohol level / number of beers consumed
2. Grade on test / amount of study time
3. Yield of corn / amount of rainfall

Association
An association exists when a value for one variable is more likely with certain values of the other variable.
Data analysis with two variables:
1. Tell whether there is an association
2. Describe that association

Contingency Table
Displays two categorical variables
The rows list the categories of one variable; the columns list the categories of the other
Entries in the table are frequencies
What is the response variable? What is the explanatory variable?

Proportions & Conditional Proportions
What proportion of organic foods contain pesticides? Conventionally grown? All?

Proportions & Conditional Proportions
Side-by-side bar charts show conditional proportions and allow for easy comparison
If there is no association, the conditional proportions are the same
Since there is an association, the conditional proportions are different
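The organic-versus-conventional question above can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to illustrate the arithmetic; they are not the data shown on the slide, and the function names are made up for this example.

```python
# Hypothetical contingency table: rows are the explanatory variable (food type),
# columns are the response variable (pesticide status). Counts are illustrative only.
table = {
    "organic": {"present": 30, "absent": 70},
    "conventional": {"present": 150, "absent": 50},
}

def conditional_proportions(counts):
    """Proportion of each response category within each explanatory group."""
    result = {}
    for group, row in counts.items():
        total = sum(row.values())
        result[group] = {category: n / total for category, n in row.items()}
    return result

def overall_proportion(counts, category):
    """Proportion of all observations that fall in one response category."""
    in_category = sum(row[category] for row in counts.values())
    grand_total = sum(sum(row.values()) for row in counts.values())
    return in_category / grand_total

cond = conditional_proportions(table)
print(cond["organic"]["present"])        # proportion of organic foods with pesticides
print(cond["conventional"]["present"])   # proportion of conventional foods with pesticides
print(overall_proportion(table, "present"))
```

If the two conditional proportions came out equal, the table would show no association; here they differ, so food type and pesticide status are associated in this made-up table.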

Learning Objectives: 3.2 The Association between Two Quantitative Variables
1. Constructing scatterplots
2. Interpreting a scatterplot
3. Correlation
4. Calculating correlation

Internet Usage & GDP Data Set

Country          INTERNET    GDP
Algeria              0.65   6.09
Argentina           10.08  11.32
Australia           37.14  25.37
Austria             38.7   26.73
Belgium             31.04  25.52
Brazil               4.66   7.36
Canada              46.66  27.13
Chile               20.14   9.19
China                2.57   4.02
Denmark             42.95  29
Egypt                0.93   3.52
Finland             43.03  24.43
France              26.38  23.99
Germany             37.36  25.35
Greece              13.21  17.44
India                0.68   2.84
Iran                 1.56   6
Ireland             23.31  32.41
Israel              27.66  19.79
Japan               38.42  25.13
Malaysia            27.31   8.75
Mexico               3.62   8.43
Netherlands         49.05  27.19
New Zealand         46.12  19.16
Nigeria              0.1    0.85
Norway              46.38  29.62
Pakistan             0.34   1.89
Philippines          2.56   3.84
Russia               2.93   7.1
Saudi Arabia         1.34  13.33
South Africa         6.49  11.29
Spain               18.27  20.15
Sweden              51.63  24.18
Switzerland         30.7   28.1
Turkey               6.04   5.89
United Kingdom      32.96  24.16
United States       50.15  34.32
Vietnam              1.24   2.07
Yemen                0.09   0.79

Scatterplot
Graph of two quantitative variables:
Horizontal axis: explanatory, x
Vertical axis: response, y

Interpreting Scatterplots
The overall pattern includes trend, direction, and strength of the relationship
Trend: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the trend
Also look for outliers from the overall trend

Used-car Dealership
What association would we expect between the age of the car and mileage?
a) Positive
b) Negative
c) No association

Linear Correlation, r
Measures the strength and direction of the linear association between x and y

Correlation coefficient: Measuring Strength & Direction of a Linear Relationship
Positive r => positive association
Negative r => negative association
r close to +1 or -1 indicates strong linear association
r close to 0 indicates weak association

Learning Objectives: 3.3 Can We Predict the Outcome of a Variable?
1. Define regression line
2. Predict with regression equation
3. Interpret slope and y-intercept
4. Identify least-squares regression line
5. Calculate least-squares regression line
6. Compare explanatory and response variables
7. Calculate and interpret r^2

What Is Regression?
Regress: withdraw; act of reasoning backward
Regression curve: curve with best possible fit for data; describes how y changes with x

Regression Line
Predicts y, given x: $\hat{y} = a + bx$
The y-intercept and slope are a and b
Only an estimate; actual data vary
Describes the relationship between x and the estimated means of y
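As a rough sketch of how the correlation coefficient r described above can be computed, the Python below uses the standard sample formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The slides do not give a formula, so this choice is an assumption; the six data points are a subset of the Internet Usage & GDP table (Algeria, Japan, Nigeria, United States, Yemen, Sweden), picked only to keep the example short.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Subset of the table: Algeria, Japan, Nigeria, United States, Yemen, Sweden
internet = [0.65, 38.42, 0.1, 50.15, 0.09, 51.63]
gdp = [6.09, 25.13, 0.85, 34.32, 0.79, 24.18]

r = correlation(internet, gdp)
print(round(r, 2))  # positive and close to 1: a strong positive linear association
```

Because the formula treats x and y symmetrically, `correlation(gdp, internet)` returns the same value.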

Residuals
Prediction errors: vertical distance between a data point and the regression line
A large residual indicates an unusual observation
Each residual is: $y - \hat{y}$
For the least-squares line, the sum of the residuals is always zero

Least Squares Method
Goal: minimize the distance from the data to the regression line
Residual sum of squares: $\sum (\text{residuals})^2 = \sum (y - \hat{y})^2$
The least-squares regression line minimizes the sum of squared vertical distances between the points and their predictions

Regression Analysis
Identify response and explanatory variables
Response variable is y
Explanatory variable is x

Anthropologists Predict Height Using Remains?
Regression equation: $\hat{y} = 61.4 + 2.4x$
ŷ is predicted height (cm) and x is the length of a femur, or thighbone (cm)
Predict height for a femur length of 50 cm

Interpreting the y-intercept and slope
y-intercept: y-value when x = 0; helps plot the line
Slope: change in y for a 1-unit increase in x
A 1 cm increase in femur length means a 2.4 cm increase in predicted height: $\hat{y} = 61.4 + 2.4x$

Slope Values: Positive, Negative, Zero
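The femur prediction and the least-squares idea above can be sketched in Python. The equation $\hat{y} = 61.4 + 2.4x$ comes from the slide; the function names and the small data set used to fit a line are hypothetical and serve only to show that least-squares residuals sum to zero. The closed-form slope and intercept used here are the standard textbook formulas, which the slides do not spell out.

```python
def predict_height(femur_cm):
    """Predicted height (cm) from femur length (cm), using the slide's equation."""
    return 61.4 + 2.4 * femur_cm

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # y-intercept: the line passes through (mean_x, mean_y)
    return a, b

print(round(predict_height(50), 1))  # prints 181.4

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(sum(residuals), 10))              # zero, up to floating-point rounding
print(round(sum(e ** 2 for e in residuals), 10))  # residual sum of squares
```

Any other line drawn through the same four points would give a larger residual sum of squares than the fitted one.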

Slope and Correlation
Slope, b:
Doesn't tell strength
Has units
Changes if x and y are swapped
Correlation, r:
Describes strength
No units
Same if x and y are swapped

Squared Correlation, r^2
Proportional reduction in error, r^2
Variation in y-values explained by the relationship of y to x
A correlation, r, of 0.9 means $r^2 = (0.9)^2 = 0.81$ => 81% of the variation in y is explained by x

Learning Objectives: 3.4 What Are Some Cautions in Analyzing Associations?
1. Extrapolation
2. Outliers and influential observations
3. Correlation does not imply causation
4. Lurking variables and confounding
5. Simpson's Paradox

Extrapolation
Extrapolation: predicting y for x-values outside the range of the data
Riskier the farther from the range of x
No guarantee the trend holds

Outliers and Influential Points
A regression outlier lies far away from the rest of the data
An observation is influential if both:
1. Its x-value is low or high compared to the rest of the data
2. It is a regression outlier
(Neil Weiss, Elementary Statistics, 7th Edition)

Correlation Does Not Imply Causation
A strong correlation between x and y means:
There is a strong linear association between the variables
It does not mean that x causes y

Chicago Fires of Last Year
x = number of firefighters at the fire
y = cost of damages
1. Is the correlation positive, negative, or zero?
2. Do more firefighters cause damages to be worse?
3. What else might cause the association?
a. Distance from station
b. Intensity of fire
c. Size of fire

Lurking Variables & Confounding
1. Ice cream sales & drowning => temperature
2. Reading level & shoe size => age
Confounding: two explanatory variables are both associated with the response variable and with each other
Lurking variables: not measured in the study but may confound

Simpson's Paradox
Simpson's Paradox: the association between two variables reverses after a third variable is included

Simpson's Paradox Example
Probability of death of smoker = 139/582 = 24%
Probability of death of nonsmoker = 230/732 = 31%

Simpson's Paradox Example
Break out data by age
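The reversal that defines Simpson's paradox can be sketched in Python. The two overall proportions (139/582 and 230/732) come from the slide; the age breakdown was not preserved in this transcript, so the per-age counts below are hypothetical numbers invented to show how a reversal can arise, not the study's data.

```python
# Overall figures from the slide
overall_smoker = 139 / 582
overall_nonsmoker = 230 / 732
print(round(overall_smoker, 2), round(overall_nonsmoker, 2))  # prints 0.24 0.31

# Hypothetical breakdown by age: (deaths, group size). Illustrative only.
strata = {
    "younger": {"smoker": (20, 400), "nonsmoker": (6, 200)},
    "older": {"smoker": (60, 100), "nonsmoker": (150, 300)},
}

def rate(deaths, total):
    """Proportion of a group that died."""
    return deaths / total

def pooled_rate(data, status):
    """Death rate for one smoking status after combining all age groups."""
    deaths = sum(group[status][0] for group in data.values())
    total = sum(group[status][1] for group in data.values())
    return deaths / total

# Within each age group, smokers have the higher death rate
for age, group in strata.items():
    print(age, rate(*group["smoker"]), rate(*group["nonsmoker"]))

# Pooled over age, the comparison reverses
print(pooled_rate(strata, "smoker"), pooled_rate(strata, "nonsmoker"))
```

In this made-up table the reversal happens because most smokers are in the younger group, where death rates are low for everyone; age acts as the third variable.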

Simpson's Paradox Example
Associations look quite different after adjusting for a third variable

Image Sources
Statistics: The Art and Science of Learning from Data, 2nd Edition, Agresti and Franklin
http://jenniferechols.files.wordpress.com/2007/08/two-friends-hugging.jpg
http://www.hessdesignworks.com/illustrations/corn.jpg
http://tickets.worldcafelive.com/uplimage/beercircle.jpg
http://i.treehugger.com/images/2007/10/24/pesticide-jj-001.jpg
http://onlinestatbook.com/chapter12/graphics/reg_error.gif
http://www.knitwareblog.com/wp-content/uploads/2008/06/firefox-3-download-map.jpg
http://www.pritchettcartoons.com/jeremy/used_cars.gif
http://scienceaid.co.uk/psychology/approaches/images/correlation.jpg
http://nrtwq.usgs.gov/images/methods/sscvsturb.png
http://www.agdesktop.com/wallpapers%5ctelefilm%5cbones%5ctemperance_bones_brennan-seeley%20booth-001.jpg
http://farm3.static.flickr.com/2706/4441460977_61dfcc3e6e.jpg
http://msenux.redwoods.edu/math/r/graphics/regression1.gif
http://graphics8.nytimes.com/images/2009/04/27/world/27withdraw.xlarge12.jpg
http://farm4.static.flickr.com/3311/3577858126_093b727095.jpg
http://www.chem.utoronto.ca/coursenotes/analsci/stats/images/linreggraph.gif
http://thumb11.shutterstock.com.edgesuite.net/display_pic_with_logo/66811/66811,1179456692,1/stock-photo-an-upwardgraph-on-a-green-chalkboard-3322824.jpg
http://www.bio.uu.nl/~biostat/outlier.gif
Neil Weiss, Elementary Statistics, 7th Edition
http://www2.selu.edu/academics/faculty/dgurney/math241/stattopics/scatanal_files/image004.gif
http://pixdaus.com/pics/1221327421sxkwkqy.jpg
http://www.teachbabymusic.com/img/piano_r3_c1.jpg
http://image3.examiner.com/images/blog/exid30987/images/resized_child_eating_ice_cream.jpg
http://blogs.smh.com.au/girlsguide/garbo313.jpg
http://www.straitstimes.com/sti/stimedia/image/20100317/smoker-reuters.jpg
http://www1.pictures.fp.zimbio.com/marcia+cross+running+errands+brentwood+65f2awtuzjal.jpg