Unit 30: Inference for Regression

Student Guide

Summary of Video

In Unit 11, Fitting Lines to Data, we examined the relationship between winter snowpack and spring runoff. Colorado resource managers made predictions about the seasonal water supply using a least-squares regression line that was fit to a scatterplot of their measurement data, which is shown in Figure 30.1.

Figure 30.1. Least-squares regression line used by Colorado resource managers.

But would we really see a linear relationship between snowpack and runoff if we had all the possible data? Or might the pattern we see in the sample data's scatterplot occur just by chance? We would like to know whether the positive association we see between snowpack and runoff in the sample is strong enough that we can conclude that the same relationship holds for the whole population.

Statisticians rely on inference to determine whether the relationship observed between two variables in a sample is valid for some larger population. Inference is a powerful tool. Powerful enough, in fact, to help bring an entire bird species back from the brink of extinction.

After World War II, the agrichemical industry began mass-producing chemicals to control pests. Cities like San Antonio, Texas, sprayed whole sections of the city with the insecticide DDT in their fight against the spread of poliomyelitis. Unfortunately, there weren't many safeguards in place, and the damaging environmental effects of these compounds were not taken into account. Eventually, changes in the natural environment due to chemical pesticides became apparent. One species that was severely affected was the peregrine falcon.

In Great Britain, Derek Ratcliffe noticed in the 1950s that peregrine falcons were declining at nesting sites and that they were unable to hatch their eggs. This decline in falcons was eventually demonstrated to be a worldwide phenomenon. Researchers determined that peregrine falcons were not successfully hatching their eggs because of eggshell thinning, a very serious problem since the weaker shells were breaking before the baby birds were ready to hatch. After looking at some of the causes for this eggshell thinning, scientists began to zero in on a possible culprit: DDT and its breakdown product, DDE.

There were a couple of reasons why scientists believed that there was a relationship between DDT or DDE and eggshell thinning. In studying the broken eggshells and eggs collected in the field, scientists found very high residues of DDE that had not been seen in historic samples. The falcons were ingesting DDT through their prey: birds they ate had small concentrations of the chemical in their flesh. Over time the DDT built up in the peregrines' own bodies and started to affect the females' ability to lay healthy eggs.

Even though scientists had a pretty strong hunch that DDT was the cause of peregrine falcon eggshell thinning, they could not rely on their scientific instincts alone. So, researchers turned to statistics as a way to validate their analyses. We can follow in the researchers' footsteps by taking a look at a data set comprised of 68 peregrine falcon eggs from Alaska and Northern Canada. A scatterplot of the two variables we will be studying, eggshell thickness (response variable) and the log-concentration of DDE (explanatory variable), appears in Figure 30.2. We have added the least-squares regression line fit to these data. Remember it is described by an equation of the form $\hat{y} = a + bx$.

Figure 30.2. Scatterplot of eggshell thickness versus log-concentration of DDE.

The data in Figure 30.2 show a negative, linear relationship between the two variables. Using the equation, we can predict eggshell thickness for any measurement of DDE. The slope b and intercept a are statistics, meaning we calculated them from our sample data. But if we repeated the study with a different sample of eggs, the statistics a and b would take on somewhat different values. So, what we want to know now is whether there really is a negative linear relationship between these variables for the entire population of all peregrine eggs, beyond just the eggs that happen to be in our sample. Or might the pattern we see in the sample data be due simply to chance variation?

Data of the entire peregrine egg population might look like the scatterplot in Figure 30.3. Notice that for any given value of the explanatory variable, such as the value indicated by the vertical line, many different eggshell thicknesses may be observed.

Figure 30.3. Scatterplot representing population of peregrine eggs.

In the scatterplot in Figure 30.4, the mean eggshell thickness, $\mu_y$, does have a linear relationship with the log-concentration of DDE, x. The line fit to the hypothetical population data is called the population regression line. Because we don't have access to ALL the population data, we use our sample data to estimate the population regression line.

Figure 30.4. The population regression line fit to the population data.

Several conditions, which are discussed in the Content Overview, must be met in order to move forward with regression inference. You can check out whether these conditions are satisfied in Review Question 1. But for now, we assume that the conditions for inference are met. The population regression model is written as follows:

$$\mu_y = \alpha + \beta x$$

where $\mu_y$ represents the true population mean of the response y for the given level of x, α is the population y-intercept, and β is the population slope.

Now let's look back at our least-squares regression line, based on the sample of 68 bird eggs. The equation is

$$\hat{y} = 2.146 - 0.3191x$$

The sample intercept, a = 2.146, is an estimate for the population intercept α. And the sample slope, b = -0.3191, is an estimate for the population slope β. Of course, we've learned by now that other samples from the same population will give us different data, resulting in different parameter estimates of α and β. In repeated sampling, the values of these statistics, a and b, form sampling distributions, which provide the basis for statistical inference. In particular, we want to infer from the sampling distribution for our statistic b whether the sample data provide sufficiently strong evidence that higher levels of DDE are related to eggshell thinning in the population. To answer this question, we set up our null and alternative hypotheses.

$H_0$: Amount of DDE and eggshell thickness have no linear relationship, or $H_0: \beta = 0$.
$H_a$: Amount of DDE and eggshell thickness have a negative linear relationship, or $H_a: \beta < 0$.

The t-test statistic for testing the null hypothesis is:

$$t = \frac{b - \beta_0}{s_b}$$

where b is our sample estimate for the population slope, $\beta_0$ is the null hypothesis value for the population slope, and $s_b$ is the standard error of the estimate b, which we can get from software. In this case, $s_b = 0.0255$. Next, we calculate the value of our t-test statistic:

$$t = \frac{-0.3191 - 0}{0.0255} \approx -12.5$$

If the null hypothesis is true, then t has a t-distribution with n - 2, or 66, degrees of freedom. The value t = -12.5 is an extreme value and the corresponding p-value is essentially 0. Thus, we have strong evidence to reject the null hypothesis.

By rejecting the null hypothesis, we can confirm what scientists already suspected: that there is a connection between peregrine falcon eggshell thickness and the presence of DDE. More precisely, there is a statistically significant, negative linear relationship between the log-concentration of DDE and the thickness of peregrine eggshells.

Before researchers could present this finding to the public, however, they had to quantify the relationship. That meant computing a confidence interval for the population slope. Here's the formula:

$$b \pm t^* s_b$$

For a 95% confidence interval and df = 68 - 2 = 66, we find $t^* = 1.997$. Now, we can compute the confidence interval:

$$-0.3191 \pm (1.997)(0.0255) = -0.3191 \pm 0.0509$$

or from -0.3700 to -0.2682.

Hence, based on our sample of 68 peregrine falcon eggs, we are 95% confident that a one-unit increase in the log-concentration of DDE is associated with a true average decrease of between 0.27 and 0.37 in Ratcliffe's eggshell thickness index.

Armed with this information, scientists were able to make a strong argument against the use of DDT because of its dangerous impact on peregrines and the environment as a whole. These results led to a prolonged legal battle with people on both sides presenting evidence. Due to scientific and statistical evidence, the United States and many Western European countries banned DDT use. Since then, the peregrine falcon population has rebounded significantly. So, this environmental detective story has a happy ending for the peregrine falcons.
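The falcon slope test and confidence interval from the summary can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it starts from the summary values quoted in the text (b, s_b, n, and t*) rather than from the raw egg data, which are not reproduced in this guide, and the variable names are our own.

```python
# Summary values quoted in the text for the 68 peregrine falcon eggs.
b = -0.3191      # sample slope
s_b = 0.0255     # standard error of the slope (reported by software)
beta_0 = 0.0     # slope under the null hypothesis H0: beta = 0
n = 68           # number of eggs in the sample
t_star = 1.997   # t critical value for 95% confidence, df = 66 (from the text)

df = n - 2                      # degrees of freedom for the t-distribution
t_stat = (b - beta_0) / s_b     # t-test statistic for the slope
margin = t_star * s_b           # margin of error for the interval
ci_low, ci_high = b - margin, b + margin

print(f"df = {df}")
print(f"t = {t_stat:.1f}")
print(f"95% CI for the slope: {ci_low:.4f} to {ci_high:.4f}")
```

Both endpoints of the interval are negative, which matches the conclusion that higher DDE levels are associated with thinner shells.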

Student Learning Objectives

A. Understand the linear regression model. Know how to find the least-squares regression line as an estimate (covered in Unit 11, Fitting Lines to Data).
B. Know how to check whether the assumptions for the linear regression model are reasonably satisfied.
C. Recall how to find the least-squares regression equation (Unit 11, Fitting Lines to Data).
D. Be able to calculate, or obtain from software, the standard error of the estimate, s, and the standard error of the slope, $s_b$.
E. Be able to conduct a significance test for the population slope β.
F. Be able to calculate a confidence interval for the population slope β.

Content Overview

While we often hear of the benefits of eating fish, we also hear warnings about limiting our consumption of certain fish whose flesh contains high levels of mercury. Much like the peregrine falcons and DDT, small levels of mercury in oceans, lakes, and streams build up in fish tissue over time. It becomes most concentrated in larger fish, which are higher up on the food chain. To better understand the relationship between fish size and mercury concentration, the United States Geological Survey (USGS) collected data on total fish length and mercury concentration in fish tissue. (Total length is the length from the tip of the snout to the tip of the tail.) The data from a sample of largemouth bass (of legal size to catch) collected in Lake Natoma, California, appear in Table 30.1. (You may remember these data from Review Question 3 in Unit 11.)

Total Length   Mercury Concentration   Total Length   Mercury Concentration
(mm)           (µg/g wet wt.)          (mm)           (µg/g wet wt.)
341            0.515                   490            0.807
353            0.268                   315            0.320
387            0.450                   360            0.332
375            0.516                   385            0.584
389            0.342                   390            0.580
395            0.495                   410            0.722
407            0.604                   425            0.550
415            0.695                   480            0.923
425            0.577                   448            0.653
446            0.692                   460            0.755

Table 30.1. Fish total length and mercury concentration in fish tissue.

Since we believe that fish length explains mercury concentration, total length is the explanatory variable and mercury concentration is the response variable. A scatterplot of mercury concentration versus total length appears in Figure 30.5.

Figure 30.5. Scatterplot of mercury concentration versus total fish length (axes: total length in mm, mercury concentration in µg/g; fitted line $\hat{y} = -0.7374 + 0.003227x$).

Since the pattern of the dots in the scatterplot indicates a positive, linear relationship between the two variables, we fit a least-squares line to the data. However, these data are a sample of 20 largemouth bass from the population of all the largemouth bass that live in Lake Natoma. While we can use the least-squares equation to make predictions about mercury concentration for fish of a particular length, we need techniques from statistical inference to answer the following questions about the population:

Is there really a positive linear relationship between the variables mercury concentration and total length, or might the pattern observed in the scatterplot be due simply to chance?

Can we determine a confidence interval estimate for the population slope, the rate of change of mercury concentration per one millimeter increase in fish total length?

If we use the least-squares line to predict the mercury concentration for a fish of a particular length, how reliable is our prediction?

Now, what if we could make a scatterplot of mercury concentration versus total length for all of the largemouth bass (at or close to the legal catch length) in Lake Natoma? Figure 30.6 shows how a scatterplot of the population might look and how a regression line fit to the population data might look.

Figure 30.6. Population scatterplot of mercury concentration versus total length (axes: total length in mm, mercury concentration in µg/g; population regression line $\mu_y = \alpha + \beta x$, with vertical lines marked at $x_1$ and $x_2$).

Notice, for each fish length, x, there are many different values of mercury concentration, y. For example, in Figure 30.6 a vertical line segment has been drawn at length $x_1$. That line segment intersects with a whole distribution of mercury concentration values, y-values, on the scatterplot. The mean of that distribution of y-values, $\mu_y$, is at the intersection of the vertical line at $x_1$ and the regression line. Now look at the vertical line at $x_2$. It too intersects with an entire distribution of y-values, with mean at the intersection of the vertical line at $x_2$ and the regression line. So, the population regression line describes how the mean mercury concentration values, $\mu_y$, are related to total length, x. In this case, the relationship looks linear and so we express it as $\mu_y = \alpha + \beta x$.

As mentioned earlier in this unit, several conditions must be met in order to move forward with regression inference. Those conditions, along with a description of the simple linear regression model, are presented below.

Simple Linear Regression Model and Conditions

We have n ordered pairs of observations (x, y) on an explanatory variable, x, and response variable, y. The simple linear regression model assumes that for each value of x the observed values of the response variable, y, vary about a mean $\mu_y$ that has a linear relationship with x:

$$\mu_y = \alpha + \beta x$$

The line described by $\mu_y = \alpha + \beta x$ is called the population regression line. In addition, the following conditions must be satisfied:

For any fixed value of x, the response y varies according to a normal distribution.
Repeated responses, y-values, are independent of each other.
The standard deviation of y for any value of x, σ, is the same for all values of x.

Thus, the model has three unknown population parameters: α, β, and σ. Figure 30.7 provides a graphic representation of the simple linear regression model and conditions.

Figure 30.7. Graphic representation of linear regression model (normal curves with the same standard deviation σ, centered on the line $\mu_y = \alpha + \beta x$ at $\alpha + \beta x_1$, $\alpha + \beta x_2$, and $\alpha + \beta x_3$ for three different x-values).

A first step in inference is to estimate the unknown parameters. We begin with estimates for the slope and intercept of the population regression line. The estimated regression line for the linear regression model is the least-squares line, $\hat{y} = a + bx$. From Figure 30.5, the estimated regression line is:

$$\hat{y} = -0.7374 + 0.003227x$$

The y-intercept, a = -0.7374, is a point estimate for the population intercept, α, and the slope, b = 0.003227, is a point estimate of the population slope, β.

Next, we develop an estimate for σ, which measures the variability of the response y about the population regression line. Because the least-squares line estimates the population regression line, the residuals estimate how much y varies about the population regression line:

residual = observed y - predicted y = $y - \hat{y}$

We estimate σ from the standard deviation of the residuals, s, as follows:

$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$

Our estimate for σ, s, is called the standard error of the estimate.

The computation of s is tedious by hand. Regression outputs from statistical software will compute the value for you. However, here's how it is computed in our example of mercury concentration and fish length. First, we'll compute the residual corresponding to data value (341, 0.515) as a reminder of how that is done.

$$\hat{y} = -0.7374 + 0.003227(341) \approx 0.363$$
$$y - \hat{y} = 0.515 - 0.363 = 0.152$$

Here are all 20 residuals (rounded to three decimals):

 0.152  -0.134  -0.062   0.043  -0.176  -0.042   0.028   0.093  -0.057  -0.010
-0.037   0.041  -0.092   0.079   0.059   0.136  -0.084   0.111  -0.055   0.008

Next, we calculate the SSE, the sum of the squares of the residuals:

$$\text{SSE} = (0.152)^2 + (-0.134)^2 + (-0.062)^2 + \cdots + (0.008)^2 \approx 0.1545$$

Now, we calculate s:

$$s = \sqrt{\frac{\text{SSE}}{n - 2}} = \sqrt{\frac{0.1545}{20 - 2}} = \sqrt{\frac{0.1545}{18}} \approx 0.0926 \text{ µg/g}$$

We can use the equation of the least-squares line, $\hat{y} = -0.7374 + 0.003227x$, to make predictions. However, those predictions are more reliable when the data points lie close to the line. Keep in mind that s is one measure of the closeness of the data to the least-squares line. If s = 0, the data points fall exactly on the least-squares line. Moreover, when s is positive, we can use it to place error bounds above and below the least-squares line. These error bounds are lines parallel to the least-squares line that lie one or two s above and below the least-squares line. We apply this idea to our mercury concentration and fish length data.

$$\hat{y} = -0.7374 + 0.003227x \pm 0.0926$$
$$\hat{y} = -0.7374 + 0.003227x \pm 2(0.0926)$$

Figure 30.8. Adding lines ± s and ± 2s above and below the least-squares line (axes: total length in mm, mercury concentration in µg/g).

Recall from Unit 8, Normal Calculations, that we expect roughly 68% of normal data to lie within one standard deviation of the mean and roughly 95% to lie within two standard deviations of the mean. Notice that all of our data fall within two s of the least-squares line. So, for a particular fish length, say with total length = 400 mm, we expect roughly 95% of the fish to have mercury concentrations between 0.3682 µg/g and 0.7386 µg/g.

The standard error of the estimate provides one way to select between competing models. For example, suppose we had a second model relating mercury concentration to the explanatory variable fish weight. Choose the model with the smaller value for s.
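The hand calculation of the residuals, SSE, and s can be checked against the Table 30.1 data. The following is a minimal sketch using only the Python standard library; the variable names are our own, and the results differ from the text in the last digit or so because the text works with rounded coefficients.

```python
from math import sqrt

# Table 30.1: total length (mm) and mercury concentration (µg/g wet wt.),
# left-hand column pairs first, then right-hand column pairs.
length = [341, 353, 387, 375, 389, 395, 407, 415, 425, 446,
          490, 315, 360, 385, 390, 410, 425, 480, 448, 460]
mercury = [0.515, 0.268, 0.450, 0.516, 0.342, 0.495, 0.604, 0.695, 0.577, 0.692,
           0.807, 0.320, 0.332, 0.584, 0.580, 0.722, 0.550, 0.923, 0.653, 0.755]

n = len(length)
x_bar = sum(length) / n
y_bar = sum(mercury) / n

# Least-squares slope and intercept.
sxx = sum((x - x_bar) ** 2 for x in length)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(length, mercury))
b = sxy / sxx
a = y_bar - b * x_bar

# Residuals, sum of squared residuals, and standard error of the estimate.
residuals = [y - (a + b * x) for x, y in zip(length, mercury)]
sse = sum(r ** 2 for r in residuals)
s = sqrt(sse / (n - 2))

# Rough 95% band (predicted value plus or minus 2s) for a 400 mm fish.
y_hat_400 = a + b * 400
band = (y_hat_400 - 2 * s, y_hat_400 + 2 * s)

print(f"y-hat = {a:.4f} + {b:.6f} x")
print(f"SSE = {sse:.4f}, s = {s:.4f}")
print(f"Band at 400 mm: {band[0]:.3f} to {band[1]:.3f} µg/g")
```

Comparing s from two candidate models fit this way is the model-selection idea described above.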

The scatterplot in Figure 30.5 appears to support the hypothesis that longer fish tend to have higher levels of mercury concentration. But is this positive association statistically significant? Or could it have occurred just by chance? To answer this question, we set up the following null and alternative hypotheses:

$H_0$: Total length and mercury concentration have no linear relationship, or $H_0: \beta = 0$.
$H_a$: Total length and mercury concentration have a positive linear relationship, or $H_a: \beta > 0$.

A regression line with slope 0 is horizontal. That indicates that the mean of the response y does not change as x changes which, in turn, means that the linear regression equation is of no value in predicting y. In the case of mercury concentration and total length, the estimate of the population slope is very small, b = 0.003227. So, we might jump to the conclusion that total length is not useful in predicting mercury concentration. But we'd better work through the details of a significance test before jumping to such a conclusion.

Significance Test For Regression Slope, β

To test the hypothesis $H_0: \beta = \beta_0$, compute the t-test statistic:

$$t = \frac{b - \beta_0}{s_b} \quad \text{where} \quad s_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$

and b is the least-squares estimate of the population slope, β, and $\beta_0$ is the null hypothesis value for β. If the null hypothesis is true and the linear regression conditions are satisfied, then t has a t-distribution with df = n - 2.

Back to the situation with mercury concentration and fish length. We use software to help us calculate $s_b$:

$$s_b = \frac{0.093}{\sqrt{39463.2}} \approx 0.000468$$

Now we are ready to calculate t:

$$t = \frac{0.003227 - 0}{0.000468} \approx 6.9$$

In this case, df = n - 2 = 20 - 2 = 18. Since this is a one-sided alternative, we find the probability of observing a value of t at least as large as the one we observed, 6.9. As shown in Figure 30.9, the area under the t-density curve to the right of 6.9 is so small that it isn't really visible. The area is only $9.4127 \times 10^{-7}$; so, p ≈ 0. We conclude that there is sufficient evidence to reject the null hypothesis and conclude β > 0. There is a positive linear relationship between total length and mercury concentration.

Figure 30.9. Calculating the p-value (density curve for the t-distribution with df = 18; the area to the right of t = 6.9 is 9.4127E-07).

Next, we calculate a confidence interval estimate for the regression slope, β. Here are the details for constructing a confidence interval.

Confidence Interval For Regression Slope, β

A confidence interval for β is computed using the following formula:

$$b \pm t^* s_b$$

where $t^*$ is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 2; b is the least-squares estimate of the population slope, and $s_b$ is the standard error of b.
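The slope test statistic and the interval formula can be sketched in a few lines for the fish data. This is an illustration under stated assumptions: it uses the rounded summary values from the text (s = 0.093 and a sum of squared deviations of 39463.2 for total length), and it takes t* = 2.101, the t-table value for 95% confidence with df = 18, as given rather than computing it. The variable names are our own.

```python
from math import sqrt

# Summary values for the 20 largemouth bass, as quoted in the text.
b = 0.003227     # least-squares slope
s = 0.093        # standard error of the estimate (rounded as in the text)
sxx = 39463.2    # sum of (x - x_bar)^2 for total length
n = 20
beta_0 = 0.0     # slope under the null hypothesis H0: beta = 0
t_star = 2.101   # assumed t critical value: 95% confidence, df = 18 (t-table)

s_b = s / sqrt(sxx)             # standard error of the slope
t_stat = (b - beta_0) / s_b     # t-test statistic
df = n - 2
margin = t_star * s_b
ci = (b - margin, b + margin)   # 95% confidence interval for beta

print(f"s_b = {s_b:.6f}")
print(f"t = {t_stat:.1f} with df = {df}")
print(f"95% CI for the slope: {ci[0]:.4f} to {ci[1]:.4f}")
```

A small slope can still be highly significant: what matters is the slope relative to its standard error, not its absolute size.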

To calculate the confidence interval, we start by determining the value of $t^*$ for a 95% confidence interval when df = 18. Using a t-table, we get $t^* = 2.101$. We can now calculate the confidence interval:

$$b \pm t^* s_b = 0.003227 \pm (2.101)(0.000468) = 0.003227 \pm 0.000983$$

Or, rounded to four decimals, from 0.0022 to 0.0042. Thus, we are 95% confident that each increase of 1 millimeter in total length is associated with an average increase in mercury concentration of between 0.0022 µg/g and 0.0042 µg/g. That may seem like a small increase, but, for example, Florida has set the safe limit on mercury concentration to be below 0.5 µg/g.

The results from inference are trustworthy provided the conditions for the simple linear regression model are satisfied. We conclude this overview with a discussion of checking the conditions, which should be done before proceeding to inference.

The conditions involve the population regression line and deviations of responses, y-values, from this line. We don't know the population regression line, but we have the least-squares line as an estimate. We also don't know the deviations from the population regression line, but we have the residuals as estimates. So, checking the assumptions can be done through examining the residuals. Here is a rundown of the conditions that must be checked:

1. Linearity
Check the adequacy of the linear model (covered in Unit 11). Make a residual plot, a scatterplot of the residuals versus the explanatory variable. If the pattern of the dots appears random, with about half the dots above the horizontal axis and half below, then the condition of linearity is satisfied.

2. Normality
The responses, y-values, vary normally about the regression line for each x. This does not mean that the y-values are normally distributed because different y-values come from different x-values. However, the deviations of the y-values about their mean (the regression line) are normal and those deviations are estimated by the residuals. So, check that the residuals are approximately normally distributed (covered in Unit 9). Make a normal quantile plot. If the pattern of the dots appears fairly linear, then the condition of normality is satisfied. If the plot indicates that the residuals are severely skewed or contain extreme outliers, then this condition is not satisfied.

3. Independence
The responses, y-values, must be independent of each other. The best evidence of independence is that the data are a random sample.

4. Constant standard deviations of the responses for all x
To check this condition, examine a residual plot. Check to see if the vertical spread of the dots remains about the same as x-values increase.

As an example, consider the two residual plots in Figure 30.10. In residual plot (a), the vertical spread is about the same for small x-values as it is for large x-values. In this case, Condition 4 is satisfied. In residual plot (b), the spread of the residuals tends to increase as x-values increase. We've used a pencil to roughly draw an outline of the spread as it fans out for larger values of x. Here Condition 4 is not satisfied.

Figure 30.10. Checking to see if Condition 4 is satisfied (two residual plots, (a) and (b), of residuals versus x).

Now, we return to the fish study: Are the inference results, the significance test and confidence interval that we calculated, trustworthy? Let's check to see if Conditions 1 to 4 are reasonably satisfied. A residual plot appears in Figure 30.11.

Figure 30.11. Residual plot for checking conditions (residuals versus total length in mm; response is mercury concentration).

The dots appear randomly scattered and split above and below the horizontal axis. In addition, the vertical spread seems to be roughly the same as total length, x, increases. Therefore, Conditions 1 and 4 are reasonably satisfied.

Figure 30.12 shows a normal quantile plot of the residuals. The pattern of the dots appears fairly linear. So, Condition 2 is reasonably satisfied.

Figure 30.12. Normal quantile plot of residuals.

Finally, the data were a random sample of fish. So, the mercury concentration levels are independent of each other. Condition 3 is satisfied. So, now we can say that our inference results are trustworthy.
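The visual checks of the residual plot can be backed up by a rough numeric companion. The sketch below is our own illustration, not part of the original guide: it takes the 20 rounded residuals listed earlier, counts how many fall above and below zero (Condition 1), and compares the spread of the residuals for shorter and longer fish (Condition 4). Splitting at the median length and comparing standard deviations is an assumed, informal check, not a formal test, and it does not replace looking at the plots.

```python
from statistics import median, stdev

# Total lengths (mm) from Table 30.1 and the matching rounded residuals
# from the text, in the same order.
length = [341, 353, 387, 375, 389, 395, 407, 415, 425, 446,
          490, 315, 360, 385, 390, 410, 425, 480, 448, 460]
residuals = [0.152, -0.134, -0.062, 0.043, -0.176, -0.042, 0.028, 0.093,
             -0.057, -0.010, -0.037, 0.041, -0.092, 0.079, 0.059, 0.136,
             -0.084, 0.111, -0.055, 0.008]

# Condition 1 (linearity): roughly half the residuals on each side of zero.
above = sum(r > 0 for r in residuals)
below = sum(r < 0 for r in residuals)

# Condition 4 (constant spread): compare residual spread for shorter and
# longer fish, split at the median total length.
cut = median(length)
short_fish = [r for x, r in zip(length, residuals) if x <= cut]
long_fish = [r for x, r in zip(length, residuals) if x > cut]
spread_ratio = stdev(long_fish) / stdev(short_fish)

print(f"Residuals above zero: {above}, below zero: {below}")
print(f"Spread ratio (longer fish / shorter fish): {spread_ratio:.2f}")
```

A spread ratio far above 1 would suggest the fanning-out pattern of residual plot (b) in Figure 30.10; a ratio near 1 is consistent with Condition 4.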

Key Terms

The simple linear regression model assumes that for each value of x the observed values of the response variable y are normally distributed about a mean $\mu_y$ that has the following linear relationship with x:

$$\mu_y = \alpha + \beta x$$

The line described by $\mu_y = \alpha + \beta x$ is called the population regression line. The estimated regression line for the linear regression model is the least-squares line, $\hat{y} = a + bx$.

Assumptions of the linear regression model:

The observed response y for any value of x varies according to a normal distribution.
The y-values are independent of each other.
The mean response, $\mu_y$, has a straight-line relationship with x: $\mu_y = \alpha + \beta x$.
The standard deviation of y, σ, is the same for all values of x.

The standard error of the estimate, s, is a measure of how much the observations vary about the least-squares line. It is a point estimate for σ and is computed from the following formula:

$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$

The standard error of the slope, $s_b$, is the estimated standard deviation of b, the least-squares estimate for the population slope β. It is calculated from the following formula:

$$s_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$

The t-test statistic for testing $H_0: \beta = \beta_0$, where β is the population slope, is calculated as follows:

$$t = \frac{b - \beta_0}{s_b}$$

where b is the least-squares estimate of the population slope, $\beta_0$ is the null hypothesis value for β, and $s_b$ is the standard error of b. When $H_0$ is true, t has a t-distribution with df = n - 2, where n is the number of (x, y)-pairs in the sample. The usual null hypothesis is $H_0: \beta = 0$, which says that the straight-line dependence on x has no value in predicting y.

To calculate a confidence interval for the population slope, β, use the following formula:

$$b \pm t^* s_b$$

where $t^*$ is a t-critical value associated with the confidence level and determined from a t-distribution with df = n - 2; b is the least-squares estimate of the population slope, and $s_b$ is the standard error of b.

The Video

Take out a piece of paper and be ready to write down the answers to these questions as you watch the video.

1. The population of peregrine falcons was in decline in the 1950s. What was the reason for the population's decline?

2. In a scatterplot of eggshell thickness and log-concentration of DDE, which was the explanatory variable and which was the response variable?

3. Describe the form of the relationship between eggshell thickness and log-concentration of DDE. Is the form linear or nonlinear? Positive or negative?

4. What is a population regression line?

5. Why are a and b, the y-intercept and slope of the least-squares line, called statistics?

6. State the null and alternative hypotheses used for testing whether the sample data provided strong evidence that higher levels of DDE were related to eggshell thinning in the population.

7. What was the outcome of the significance test?

8. Did the peregrine falcons ever recover?

Unit Activity: Clues to the Thief

A high school's mascot is stolen and the poster shown in Figure 30.13 has been posted around the school and the town. The thief has left clues: a plain black sweater and a set of footprints under a window. The footprints appear to have been made by a man's sneaker. Here are more details from the investigation:

• The distance between the footprints reveals that the thief's steps are about 58 cm long. This distance was measured from the back of the heel on the first footprint to the back of the heel on the second.
• The thief's forearm is between 26 and 27 cm. The forearm length was estimated from the sweater by measuring from the center of a worn spot on the elbow to the turn at the cuff.

[Figure 30.13. The missing manatee.]

School officials suspect that the thief is a student from a rival high school. Table 30.2 contains data from a random sample of 9th- and 10th-grade students that you can use for this activity. Feel free to add and/or substitute data that your class collects.

In this activity, you will fit two linear regression models to the data. For the first model you will fit a line to forearm length and height; for the second model, you will fit a line to step length and height. To eliminate confusion, express your models using the variable names rather than x and y.

1. a. Make a scatterplot of height versus forearm length. Calculate the equation of the least-squares line and add its graph to your scatterplot.
   b. Check to see if the four conditions for the simple linear regression model are reasonably satisfied. (Look to see if there are strong departures from the conditions.)
   c. Calculate the standard error of the estimate, s.

2. Next, let's focus on inference related to the relationship between height and forearm length.
   a. We expect people with longer forearms to be taller than people with shorter forearms. Conduct a significance test of $H_0: \beta = 0$ against $H_a: \beta > 0$. Report the value of the test statistic, the degrees of freedom, the p-value, and your conclusion.
   b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context of this situation.

3. a. Make a scatterplot of height versus step length. Calculate the equation of the least-squares line and add its graph to your scatterplot.
   b. Check to see if the four conditions for the simple linear regression model are reasonably satisfied. (Look to see if there are strong departures from the conditions.)
   c. Calculate the standard error of the estimate, s.

4. Next, we focus on inference related to the relationship between height and step length.
   a. We expect people with longer step lengths to be taller than people with shorter step lengths. Conduct a significance test of $H_0: \beta = 0$ against $H_a: \beta > 0$. Report the value of the test statistic, the degrees of freedom, the p-value, and your conclusion.
   b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context of this situation.

5. a. You have two competing models for predicting height, one based on forearm length and the other based on step length. Which of your two models is likely to produce more precise estimates? Explain.

   b. Use one or both of your models to fill in the blanks in the following sentence. Justify your answer.

      We predict that the thief is ______ cm tall. But the thief might be as short as ______ or as tall as ______.

Gender   Height (cm)   Stride Length (cm)   Forearm Length (cm)
Male     166.0         58.250               28.5
Male     178.0         68.500               29.0
Male     171.0         58.500               27.2
Male     165.0         50.125               28.0
Male     177.5         58.750               31.3
Male     166.0         62.875               28.3
Male     175.5         59.125               28.6
Male     171.0         67.750               31.5
Male     184.0         68.875               30.5
Male     184.5         66.250               30.8
Male     183.5         79.500               30.5
Male     172.0         70.500               30.3
Female   164.5         55.875               24.2
Female   166.0         52.375               27.3
Female   168.0         55.375               28.0
Female   178.5         59.750               29.1
Female   166.0         48.375               27.9
Female   159.0         57.125               28.0
Female   166.0         64.000               27.4
Female   154.5         57.750               25.8
Female   161.0         63.500               27.0
Female   177.0         69.750               30.1
Female   161.0         72.500               26.5
Female   164.0         75.250               28.2
Female   174.0         58.500               28.4
Female   164.0         59.750               26.8
Female   168.0         55.250               26.4

Table 30.2. Data from 9th- and 10th-grade students.

Exercises

Table 30.3 provides data on femur (thighbone) and ulna (forearm bone) lengths and height. These data are a random sample taken from the Forensic Anthropology Data Bank (FDB) at the University of Tennessee. Notice that height is given in centimeters and bone length in millimeters. All exercises will be based on these data.

Femur Length, x1 (mm)   Ulna Length, x2 (mm)   Height, y (cm)
432                     237                    158
498                     288                    188
463                     276                    173
443                     245                    163
511                     278                    191
547                     283                    189
484                     279                    178
522                     293                    182
438                     251                    163
462                     262                    175
449                     255                    159
499                     273                    181
484                     280                    168
472                     255                    175
484                     269                    175
432                     248                    160
439                     248                    165
483                     263                    170
484                     269                    180
508                     307                    183

Table 30.3. Data on femur and ulna length and height.

1. a. Make a scatterplot of height versus femur length. Would you describe the pattern of the dots as linear or nonlinear? Positive association or negative?
   b. Calculate the equation of the least-squares line. Add a graph of the line to your scatterplot in (a).
   c. Check to see if the conditions for regression inference are reasonably satisfied. Identify any strong departures from the conditions.

2. a. Building on the work done for question 1, calculate the standard error of the estimate, s.
   b. Write the equations of error bands one and two standard errors, s, above and below the least-squares line. Add graphs of these lines to your scatterplot from question 1(b).
   c. If the distributions of the responses, y-values, for any fixed x are normally distributed with mean on the regression line, then the outermost bands in (b) should trap roughly 95% of the data between the bands. Is that the case?

3. a. Make a scatterplot of height versus ulna length. Determine the equation of the least-squares line and add a graph of the least-squares line to your scatterplot.
   b. Calculate the standard error of the estimate, s.
   c. Suppose a partial skeleton is found on a rugged hillside. The skeleton is brought to a lab for identification. The ulna bone measures 287 mm and the femur measures 520 mm. Use your equation from 3(a) to predict the person's height. Then use your equation from 1(b) to predict the person's height. Which of your estimates, the one based on ulna length or the one based on femur length, is likely to be more reliable? Justify your answer based on the standard error of the estimate, s, for each equation.

4. Consider the linear regression model for height based on femur length.
   a. Test the hypothesis $H_0: \beta = 0$ against the one-sided alternative $H_a: \beta > 0$. Report the value of the t-test statistic, the degrees of freedom, the p-value, and your conclusion.
   b. Calculate a 95% confidence interval for the population slope, β.
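The error bands in exercise 2 are the lines $\hat{y} = a + bx \pm s$ and $\hat{y} = a + bx \pm 2s$. A minimal Python sketch of that idea follows, assuming made-up data rather than Table 30.3 and using hypothetical function names (`fit_with_s`, `fraction_within`). It prints the band equations and the fraction of points trapped within k standard errors of the line.

```python
from math import sqrt
from statistics import mean


def fit_with_s(x, y):
    """Return intercept a, slope b, and standard error of the estimate s."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sqrt(sse / (n - 2))


def fraction_within(x, y, k):
    """Fraction of points whose residual lies within k*s of the least-squares line."""
    a, b, s = fit_with_s(x, y)
    inside = sum(abs(yi - (a + b * xi)) <= k * s for xi, yi in zip(x, y))
    return inside / len(x)


# Made-up illustrative data (NOT the femur data in Table 30.3).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

a, b, s = fit_with_s(x, y)
for k in (1, 2):
    # Each band is parallel to the least-squares line, shifted up or down by k*s.
    print(f"upper {k}s band: y = {a + k * s:.3f} + {b:.3f}x")
    print(f"lower {k}s band: y = {a - k * s:.3f} + {b:.3f}x")
    print(f"fraction within {k}s: {fraction_within(x, y, k):.2f}")
```

With a sample this small the "roughly 95%" guideline is only a loose check; it becomes more informative as the number of data points grows.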

Review Questions

1. The video focused on peregrine falcons and the relationship between eggshell thickness and log-concentration of DDE. During the video, we did not check whether or not the conditions for inference were met and went ahead with conducting a significance test and constructing a confidence interval. Your task is to check whether the four conditions for inference are reasonably satisfied given the following information. Justify your answer.

Assume that the data came from a random sample of eggs collected from Alaska and Northern Canada. Figure 30.14 shows a residual plot and Figure 30.15 displays a normal quantile plot of the residuals.

[Figure 30.14. Residual plot. Horizontal axis: Log-Concentration DDE; vertical axis: Residuals.]

[Figure 30.15. Normal quantile plot of residuals. Horizontal axis: Residuals; vertical axis: Percent.]

2. Admissions offices of colleges and universities are interested in any information that can help them determine which students will be successful at their institution. For example, could students' high school grade point averages (GPA) be useful in predicting their first-year college GPAs? Data on high school GPA and first-year college GPA from a random sample of 32 college students attending a state university are displayed in Table 30.4.

High School GPA   First-Year College GPA   High School GPA   First-Year College GPA
3.00              3.15                     2.90              1.46
3.00              2.07                     3.50              3.10
2.30              2.60                     3.10              2.76
3.68              4.00                     3.35              2.01
2.20              2.03                     3.70              3.34
3.00              3.53                     2.70              2.90
3.03              3.17                     2.86              2.93
3.00              2.68                     2.51              1.95
3.16              3.88                     2.93              3.01
2.70              2.30                     3.41              3.48
4.00              3.64                     3.30              2.87
3.77              3.62                     3.76              2.85
2.70              2.34                     2.66              1.67
3.10              3.64                     2.91              3.38
3.23              3.67                     3.47              3.68
2.80              3.37                     3.40              3.76

Table 30.4. Data on high school GPA and first-year college GPA.

   a. Make a scatterplot of first-year college GPA versus high school GPA. Does the form of these data appear to be linear? Would you describe the relationship as positive or negative?
   b. Determine the equation of the least-squares line and add the line to your scatterplot in (a).
   c. Determine the t-test statistic for testing $H_0: \beta = 0$. How many degrees of freedom does t have?
   d. Find the p-value for the one-sided alternative $H_a: \beta > 0$. What do you conclude?

3. Linda heats her house with natural gas. She wonders how her gas usage is related to how cold the weather is. Table 30.5 shows the average temperature (in degrees Fahrenheit) each month from September through May and the average amount of natural gas Linda's house used (in hundreds of cubic feet) each day that month.

Month                          Sep   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May
Outdoor temperature (°F)       48    46    38    29    26    28    49    57    65
Gas used per day (100 cu ft)   5.1   4.9   6     8.9   8.8   8.5   4.4   2.5   1.1

Table 30.5. Gas usage and temperature data.

   a. Make a scatterplot of gas usage versus temperature. Describe the form and direction of the relationship between these two variables.
   b. Fit a least-squares line to gas usage versus temperature and add a graph of the line to your scatterplot in (a).
   c. Check to see if the conditions needed for inference are satisfied.
   d. Calculate the standard error of the estimate, s, and standard error of the slope, $s_b$. Show your calculations.
   e. Conduct a significance test of $H_0: \beta = 0$. Should the alternative be one-sided or two-sided? Report the value of the t-test statistic, the degrees of freedom, the p-value, and your conclusion.
   f. Calculate a 95% confidence interval for the population slope. Interpret your results in the context of this problem.

4. Do taller 4-year-olds tend to become taller 6-year-olds? Can a linear regression model be used to predict a 4-year-old's height when he or she turns six? Table 30.6 gives data on heights of children when they were four and then when they were six.

Height Age 4   Height Age 6   Height Age 4   Height Age 6
104.4          118.4          98.1           112.8
104.0          119.4          100.6          115.2
92.1           103.9          100.5          115.8
103.3          116.8          102.7          117.3
98.4           113.1          98.5           113.3
96.5           110.0          98.8           109.3
105.3          119.3          102.3          117.9
103.2          118.6          99.0           112.2
105.9          123.2          100.2          112.9
97.4           110.2          100.3          113.4
103.4          118.7          99.6           112.6
101.7          119.2          109.8          124.5
105.4          120.2          100.2          113.7
104.4          119.2          99.6           115.2
100.7          112.6          104.1          117.1

Table 30.6. Data on children's heights at age 4 and 6.

   a. Make a scatterplot of height at age 6 versus height at age 4. Determine the equation of the least-squares line and add its graph to the scatterplot.
   b. From regression output we get s = 1.38596 and $s_b$ = 0.07437. Construct a 95% confidence interval for the population slope β. Interpret your confidence interval in the context of children's growth.