STAT 111 Recitation 7


STAT 111 Recitation 7

Xin Lu Tan
xtan@wharton.upenn.edu
October 25, 2013

Miscellaneous

Please turn in Homework 6. Please pick up Homework 7 and the graded Homework 5. Please check your grade and let me know during the next recitation if there are any grade discrepancies (please show me your graded homework as well). Note: Homework 7 is printed two-sided.

Midterm: Z-chart

The following Z chart gives "less than" probabilities for positive values of z, i.e., P(Z ≤ z) for Z ~ N(0, 1). What are P(Z ≤ 2), P(Z ≥ 2), P(Z ≤ −2), and P(Z ≥ −2)?
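For reference, the answers (not printed on the slide) follow from the table value Φ(2) ≈ 0.9772 and the symmetry of the standard normal density:

\[
\begin{aligned}
P(Z \le 2) &= \Phi(2) \approx 0.9772, & P(Z \ge 2) &= 1 - \Phi(2) \approx 0.0228, \\
P(Z \le -2) &= 1 - \Phi(2) \approx 0.0228, & P(Z \ge -2) &= \Phi(2) \approx 0.9772.
\end{aligned}
\]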

Midterm: Problem 5

Let X₁ be a random variable having a normal distribution with mean 12 and variance 9. Also, let X₂ be a random variable having a normal distribution with mean 14 and variance 16. Calculate the probability that X₂ > X₁. [Hint: Think of the difference X₁ − X₂.]

Recall: If X ~ N(µ, σ²), then aX ~ N(aµ, a²σ²). If X₁ ~ N(µ₁, σ₁²), X₂ ~ N(µ₂, σ₂²), and X₁, X₂ are independent, then X₁ + X₂ ~ N(µ₁ + µ₂, σ₁² + σ₂²). In particular, we have X₁ − X₂ ~ N(µ₁ − µ₂, σ₁² + σ₂²).
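A worked solution (filling in the steps the slide leaves to you): by the facts above,

\[
X_1 - X_2 \sim N(12 - 14,\; 9 + 16) = N(-2,\, 25),
\]

so, standardizing with standard deviation 5,

\[
P(X_2 > X_1) = P(X_1 - X_2 < 0) = P\!\left(Z < \frac{0 - (-2)}{5}\right) = \Phi(0.4) \approx 0.6554.
\]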

Simple Linear Regression

Suppose you observe n data points (x_i, y_i), i = 1, 2, ..., n.

[Scatter plot of the data; x runs from −5 to 25, y from −20 to 40.]

It seems like there is some kind of linear relationship between the random variables X_i and Y_i, i = 1, 2, ..., n, i.e.,

\[ Y_i = \alpha + \beta x_i + \epsilon_i, \]

where ε_i denotes the noise term (we assume that each y_i is observed with noise ε_i that has mean 0 and variance σ²).
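To make the model concrete, here is a minimal simulation sketch; the parameter values α = 2, β = 1.5, σ = 5 are hypothetical choices for illustration, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values chosen for illustration (not from the slides).
alpha, beta, sigma = 2.0, 1.5, 5.0

n = 50
x = rng.uniform(-5, 25, size=n)        # design points, roughly matching the plot's x-range
eps = rng.normal(0.0, sigma, size=n)   # noise: mean 0, variance sigma^2
y = alpha + beta * x + eps             # the model Y_i = alpha + beta * x_i + eps_i
```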

Simple Linear Regression

The goal of simple linear regression is to find the straight line y = a + bx that provides the best fit to the data points.

Question: How do we define "best"?

Simple Linear Regression

Intuitively, we would expect a good line to stay close to the data points. The best line is then the one that stays closest to the data points. How do we measure closeness? Intuitively, we want the distance between each individual data point and the line to be small.

[Same scatter plot of the data as before.]

Simple Linear Regression

This leads us to consider the line y = a + bx that has the minimum sum of absolute values of the residuals,

\[ \sum_{i=1}^{n} |y_i - a - b x_i|. \]

So all we need to do in order to estimate the parameters α and β from the data is to find the line y = a + bx that minimizes this quantity. The a and b found this way then serve as estimates for α and β.

Note: We should NOT consider minimizing \(\sum_{i=1}^{n} (y_i - a - b x_i)\), since the individual terms can be positive or negative and might cancel each other out (e.g., residuals of +5 and −5 sum to 0 even though both points are far from the line).

The Least-Squares Approach

But we don't like to deal with absolute values, since they make it inconvenient to compute the values of a and b that minimize \(\sum_{i=1}^{n} |y_i - a - b x_i|\).

So instead, we consider minimizing the sum of squared residuals,

\[ \sum_{i=1}^{n} (y_i - a - b x_i)^2. \]

The values of a and b that minimize this quantity can be computed easily using calculus, and they then serve as estimates for α and β.

Estimation of α and β

It can be shown (by taking partial derivatives of the sum of squared residuals) that the best straight-line fit y = a + bx, in the least-squares sense, has

\[ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}, \qquad a = \bar{y} - b \bar{x}. \]

If we denote

\[ s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \]

then the formula for b can be rewritten as

\[ b = \frac{s_{xy}}{s_{xx}}. \]

We also estimate σ² by

\[ s_r^2 = \frac{s_{yy} - b^2 s_{xx}}{n - 2}. \]
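These formulas translate directly into code. A minimal sketch, assuming x and y are NumPy arrays of equal length (the names s_xx, s_xy, s_yy, and s_r2 mirror the slide's notation):

```python
import numpy as np

def least_squares_fit(x, y):
    """Least-squares estimates a (intercept), b (slope), and s_r^2 (estimate of sigma^2)."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    s_yy = np.sum((y - y_bar) ** 2)
    b = s_xy / s_xx                           # slope: b = s_xy / s_xx
    a = y_bar - b * x_bar                     # intercept: a = y_bar - b * x_bar
    s_r2 = (s_yy - b ** 2 * s_xx) / (n - 2)   # estimate of sigma^2
    return a, b, s_r2
```

Running this on the simulated x and y from the earlier sketch should recover estimates near the hypothetical values α = 2 and β = 1.5.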

Back to the Previous Graph

Back to our previous graph: do you see that the least-squares line is, overall, much closer than the other line to most of the data points?

[Scatter plot with two lines through the data; the least-squares fit is drawn in green.]

In fact, the fitted green line y = a + bx obtained using the formulas on the previous slide attains the minimum \(\sum_{i=1}^{n} (y_i - a - b x_i)^2\) among all straight lines of the form y = a + bx!

Estimates and Estimators

Our estimates a and b of the parameters α and β are of the form

\[ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad a = \bar{y} - b \bar{x}, \]

i.e., they are functions of our data (x_i, y_i), i = 1, 2, ..., n. But (x_i, y_i), i = 1, 2, ..., n, are themselves the realized values of the random variables (X_i, Y_i), i = 1, 2, ..., n, so our estimators a and b of the parameters α and β are of the form

\[ b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b \bar{X}, \]

which are themselves random variables. We can therefore construct confidence intervals for α and β, to give us an idea of the precision of our estimates!

Confidence Interval for β

It can be shown (the math is too difficult to give here) that a is an unbiased estimate of α, that b is an unbiased estimate of β, and that s_r² is an unbiased estimate of σ². We may want to ask: how accurate is the estimate b of β? An approximate 95% confidence interval for β is given by

\[ b - \frac{2 s_r}{\sqrt{s_{xx}}} \quad \text{to} \quad b + \frac{2 s_r}{\sqrt{s_{xx}}}. \]
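Continuing the sketch above, the interval takes only a few lines of code; least_squares_fit is the helper defined earlier, and the factor 2 is the slide's rough stand-in for the exact t quantile:

```python
import numpy as np

def beta_confidence_interval(x, y):
    """Approximate 95% CI for beta: b +/- 2 * s_r / sqrt(s_xx)."""
    a, b, s_r2 = least_squares_fit(x, y)      # from the earlier sketch
    s_xx = np.sum((x - x.mean()) ** 2)
    half_width = 2 * np.sqrt(s_r2) / np.sqrt(s_xx)
    return b - half_width, b + half_width
```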