Bootstrap and Linear Regression Spring You should have downloaded studio12.zip and unzipped it into your working directory.

Similar documents
Studio 8: NHST: t-tests and Rejection Regions Spring 2014

Bootstrapping Spring 2014

Confidence Intervals for Normal Data Spring 2014

Post-exam 2 practice questions 18.05, Spring 2014

Comparison of Bayesian and Frequentist Inference

Confidence Intervals for the Mean of Non-normal Data Class 23, Jeremy Orloff and Jonathan Bloom

Linear Regression Spring 2014

Continuous Expectation and Variance, the Law of Large Numbers, and the Central Limit Theorem Spring 2014

Compute f(x θ)f(θ) dθ

Exam 2 Practice Questions, 18.05, Spring 2014

Bayesian Updating: Discrete Priors: Spring 2014

Lecture 9. Expectations of discrete random variables

Conjugate Priors: Beta and Normal Spring January 1, /15

Section 7.1 How Likely are the Possible Values of a Statistic? The Sampling Distribution of the Proportion

Continuous Data with Continuous Priors Class 14, Jeremy Orloff and Jonathan Bloom

Recitation 1: Regression Review. Christina Patterson

8.01x Classical Mechanics, Fall 2016 Massachusetts Institute of Technology. Problem Set 2

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

5.61 Lecture #17 Rigid Rotor I

Homework 2. For the homework, be sure to give full explanations where required and to turn in any relevant plots.

18.S096 Problem Set 7 Fall 2013 Factor Models Due Date: 11/14/2013. [ ] variance: E[X] =, and Cov[X] = Σ = =

Diagnostics. Gad Kimmel

Variance Decomposition

Time Series. Karanveer Mohan Keegan Go Stephen Boyd. EE103 Stanford University. November 15, 2016

8.01x Classical Mechanics, Fall 2016 Massachusetts Institute of Technology. Problem Set 8

Conditional Probability, Independence, Bayes Theorem Spring January 1, / 28

The Relationship between the US Economy s Information Processing and Absorption Ratio s

P 3. Figure 8.39 Constrained pulley system. , y 2. and y 3. . Introduce a coordinate function y P

Choosing Priors Probability Intervals Spring January 1, /25

14.6 Spring Force Energy Diagram

Applied Regression Modeling: A Business Approach Chapter 2: Simple Linear Regression Sections

Class 26: review for final exam 18.05, Spring 2014

Linear regression Class 25, Jeremy Orloff and Jonathan Bloom

with increased Lecture Summary #33 Wednesday, December 3, 2014

Lecture 30. DATA 8 Summer Regression Inference

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Bivariate Regression Analysis. The most useful means of discerning causality and significance of variables

Estimating Covariance Using Factorial Hidden Markov Models

ISQS 5349 Final Exam, Spring 2017.

Financial Econometrics

Big Data Analysis with Apache Spark UC#BERKELEY

05 Regression with time lags: Autoregressive Distributed Lag Models. Andrius Buteikis,

Fin285a:Computer Simulations and Risk Assessment Section 2.3.2:Hypothesis testing, and Confidence Intervals

Identifying Financial Risk Factors

BOOtstrapping the Generalized Linear Model. Link to the last RSS article here:factor Analysis with Binary items: A quick review with examples. -- Ed.

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

Outline. Confidence intervals More parametric tests More bootstrap and randomization tests. Cohen Empirical Methods CS650

Ch3. TRENDS. Time Series Analysis

MIT Spring 2016

GMM Estimation in Stata

Simple linear regression

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

Choosing Priors Probability Intervals Spring 2014

14.4 Change in Potential Energy and Zero Point for Potential Energy

14.9 Worked Examples. Example 14.2 Escape Velocity of Toro

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Econ671 Factor Models: Principal Components

14.452: Introduction to Economic Growth Problem Set 4

Experiment 1: Linear Regression

Lund Institute of Technology Centre for Mathematical Sciences Mathematical Statistics

1. (24 points) 5. (8 points) 6d. (13 points) 7. (6 points)

Simple Harmonic Oscillator Challenge Problems

Principal components

Conditional Probability, Independence, Bayes Theorem Spring January 1, / 23

Non-Linear Regression Samuel L. Baker

Supplemental Resource: Brain and Cognitive Sciences Statistics & Visualization for Data Analysis & Inference January (IAP) 2009

Practice Exam A Answer Key for Fall 2014

Recitation #4: Solving Endogenous Growth Models

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

The linear model. Our models so far are linear. Change in Y due to change in X? See plots for: o age vs. ahe o carats vs.

STATISTICS 110/201 PRACTICE FINAL EXAM

Massachusetts Institute of Technology Physics 8.03SC Fall 2016 Homework 9

8.01x Classical Mechanics, Fall 2016 Massachusetts Institute of Technology. Problem Set 1

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

On Improving the Output of. a Statistical Model

Study Unit 2 : Linear functions Chapter 2 : Sections and 2.6

Multiple comparison procedures

Time Series Models for Measuring Market Risk

Practice Exam A for Fall 2014 Exam 1

Lecture (2) Today: Generalized Coordinates Principle of Least Action. For tomorrow 1. read LL 1-5 (only a few pages!) 2. do pset problems 4-6

Introduction to Algorithmic Trading Strategies Lecture 10

18.05 Exam 1. Table of normal probabilities: The last page of the exam contains a table of standard normal cdf values.

14.30 Introduction to Statistical Methods in Economics Spring 2009

Lecture 18: Simple Linear Regression

5601 Notes: The Subsampling Bootstrap

Stat 5102 Final Exam May 14, 2015

The Multivariate Gaussian Distribution

Exam 1 Practice Questions I solutions, 18.05, Spring 2014

10/8/2014. The Multivariate Gaussian Distribution. Time-Series Plot of Tetrode Recordings. Recordings. Tetrode

Time series and Forecasting

Data Science for Engineers Department of Computer Science and Engineering Indian Institute of Technology, Madras

MIT Spring 2015

Bayesian Updating with Continuous Priors Class 13, Jeremy Orloff and Jonathan Bloom

A comparison of four different block bootstrap methods

Probabilities & Statistics Revision

Chapter 9. Bootstrap Confidence Intervals. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University

Topic 14: Inference in Multiple Regression

Transcription:

Bootstrap and Linear Regression 18.05 Spring 2014 You should have downloaded studio12.zip and unzipped it into your 18.05 working directory.

January 2, 2017 2 / 14 Review: Computing a bootstrap confidence interval Starting with data x 1, x 2,... x n and test statistic θˆ. 1 2 Generate a resample of the data of size n. Compute the test statistic θ. 3 Compute and store the bootstrap difference θ θˆ. 4 5 Repeat steps 1-3 nboot times. The bootstrap confidence interval is [ ] θˆ δ θˆ δ 1 α/2, α/2 where δ α/2 is a quantile.

January 2, 2017 3 / 14 Board question: two independent samples Suppose x 1, x 2,..., x n and y 1, y 2,..., y m are independent samples drawn from distributions with means µ x and µ y respectively. Describe in detail the steps for computing an empirical bootstrap 95% confidence interval for µ x µ y

January 2, 2017 4 / 14 Solution The steps are almost identical to the ones outlined above. The only difference is that you have to generate two independent bootstrap resamples at each step: The test statistic is θˆ = x y. 1 Resample: 2 3 4 5 x 1, x 2,..., x n m y 1, y 2,..., y resampled from x 1,..., x n resampled from y 1,..., y m Compute θ = x y Compute and store the bootstrap difference θ θˆ Repeat steps 1-3 nboot times. The bootstrap confidence interval is [ ] θˆ δ θˆ δ 1 α/2, α/2 where δ α/2 is a quantile.

January 2, 2017 5 / 14 R Problem 1: Bootstrapping The data file salaries.csv contains two columns of data: Salaries.1 and Salaries.2. Using R compute a 95% bootstrap confidence interval for the difference of the two means. The file studio12.r has code that will show you how load the data in salaries.csv studio12.r also has sample code for computing a one-sample bootstrap confidence interval. answer: Code for the solution is in studio12-sol.r

January 2, 2017 6 / 14 Linear regression using R We will use R to analyze financial data using simple linear regression. We ll use publically available data on the price of several stocks over a 14 year period from 2000 to 2014. We ll use a simplified version of the Capital Asset Pricing Model or CAPM. The file studio12.r contains code for loading the data and fitting a line to two of the variables.

January 2, 2017 7 / 14 Exploring the original data The original data is in the file studio12financialoriginal.csv The next two sides show a quick exporation of this data. The file contains daily prices or interest rates, over 14 years, for several stocks, bonds and a barrel of oil. We will focus on: SP500: Standard & Poors stock index, the cost of a certain basket of stocks. GE: General Electric. DCOICWTICO: the price of a barrel of the benchmark West Texas crude oil.

January 2, 2017 8 / 14 Price plots The prices are scaled so they would all fit nicely on the same axes. For example, by plotting 40*GE vs. date it is at roughly the same scale as the others. (Scaled) Stock Prices over 14 years scaled price 0 500 1000 1500 SP500 40*GE 10*DCOILWTICO 2000 2005 2010 date

January 2, 2017 9 / 14 Daily rate of return For this project we ll look at the daily rate of return for the financial variables. Our goal is to fit the data with linear models of the form DCOICWTICO.daily = a + b*sp500.daily The daily rate of return was precomputed using studio12-prep.r The data is stored in studio12financialdaily.csv The R file studio12.r has code to load this data and fit a linear model Let s look at studio12.r now!

Fitting a linear model using R Linear model: DCOICWTICO.daily = a + b*sp500.daily R code: lmfit = lm(dcoilwtico.daily SP500.daily) summary(lmfit): Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 0.0003469 0.0003546 0.978 0.328 SP500.daily 0.3325257 0.0311009 10.692 <2e-16 *** This estimates the model coefficients as ^a = 0.0003469 and b ^ = 0.3325257. Pr(> t ) are p-values for a NHST with H 0 that the given coefficient is 0. ^b is significantly different from 0, with p-value < 2 10 16. ^a (the constant term) is not significantly different from 0. January 2, 2017 10 / 14

Linear fit January 2, 2017 11 / 14

January 2, 2017 12 / 14 R Problem 2: Linear regression 1 2 3 4 Load the data from studio12financialdaily.csv Do a linear fit of GE.daily vs. SP500.daily Interpret the results. Run the multiple linear regression at the end of studio12.r and interpret the results. answer: The answers to 1 and 2 are in studio12-sol.r 3. The coefficient of SP500.daily is positive and significantly different from 0. This indicates that SP500.daily and GE.daily are positively correlated. They tend to rise and fall together. Continued on next slide.

January 2, 2017 13 / 14 Solution continued 4. The coefficients of SP500.daily and DCOILWTICO.daily are both significantly different from 0. Interestingly the coefficient of DCOILWTICO.daily is negative. When we compared GE and DCOILWTICO without including SP500 the coefficient of DCOILWTICO was positive. So each pair of GE, SP500 and DCOILWTICO are positively correlated: they all tend to rise and fall together. With both SP500.daily and COILWTICO.daily as predictor variables for GE, we see that if the SP500 is flat, then a rise in the price of oil predicts a fall in the price of GE. One possible explanation: higher energy costs that aren t rooted in broader economic growth raise GE s costs without increasing its profits. Note: The lm function returns a lot of other information bearing on the predictive power of the predictor variables.

MIT OpenCourseWare https://ocw.mit.edu 18.05 Introduction to Probability and Statistics Spring 2014 For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.