
Class: Taylor
January 12, 2011 (pdf version)

Story time: Dan Willingham, the Cog Psych

Willingham: Professor of cognitive psychology at Harvard.
"Why Students Don't Like School"

We know lots about psychology now, but amazingly little about education: the Perry Preschool Project is still a state-of-the-art education experiment, and there are few other good controlled experiments.

But we can use what we know about psychology to inform education. That is the approach of this book. For example:

He pushes stories. People relate to humans with special hardware in their brain (cards vs. liar):
* With cards: the rule is, red on one side means an even number on the other side. Which do you check: a red card? An even card? An odd card?
* With people: a drink means over 21. Which do you check: a drinker? An old person? A young person?
So try to tell stories.

People pay attention at the start of class; new stuff is always interesting. So there is no need for it to have a connected theme. Hence when I start classes with a short story, blam: Dan, the cognitive psychologist.

Administrivia

Homework / cases, a project, and a final. Email statistics.assignments@gmail.com for questions and for turning in assignments. Both Sathya and I get messages sent here, so that way you get ahold of whoever is currently online sooner.

Books:
Introductory Statistics with R by Peter Dalgaard, 2nd edition, ISBN 978-0-387-79053-4, Springer 2008.
Linear Models with R by Julian J. Faraway, ISBN 1-58488-425-8, Chapman & Hall/CRC Press 2005.

Software: R. Other software is allowed, but R is recommended. It's free, available on OS X/Linux/Windows, and it is what production-level statisticians use. Friday at noon, Sathya will give an intro to using R. (And our computer person, Anand, will be there to help load it if you have problems.)

The triangle of statistics

Statistics has three major pieces:
* mathematics
* data analysis (i.e. science)
* communication

To be good, you need all three (or at least two of the three). Doing only one isn't as powerful:
* Only mathematics: Terence Tao, maybe the smartest guy on the planet. I would have recommended him for the genius award, but he already has one.
* Only data analysis: called a masters-level statistician. Employable at big pharma, but low pay.
* Only communication: called bloggers. Basically unpaid!

My goal is to make sure you can make more money than any of these pure states! So: MBAs are closer to the communication corner; math undergrads are closer to the math corner;

and stat concentrators are closer to the data analysis corner. But by the end, I want you all to have moved a bit towards the middle. I'll present more mathematics and data analysis, since that is what I know best.

Today's topic: Simple linear regression

Review of the standard linear model

The standard linear regression model is:

    Y_i = α + β x_i + ε_i,    ε_i iid N(0, σ²)

You will see this equation written in almost any research paper which uses data. The names are often changed, but it is there somewhere. For example, it is basically equation 2.17 in Berndt from the reading. The entire chapter is designed to motivate that one equation. Let's break it down into pieces.

The fit: the α + β x_i piece of

    Y_i = (α + β x_i) + ε_i,    ε_i iid N(0, σ²)

The most fun part is the fit. It describes the relationship between x and Y. This version describes a linear relationship.
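As an aside not in the original notes, a small R sketch can make the model concrete: simulate data from Y_i = α + β x_i + ε_i with made-up values of α, β, and σ, and check that lm() recovers them. All numbers here are illustrative assumptions.

# Simulate from the standard linear model (all values made up for illustration)
set.seed(1)
n     <- 50
alpha <- 2       # true intercept
beta  <- 3       # true slope
sigma <- 1.5     # error standard deviation

x   <- runif(n, 0, 10)                  # x treated as fixed inputs
eps <- rnorm(n, mean = 0, sd = sigma)   # iid N(0, sigma^2) errors
y   <- alpha + beta * x + eps

fit <- lm(y ~ x)
coef(fit)    # estimates should land near alpha = 2 and beta = 3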

Residuals / errors: the ε_i piece of

    Y_i = α + β x_i + ε_i,    ε_i iid N(0, σ²)

The residuals (aka errors) themselves. Describing them, looking at them, investigating them is the primary activity of a statistician. It is all about error!

The i.d.: The i.i.d. part can be broken into two pieces, i. and i.d. The easier piece is the identically distributed part. It means each error looks like any other error.

The i.: The first i in IID is for independence. We will spend an entire class on this piece. It is the most important assumption in the entire model.

The N: Means normal. Look at a q-q plot to check it. It is easy to check (hence we cover it in intro classes). We won't discuss it here since I assume you already know how to check it.

Style: iid = i.i.d. = IID = I.I.D. = independent and identically distributed. It is often even left off entirely, since it is always assumed.

Y is upper case, x is lower case: Recall from probability that random variables are often written as upper case letters. This is why Y is written as upper case: it is random. The x are thought of as inputs, and hence not random.
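The notes say to check the N with a q-q plot but do not show the commands; here is a minimal sketch, reusing the simulated y and x from the sketch above (not the course data):

fit <- lm(y ~ x)
qqnorm(resid(fit))    # points fall roughly on a straight line if the errors are normal
qqline(resid(fit))
plot(fitted(fit), resid(fit))   # residuals vs fitted: also useful for spotting non-constant variance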

i is the row index. We might even say how many rows we have by the cryptic addition to the equation:

    Y_i = α + β x_i + ε_i,    ε_i iid N(0, σ²),    (i = 1, ..., n)

Is linear good enough? The triangle answers:

Communication: Littlewood's principle: almost all functions are almost continuous almost everywhere. And from Stone-Weierstrass, all continuous functions are approximately equal to a polynomial. And all polynomials look like lines if you investigate them close enough to a zero.

Mathematics: Taylor (wiki) tells us that everything can be approximated by a linear equation. So if there is a true relationship between Y and x that is non-linear, then we could say

    E(Y | x) = f(x)

(This is yet another cryptic form of our main equation. It could be written as Y = f(x) + ε to make it look more like our previous equation.) So Taylor's theorem says that

    E(Y | x) ≈ α + β x

and even tells us what α and β are.
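To see Taylor's point numerically, here is a small illustrative sketch (the function and the window are my own choices, not from the notes): fit a line to a non-linear f(x) over a narrow range of x and compare the slope to f'.

# A non-linear truth, f(x) = exp(x), examined near x = 1
f <- function(x) exp(x)
x_local <- seq(0.9, 1.1, length.out = 100)   # a narrow window of x values
y_local <- f(x_local)

local_fit <- lm(y_local ~ x_local)
coef(local_fit)              # slope comes out close to f'(1) = exp(1) ~ 2.72, as Taylor's theorem suggests
max(abs(resid(local_fit)))   # the line misses f by very little over this window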

Data analysis: Linear is the easiest to look at, so start there. Then use residuals to decide if it is good enough.

Practice

First get the data. For me, I use the command line, just like your grandfather used:

    wget http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/cleaning.txt

You of course have this newfangled device called a mouse, so use it!

Now start R. First read in the file:

> read.table("cleaning.txt")

Oops, that generates too much output and doesn't put it anywhere. So let's assign all this mess to a data frame.

> clean = read.table("cleaning.txt")

Just look at what we have by typing clean again. Oops, we have the first row with the names of the variables in it. So let's try again:

> clean = read.table("cleaning.txt", header = TRUE)
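As an aside not in the notes: if you would rather not leave R, the file can also be fetched from within R. A minimal sketch, assuming the URL above is still reachable:

# Fetch and read the data entirely from within R (assumes the URL above is still live)
url <- "http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/cleaning.txt"
download.file(url, destfile = "cleaning.txt")
clean <- read.table("cleaning.txt", header = TRUE)
str(clean)    # check that the columns came in as numbers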

Checking with clean shows we only have numbers. How happy can you get?!?

Now for the fun part, let's run a regression.

> lm(clean$roomsclean ~ clean$numberofcrews)

Call:
lm(formula = clean$roomsclean ~ clean$numberofcrews)

Coefficients:
        (Intercept)  clean$numberofcrews
              1.785                3.701

Kind of a different world view than JMP. It just gives the minimal amount of output possible. So to see a bit more, try

> summary(lm(clean$roomsclean ~ clean$numberofcrews))

Call:
lm(formula = clean$roomsclean ~ clean$numberofcrews)

Residuals:
     Min       1Q   Median       3Q      Max
-15.9990  -4.9901   0.8046   4.0010  17.0010

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)            1.7847     2.0965   0.851    0.399

clean$numberofcrews    3.7009     0.2118  17.472   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.336 on 51 degrees of freedom
Multiple R-squared:  0.8569,  Adjusted R-squared:  0.854
F-statistic: 305.3 on 1 and 51 DF,  p-value: < 2.2e-16

That should look very similar to other tables you have seen. But what of pictures? Well, let's do a plot:

> plot(lm(clean$roomsclean ~ clean$numberofcrews))
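An aside not in the notes: once the fit is stored in an object, the numbers in that summary can be pulled out directly instead of being read off the screen. A quick sketch (same model, written with the data= argument):

fit <- lm(roomsclean ~ numberofcrews, data = clean)
coef(fit)                 # intercept and slope
confint(fit)              # 95% confidence intervals for the coefficients
summary(fit)$r.squared    # the Multiple R-squared shown above
summary(fit)$sigma        # the residual standard error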

[Figure: the "Residuals vs Fitted" diagnostic plot produced by plot(lm(clean$roomsclean ~ clean$numberofcrews)): residuals plotted against fitted values, with observations 46, 31, and 5 flagged as the most extreme.]
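For a picture of the data itself rather than the diagnostics, one option (not shown in the notes) is a scatter plot with the fitted line drawn through it; a minimal sketch, with axis labels that are my guess at what the variables mean:

fit <- lm(roomsclean ~ numberofcrews, data = clean)
plot(clean$numberofcrews, clean$roomsclean,
     xlab = "Number of crews", ylab = "Rooms cleaned")   # label text is a guess at the variables' meaning
abline(fit)            # add the least-squares line
plot(fit, which = 1)   # just the Residuals vs Fitted panel, as in the figure above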