Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56


STAT Spring Quarter Midterm 1 - April 27, 2017

Name:
Student ID Number:

Problem   #1   #2   #3    #4    #5   #6    Total
Points    /6   /8   /14   /10   /8   /10   /56

Directions. Read directions carefully and show all your work. Define your own notations. You do not need to simplify or evaluate unless indicated. Partial credit will be assigned based upon the correctness, completeness and clarity of your answers. Correct answers without proper justification will not receive full credit. The exam is closed book, closed notes. Calculators and other electronic devices are not allowed.

Problem 1. [6 points] Suppose that we use the K-nearest neighbors (KNN) method for a classification problem using different values of K.

(a) [3 points] Provide a sketch of typical training error rate, test error rate, and Bayes error rate, on a single plot. The x-axis should represent 1/K, and the y-axis should represent the values for each curve. There should be three curves. Make sure to label each one.

Answer.

[Figure 1: Error rate (y-axis) against 1/K (x-axis), with one curve for the training errors and one for the test errors. The black dashed line represents the Bayes error rate.]

(b) [2 points] As K decreases, does the level of flexibility increase or decrease? Justify your answer.

Answer. As K decreases, the method becomes more flexible. For very low values of K, the method may find patterns in the data that do not correspond to the Bayes decision boundary, and thus overfits. For the extreme case K = 1, the training error is 0, but the test error rate may be quite high.

(c) [1 point] Draw a vertical line on the previous plot and show the part of the graph where overfitting occurs.

Answer. Overfitting occurs when training error is low and test error is large. In the above graph, this corresponds to values of K below 10.
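The behaviour described in parts (a) to (c) can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the one-dimensional two-class data, the sample sizes, the seed, and the helper names `knn_predict`, `error_rate`, and `simulate` are all made up for this example and are not part of the exam. It prints training and test error rates for a few values of K, so the trend along 1/K can be inspected.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0


def error_rate(train, data, k):
    """Fraction of points in `data` misclassified by KNN fitted on `train`."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


def simulate(n, rng):
    """Synthetic data (an assumption): class 1 centred at +1, class 0 at -1."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y == 1 else -1.0, 1.5)
        data.append((x, y))
    return data


rng = random.Random(0)
train = simulate(200, rng)
test = simulate(200, rng)

# Odd values of K avoid ties in the majority vote.
results = {}
for k in (1, 5, 25, 101):
    results[k] = (error_rate(train, train, k), error_rate(train, test, k))
    print(f"K={k:3d}  1/K={1 / k:.3f}  "
          f"train={results[k][0]:.3f}  test={results[k][1]:.3f}")
```

With K = 1 each training point is its own nearest neighbor, so the training error is zero as long as no two training points share the same x with different labels, while the test error is generally well above zero.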

Problem 2. [8 points] I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$.

(a) [2 points] Suppose that the true relationship between X and Y is linear, i.e. $Y = \beta_0 + \beta_1 X + \varepsilon$. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

Answer. The cubic regression model is more flexible than the linear regression model. Therefore, we would expect the cubic model to fit the training data better, and thus to have a lower training RSS.

(b) [2 points] Answer (a) using test rather than training RSS.

Answer. If the true relationship between X and Y is linear, a cubic regression model is excessively flexible, and we would expect the method to fit test data poorly. Therefore, we would expect the cubic model to have a higher test RSS.

(c) [2 points] Suppose that the true relationship between X and Y is not linear, but we do not know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

Answer. Same answer as (a).

(d) [2 points] Answer (c) using test rather than training RSS.

Answer. In this case, we do not know the right amount of flexibility to fit the true underlying model. So there is not enough information to tell which model would give the lower test RSS.
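The claim in part (a) can be checked numerically. The following is a minimal sketch, assuming a made-up linear truth Y = 1 + 2X + noise, a hypothetical seed, and hand-written helpers (`solve`, `fit_poly`, `rss`) that fit polynomials by the normal equations; none of this comes from the exam itself. Because the linear model is nested in the cubic one, the cubic training RSS can never exceed the linear training RSS, whereas the test RSS comparison depends on the sample.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    size = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(size)]
           for i in range(size)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(xtx, xty)


def rss(coefs, xs, ys):
    """Residual sum of squares of a fitted polynomial on (xs, ys)."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coefs))) ** 2
               for x, y in zip(xs, ys))


rng = random.Random(1)


def simulate(n):
    """Synthetic data with a truly linear relationship (an assumption)."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


x_train, y_train = simulate(100)
x_test, y_test = simulate(1000)

linear = fit_poly(x_train, y_train, 1)
cubic = fit_poly(x_train, y_train, 3)

train_rss = {"linear": rss(linear, x_train, y_train),
             "cubic": rss(cubic, x_train, y_train)}
test_rss = {"linear": rss(linear, x_test, y_test),
            "cubic": rss(cubic, x_test, y_test)}
print("training RSS:", train_rss)
print("test RSS:    ", test_rss)
```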

Problem 3. [14 points] Data for 51 U.S. states (50 states, plus the District of Columbia) was used to examine the relationship between violent crime rate (violent crimes per 100,000 persons per year) and the predictor variables of urbanization (percentage of the population living in urban areas) and poverty rate. A predictor variable indicating whether or not a state is classified as a Southern state (1 = Southern, 0 = not) was also included. Finally, we include two interaction terms {Urban-South} and {Poverty-South}. Some output for the analysis of this data is shown below.

## Coefficients:
##                Estimate  Std. Error  t value  Pr(>|t|)
## (Intercept)
## Urban
## Poverty
## South
## Urban:South
## Poverty:South
##
## Residual standard error:   on 45 degrees of freedom
## F-statistic:   on 5 and 45 DF, p-value: <2e-16

(a) [2 points] Write the multiple linear regression model.

Answer. The multiple linear regression model reads as follows:

$$\text{violent crime rate} = \beta_0 + \beta_1\,\text{Urban} + \beta_2\,\text{Poverty} + \beta_3\,\text{South} + \beta_4\,(\text{Urban} \times \text{South}) + \beta_5\,(\text{Poverty} \times \text{South}) + \varepsilon \qquad (1)$$

(b) [2 points] Predict the violent crime rate for a Southern state with an urbanization of 55.4 and a poverty rate of 13.7.

Answer. Given the least-squares coefficient estimates, we can make the following prediction for the violent crime rate:

$$\hat\beta_0 + \hat\beta_1 \times 55.4 + \hat\beta_2 \times 13.7 + \hat\beta_3 \times 1 + \hat\beta_4 \times (55.4 \times 1) + \hat\beta_5 \times (13.7 \times 1)$$

(c) [2 points] Give an approximate 95% confidence interval for the coefficient related to the poverty rate.

Answer. An approximate 95% confidence interval for $\beta_2$ is $\hat\beta_2 \pm 2\,\mathrm{SE}(\hat\beta_2)$. Therefore, the interval $[\hat\beta_2 - 2\,\mathrm{SE}(\hat\beta_2),\ \hat\beta_2 + 2\,\mathrm{SE}(\hat\beta_2)]$ contains $\beta_2$ with an approximate probability of 0.95.

(d) [2 points] Is there a relationship between the predictors and the response?

Answer. To answer this question, we test $H_0$: $\beta_j = 0$ for all $j = 1, \dots, 5$ against $H_a$: at least one $\beta_j \neq 0$ for $j = 1, \dots, 5$. The p-value of the F-statistic, computed under $H_0$, is very small. Therefore, we can reject the null hypothesis, and conclude that there is a relationship between the predictors and the response.

(e) [2 points] Which predictors appear to have a statistically significant relationship to the response?

Answer. To answer this question, we perform five individual hypothesis tests: for each $j = 1, \dots, 5$, we test $H_0$: $\beta_j = 0$ against $H_a$: $\beta_j \neq 0$. We can reject the null hypothesis for a given predictor if the corresponding p-value is small enough. If we choose a p-value cutoff of 5%, then all predictors appear to have a statistically significant relationship to the response, except for the interaction term {Poverty-South}.

(f) [2 points] What does the coefficient for the interaction term {Urban-South} suggest?

Answer. According to the least-squares fit, we obtain the following prediction for the violent crime rate:

$$\hat\beta_0 + \hat\beta_1\,\text{Urban} + \hat\beta_2\,\text{Poverty} + \hat\beta_3\,\text{South} + \hat\beta_4\,(\text{Urban} \times \text{South}) + \hat\beta_5\,(\text{Poverty} \times \text{South})$$
$$= \hat\beta_0 + (\hat\beta_1 + \hat\beta_4\,\text{South})\,\text{Urban} + \hat\beta_2\,\text{Poverty} + \hat\beta_3\,\text{South} + \hat\beta_5\,(\text{Poverty} \times \text{South})$$

In other words, in Southern states, a one percentage point increase in the population living in urban areas will increase the violent crime rate by $\hat\beta_1 + \hat\beta_4$ on average, compared with $\hat\beta_1$ in non-Southern states. Hence the effect of urbanization is amplified in Southern states.

(g) [2 points] To what extent do you think the $R^2$-statistic and the residual standard error would change if we remove the interaction term {Poverty-South} from the model?

Answer. According to (e), the interaction term {Poverty-South} is not significantly related to the response. Therefore, one would expect a tiny decrease of the $R^2$-statistic, and either a slight increase or decrease of the residual standard error.
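The regrouping used in part (f) can be illustrated in code. The numeric coefficient estimates are missing from this transcription, so the values in `coefs` below are hypothetical placeholders chosen only to make the sketch runnable; the function names `predict` and `urban_slope` are likewise assumptions. The point is that the slope with respect to Urban equals the Urban coefficient plus the interaction coefficient times South.

```python
# Hypothetical placeholder estimates: the real values from the regression
# output are not available in the transcription.
coefs = {
    "intercept": -300.0,
    "urban": 5.0,
    "poverty": 30.0,
    "south": -500.0,
    "urban_south": 8.0,
    "poverty_south": 5.0,
}


def predict(c, urban, poverty, south):
    """Fitted violent crime rate from model (1), including both interactions."""
    return (c["intercept"]
            + c["urban"] * urban
            + c["poverty"] * poverty
            + c["south"] * south
            + c["urban_south"] * urban * south
            + c["poverty_south"] * poverty * south)


def urban_slope(c, south):
    """Change in the fitted rate per one-point increase in Urban."""
    return c["urban"] + c["urban_south"] * south


# A Southern state with urbanization 55.4 and poverty rate 13.7, as in (b).
base = predict(coefs, 55.4, 13.7, 1)
bumped = predict(coefs, 56.4, 13.7, 1)
print("slope in Southern states:    ", urban_slope(coefs, 1))
print("slope in non-Southern states:", urban_slope(coefs, 0))
print("difference of predictions:   ", bumped - base)
```

With a positive interaction coefficient, as in these placeholder values, the Southern slope is larger than the non-Southern one, which is the sense in which the effect of urbanization is amplified.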

Problem 4. [10 points] A scientific foundation wanted to evaluate the relation between Y = salary of researcher (in thousands of dollars), and four predictors, $X_1$ = number of years of experience, $X_2$ = an index of publication quality, $X_3$ = sex (1 for Male and 0 for Female), and $X_4$ = an index of success in obtaining grant support. A sample of 35 randomly selected researchers was used to fit the multiple linear regression model. Parts of the computer output appear below.

## Coefficients:
##              Estimate  Std. Error  t value  Pr(>|t|)
## (Intercept)
## Years
## Papers       ?
## Sex
## Grants
##
## Residual standard error: 1.75 on 30 degrees of freedom
## Multiple R-squared:
## F-statistic: ? on 4 and 30 DF, p-value: ?

(a) [2 points] Explain how the t-statistic for the number of years of experience was computed.

Answer. The multiple linear regression model of interest is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon. \qquad (2)$$

The t-statistic for the number of years of experience is the coefficient estimate divided by its standard error: $\hat\beta_1 / \mathrm{SE}(\hat\beta_1)$.

(b) [2 points] The 97.5% quantile of a t-distribution with 30 degrees of freedom is 2.042. Do you expect the p-value associated with the index of publication quality to be greater than or less than 0.05?

Answer. The p-value of interest is computed for the hypothesis test $H_0$: $\beta_2 = 0$ against $H_a$: $\beta_2 \neq 0$. We know that under $H_0$, the t-statistic follows a t-distribution with 30 degrees of freedom. Since the computed t-value is larger than 2.042, we reject the null hypothesis at the 5% level of significance. Therefore, we expect the p-value to be less than 0.05.

(c) [2 points] How well does the regression model explain the variability in the salary of a researcher?

Answer. This is answered by reading the $R^2$-statistic. Here, 92.3% of the variability in the salary of a researcher is explained by the multiple linear regression (2), which corresponds to a very good fit.

(d) [2 points] Recall that the formula for the F-statistic is

$$F = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{RSS}} \cdot \frac{n - p - 1}{p},$$

where n is the number of observations and p is the number of predictors. The $R^2$ statistic is given by

$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}.$$

Using basic algebra, prove the following relationship between the F-statistic and the $R^2$ statistic:

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - p - 1}{p}.$$

Answer. It is sufficient to prove that $(\mathrm{TSS} - \mathrm{RSS})/\mathrm{RSS} = R^2/(1 - R^2)$. Since $\mathrm{RSS}/\mathrm{TSS} = 1 - R^2$, starting from the left-hand side we get

$$\frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{RSS}} = \frac{\mathrm{TSS}}{\mathrm{RSS}} - 1 = \frac{1}{1 - R^2} - 1 = \frac{1 - (1 - R^2)}{1 - R^2} = \frac{R^2}{1 - R^2}.$$

(e) [2 points] From (d), do you expect the value of the F-statistic for our data to be very large or very small?

Answer. If $R^2$ is large, then $1 - R^2$ is small, so $R^2/(1 - R^2)$ is large, and so is the F-statistic, using (d). For our data, $R^2 = 0.923$. Therefore we expect the value of the F-statistic to be very large.
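The identity from part (d) is easy to evaluate for the figures quoted in this problem (n = 35 researchers, p = 4 predictors, and 92.3% of the variability explained). The sketch below is an illustration only: the function names are hypothetical, and the TSS and RSS values used for the cross-check are arbitrary numbers chosen so that their ratio reproduces R^2 = 0.923.

```python
def f_statistic_from_r2(r2, n, p):
    """F-statistic from R^2 using F = R^2 / (1 - R^2) * (n - p - 1) / p."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return r2 / (1.0 - r2) * (n - p - 1) / p


def f_statistic_from_sums(tss, rss, n, p):
    """F-statistic from its definition in terms of TSS and RSS."""
    return (tss - rss) / rss * (n - p - 1) / p


# Values taken from the problem statement.
f_value = f_statistic_from_r2(0.923, n=35, p=4)

# Cross-check with arbitrary sums of squares giving the same R^2:
# 1 - 77 / 1000 = 0.923.
f_check = f_statistic_from_sums(1000.0, 77.0, n=35, p=4)

print(round(f_value, 1))  # about 89.9
```

A value near 90 on 4 and 30 degrees of freedom is far in the tail of the F-distribution, which matches the conclusion of part (e).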

Problem 5. [8 points] A data set consists of percentage returns for the S&P 500 stock index over 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010. For each date, we have recorded the year that the observation was recorded, and the percentage returns for each of the five previous trading weeks, Lag1 through Lag5. We have also recorded Volume (the number of shares traded on the previous week, in billions) and Direction (whether the market was Up or Down on a given week). In this problem, a prediction is based on whether the predicted probability of a market increase is greater than or less than 0.5.

(a) [2 points] We perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. We obtain the following output using statistical software.

## Coefficients:
##              Estimate  Std. Error  z value  Pr(>|z|)
## (Intercept)
## Lag1
## Lag2
## Lag3
## Lag4
## Lag5
## Volume

Estimate the probability that the market goes up next week with Lag1 = 1.26%, Lag2 = -1.96%, Lag3 = 0.97%, Lag4 = 0.72%, Lag5 = 0.09% and a volume of 1.46 billion shares traded on the previous week.

Answer. The logistic regression model reads as follows:

$$p(x) = \frac{e^{\beta_0 + \beta_1 \text{Lag1} + \beta_2 \text{Lag2} + \beta_3 \text{Lag3} + \beta_4 \text{Lag4} + \beta_5 \text{Lag5} + \beta_6 \text{Volume}}}{1 + e^{\beta_0 + \beta_1 \text{Lag1} + \beta_2 \text{Lag2} + \beta_3 \text{Lag3} + \beta_4 \text{Lag4} + \beta_5 \text{Lag5} + \beta_6 \text{Volume}}} \qquad (3)$$

where p(x) is the probability that the market goes up given the five lag variables and the volume of traded shares. Using the coefficient estimates given above, we obtain the following predicted probability:

$$p(x) = \frac{e^{\hat\beta_0 + \hat\beta_1 (1.26) + \hat\beta_2 (-1.96) + \hat\beta_3 (0.97) + \hat\beta_4 (0.72) + \hat\beta_5 (0.09) + \hat\beta_6 (1.46)}}{1 + e^{\hat\beta_0 + \hat\beta_1 (1.26) + \hat\beta_2 (-1.96) + \hat\beta_3 (0.97) + \hat\beta_4 (0.72) + \hat\beta_5 (0.09) + \hat\beta_6 (1.46)}}$$

(b) [2 points] Do any of the predictors appear to be statistically significant? If so, which ones?

Answer. On the basis of the p-values, and for a cutoff of 5%, Lag2 appears to be the only statistically significant predictor.

(c) [2 points] Now we fit four classifiers using a training data period from 1990 to 2008, with Lag2 as the only predictor: logistic regression model, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbors (KNN) with K = 1. We obtain the following confusion matrices for the held-out data (that is, the data from 2009 and 2010) for the four classifiers.

                        True Direction
                        Down    Up
Predicted     Down      9       5
Direction     Up

Table 1: Confusion matrix using logistic regression.

                        True Direction
                        Down    Up
Predicted     Down      9       5
Direction     Up

Table 2: Confusion matrix using LDA.

                        True Direction
                        Down    Up
Predicted     Down      0       0
Direction     Up

Table 3: Confusion matrix using QDA.

                        True Direction
                        Down    Up
Predicted     Down
Direction     Up

Table 4: Confusion matrix using KNN with K = 1.

Compute the overall fraction of correct predictions for the held-out data for the four classifiers.

Answer. The overall fraction of correct predictions for the held-out data is the number of correct predictions (the diagonal of the confusion matrix) divided by the 104 held-out observations:

(9 + 56)/104 = 65/104 for logistic regression,
(9 + 56)/104 = 65/104 for LDA,
(0 + 61)/104 = 61/104 for QDA,
(21 + 31)/104 = 1/2 for KNN with K = 1.

(d) [2 points] Which of these methods appears to provide the best results on this data?

Answer. On the basis of the test error rates, logistic regression and LDA seem to be tied for the best method. However, one could challenge that statement by considering other values of K for the KNN method.
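The accuracy computation in part (c) and the comparison in part (d) can be sketched as follows. Only the diagonal counts and the held-out total of 104 quoted in the answer are used, since the off-diagonal cells are not all available in this transcription; the helper name `accuracy` and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction


def accuracy(correct_down, correct_up, total):
    """Overall fraction of correct predictions from the diagonal counts."""
    return Fraction(correct_down + correct_up, total)


HELD_OUT_TOTAL = 104  # number of held-out weeks quoted in the answer

# (correctly predicted Down, correctly predicted Up) for each classifier.
diagonals = {
    "logistic regression": (9, 56),
    "LDA": (9, 56),
    "QDA": (0, 61),
    "KNN (K = 1)": (21, 31),
}

scores = {name: accuracy(down, up, HELD_OUT_TOTAL)
          for name, (down, up) in diagonals.items()}

best = max(scores.values())
winners = [name for name, score in scores.items() if score == best]

for name, score in scores.items():
    print(f"{name:20s} {score}  ({float(score):.3f})")
print("best:", winners)
```

`Fraction` reduces 65/104 to 5/8, which makes the tie between logistic regression and LDA exact rather than subject to floating-point rounding.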

Problem 6. [10 points] Suppose that n observations $x_1, x_2, \dots, x_n$ are drawn from a Poisson distribution with unknown parameter $\lambda$. Recall that the probability mass function of a Poisson distribution with parameter $\lambda$ is $p(x; \lambda) = e^{-\lambda} \lambda^x / x!$, where x is a nonnegative integer.

(a) [2 points] Compute $L(\lambda)$, the likelihood function of $x_1, \dots, x_n$. Then give $\ln L(\lambda)$, the log-likelihood of $x_1, \dots, x_n$.

Answer. The likelihood function of $x_1, \dots, x_n$ is

$$L(\lambda) = \prod_{i=1}^{n} p(x_i; \lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}.$$

Therefore, the log-likelihood of $x_1, \dots, x_n$ is

$$\ln L(\lambda) = -n\lambda + \Big(\sum_{i=1}^{n} x_i\Big) \ln \lambda - \sum_{i=1}^{n} \ln(x_i!).$$

(b) [2 points] Determine $\hat\lambda$, the maximum likelihood estimator of $\lambda$. You should find $\hat\lambda = \frac{1}{n} \sum_{i=1}^{n} x_i$. You do not need to check the sign of the second derivative.

Answer. The derivative of $\ln L(\lambda)$ is

$$\frac{d}{d\lambda} \ln L(\lambda) = -n + \frac{1}{\lambda} \sum_{i=1}^{n} x_i.$$

Setting the derivative to zero gives the estimate $\hat\lambda = \frac{1}{n} \sum_{i=1}^{n} x_i$.

(c) [1 point] Application: Researchers want to investigate whether reading may prevent Alzheimer's disease. To do so, they examined 2,000 individuals with Alzheimer's and 8,000 without Alzheimer's. Give $\pi_1$, the prior probability that a person has Alzheimer's.

Answer. The prior probability that a person has Alzheimer's is $\pi_1 = 2{,}000/(2{,}000 + 8{,}000) = 0.2$.

(d) [2 points] Researchers discovered that people without Alzheimer's read 0.85 books per month on average, while people with Alzheimer's read 0.33 books per month on average. Assuming that the number of books that an individual reads per year follows a Poisson distribution, predict the probability that a person has Alzheimer's if he or she reads 7 books per year. Hint: Use Bayes' theorem.

Answer. Let A be the event that a person has Alzheimer's, $A^c$ be the event that a person does not have Alzheimer's, and B be the number of books that this person reads per year. Since the reading rates are given per month while B counts books per year, the yearly Poisson parameters are $12 \times 0.33 = 3.96$ for people with Alzheimer's and $12 \times 0.85 = 10.2$ for people without. Using Bayes' theorem, we obtain

$$P(A \mid B = 7) = \frac{P(A)\,P(B = 7 \mid A)}{P(A)\,P(B = 7 \mid A) + P(A^c)\,P(B = 7 \mid A^c)} = \frac{\pi_1\, p(7; 3.96)}{\pi_1\, p(7; 3.96) + (1 - \pi_1)\, p(7; 10.2)}$$
$$= \frac{0.2\, e^{-3.96} (3.96)^7 / 7!}{0.2\, e^{-3.96} (3.96)^7 / 7! + 0.8\, e^{-10.2} (10.2)^7 / 7!}.$$
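Parts (b) and (d) can both be checked numerically. The sketch below makes two assumptions that should be read as such: the sample of counts is made up for illustration, and the yearly Poisson parameters are taken as twelve times the monthly averages given in the problem (12 x 0.33 and 12 x 0.85). The function names are hypothetical.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability mass function p(x; lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def log_likelihood(lam, xs):
    """Poisson log-likelihood: -n lam + (sum x_i) ln lam - sum ln(x_i!)."""
    return (-len(xs) * lam
            + sum(xs) * math.log(lam)
            - sum(math.log(math.factorial(x)) for x in xs))


def mle(xs):
    """Maximum likelihood estimate of lam, i.e. the sample mean."""
    return sum(xs) / len(xs)


def posterior(x, prior, lam_with, lam_without):
    """P(A | B = x) by Bayes' theorem with Poisson class densities."""
    numerator = prior * poisson_pmf(x, lam_with)
    return numerator / (numerator + (1.0 - prior) * poisson_pmf(x, lam_without))


# Part (b): the sample mean maximizes the log-likelihood (made-up counts).
sample = [2, 0, 3, 1, 4, 2]
lam_hat = mle(sample)
print("lambda_hat:", lam_hat)

# Parts (c) and (d): prior from the group sizes, yearly rates from the
# monthly averages (assumed to scale by 12).
prior = 2000 / (2000 + 8000)
p_alzheimers = posterior(7, prior, 12 * 0.33, 12 * 0.85)
print("P(A | B = 7):", round(p_alzheimers, 3))
```

Under these assumptions the posterior probability is below the prior of 0.2, because reading 7 books per year is more typical of the group with the higher reading rate.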

(e) [2 points] We consider the general case of classification with only one predictor X. Suppose that we have K classes, and that if an observation belongs to the k-th class then X comes from a Poisson distribution with parameter $\lambda_k$. Prove that in this case, the Bayes classifier assigns an observation X = x to the class for which

$$\delta_k(x) = \ln \pi_k - \lambda_k + x \ln \lambda_k$$

is largest.

Answer. Recall that the Bayes classifier assigns an observation X = x to the class for which $p_k(x) = P(Y = k \mid X = x)$ is largest. We have that

$$p_k(x) = \frac{\pi_k\, p(x; \lambda_k)}{\sum_{l=1}^{K} \pi_l\, p(x; \lambda_l)} = \frac{\pi_k\, e^{-\lambda_k} \lambda_k^x / x!}{\sum_{l=1}^{K} \pi_l\, e^{-\lambda_l} \lambda_l^x / x!},$$

where $\pi_k$ is the prior probability for an observation to belong to the k-th class. As the denominator is the same across all classes, and since the function ln is increasing, choosing the class that maximizes $p_k(x)$ is the same as choosing the class that maximizes

$$\ln\Big(\pi_k\, \frac{e^{-\lambda_k} \lambda_k^x}{x!}\Big) = \ln \pi_k - \lambda_k + x \ln \lambda_k - \ln x!.$$

From the above equation, we obtain the result since the term $\ln x!$ does not depend on the class.

(f) [1 point] From (e), what is the shape of the Bayes decision boundaries?

Answer. The Bayes decision boundaries correspond to the values of X = x for which $p_k(x) = p_l(x)$, or equivalently $\delta_k(x) = \delta_l(x)$, for any pair of classes k and l. Since the functions $\delta_k$ are linear functions of x, so are the Bayes decision boundaries.
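The discriminant rule from parts (e) and (f) can be sketched in a few lines. As an illustration it reuses the two-class reading example from part (d), with the assumption that the yearly Poisson parameters are twelve times the monthly averages; the function names `delta`, `classify`, and `boundary` are hypothetical. Because each discriminant is linear in x, two classes with different parameters have a single crossing point.

```python
import math


def delta(x, prior, lam):
    """Poisson discriminant: ln(prior) - lam + x * ln(lam)."""
    return math.log(prior) - lam + x * math.log(lam)


def classify(x, priors, lams):
    """Index of the class with the largest discriminant at x."""
    scores = [delta(x, p, lam) for p, lam in zip(priors, lams)]
    return max(range(len(scores)), key=scores.__getitem__)


def boundary(prior_k, lam_k, prior_l, lam_l):
    """Value of x where delta_k(x) = delta_l(x); requires lam_k != lam_l."""
    return (math.log(prior_l / prior_k) + lam_k - lam_l) / math.log(lam_k / lam_l)


# Class 0: Alzheimer's, class 1: no Alzheimer's (yearly rates assumed to be
# 12 times the monthly averages from part (d)).
priors = [0.2, 0.8]
lams = [12 * 0.33, 12 * 0.85]

x_star = boundary(priors[0], lams[0], priors[1], lams[1])
print("decision boundary at x =", round(x_star, 2))
for books in (3, 5, 6, 7):
    print(books, "books per year -> class", classify(books, priors, lams))
```

Under these assumptions, counts below the crossing point are assigned to the Alzheimer's class and counts above it to the other class, consistent with a posterior probability below 0.5 for 7 books per year.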

Department of Computer and Information Science and Engineering. CAP4770/CAP5771 Fall Midterm Exam. Instructor: Prof.

Department of Computer and Information Science and Engineering. CAP4770/CAP5771 Fall Midterm Exam. Instructor: Prof. Department of Computer and Information Science and Engineering UNIVERSITY OF FLORIDA CAP4770/CAP5771 Fall 2016 Midterm Exam Instructor: Prof. Daisy Zhe Wang This is a in-class, closed-book exam. This exam

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Stat 135 Fall 2013 FINAL EXAM December 18, 2013 Name: Person on right SID: Person on left There will be one, double sided, handwritten, 8.5in x 11in page of notes allowed during the exam. The exam is closed

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

STAT FINAL EXAM

STAT FINAL EXAM STAT101 2013 FINAL EXAM This exam is 2 hours long. It is closed book but you can use an A-4 size cheat sheet. There are 10 questions. Questions are not of equal weight. You may need a calculator for some

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

ECON 497 Midterm Spring

ECON 497 Midterm Spring ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain

More information

Introduction to Machine Learning Midterm, Tues April 8

Introduction to Machine Learning Midterm, Tues April 8 Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend

More information

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

More information

Classification Methods II: Linear and Quadratic Discrimminant Analysis

Classification Methods II: Linear and Quadratic Discrimminant Analysis Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear

More information

Machine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.

Machine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20. 10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the

More information

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP

More information

FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

More information

Simple Linear Regression: One Qualitative IV

Simple Linear Regression: One Qualitative IV Simple Linear Regression: One Qualitative IV 1. Purpose As noted before regression is used both to explain and predict variation in DVs, and adding to the equation categorical variables extends regression

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

10-701/ Machine Learning - Midterm Exam, Fall 2010

10-701/ Machine Learning - Midterm Exam, Fall 2010 10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question.

You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question. Data 8 Fall 2017 Foundations of Data Science Final INSTRUCTIONS You have 3 hours to complete the exam. Some questions are harder than others, so don t spend too long on any one question. The exam is closed

More information

Midterm exam CS 189/289, Fall 2015

Midterm exam CS 189/289, Fall 2015 Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points

More information

Section 3: Simple Linear Regression

Section 3: Simple Linear Regression Section 3: Simple Linear Regression Carlos M. Carvalho The University of Texas at Austin McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

More information

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti

Lecture 2: Categorical Variable. A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti Lecture 2: Categorical Variable A nice book about categorical variable is An Introduction to Categorical Data Analysis authored by Alan Agresti 1 Categorical Variable Categorical variable is qualitative

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 28. SIMPLE LINEAR REGRESSION III Fitted Values and Residuals To each observed x i, there corresponds a y-value on the fitted line, y = βˆ + βˆ x. The are called fitted values. ŷ i They are the values of

More information

Unit 6 - Simple linear regression

Unit 6 - Simple linear regression Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel Unit 6 - Simple linear regression LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Machine Learning, Midterm Exam: Spring 2009 SOLUTION

Machine Learning, Midterm Exam: Spring 2009 SOLUTION 10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

ISQS 5349 Final Exam, Spring 2017.

ISQS 5349 Final Exam, Spring 2017. ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC

More information

Generative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul

Generative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far

More information

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you.

Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. ISQS 5347 Final Exam Spring 2017 Open book, but no loose leaf notes and no electronic devices. Points (out of 200) are in parentheses. Put all answers on the paper provided to you. 1. Recall the commute

More information



