Sample statistics and linear regression. NEU 466M. Instructor: Professor Ila R. Fiete. Spring 2016


Mean

Given $N$ samples $\{x_1, \dots, x_N\}$ of a variable $x$, the sample mean is

$$\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i$$

In MATLAB: mean(x). Other notation: $\bar{x}$.

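Binned version of mean

Given $N$ samples $\{x_1, \dots, x_N\}$ of a variable $x$, $B$ bins $\{c_1, \dots, c_B\}$, and counts per bin $\{n_1, \dots, n_B\}$, the sample mean is

$$\langle x \rangle \approx \frac{1}{N} \sum_{i=1}^{B} n_i c_i$$

with equality when every sample counted in bin $i$ has the value $c_i$.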
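A minimal sketch of the two mean formulas, written in Python rather than the MATLAB used on the slides. The function names, the uniform test data, and the choice of bin centers as the values $c_i$ are assumptions made for illustration.

```python
import random

def sample_mean(xs):
    """Sample mean <x> = (1/N) * sum_i x_i."""
    return sum(xs) / len(xs)

def binned_mean(xs, lo, hi, n_bins):
    """Approximate <x> from bin counts n_i and bin centers c_i.

    Assumes every sample lies in the interval [lo, hi].
    """
    width = (hi - lo) / n_bins
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    counts = [0] * n_bins
    for x in xs:
        i = min(int((x - lo) / width), n_bins - 1)  # put x == hi in the last bin
        counts[i] += 1
    return sum(n * c for n, c in zip(counts, centers)) / len(xs)

random.seed(0)
xs = [random.uniform(0.0, 10.0) for _ in range(1000)]
exact = sample_mean(xs)
approx = binned_mean(xs, 0.0, 10.0, 50)
print(exact, approx)
```

Each sample sits within half a bin width of its bin center, so the binned mean can differ from the exact mean by at most half a bin width (0.1 in this example).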

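Variance

For samples $\{x_1, \dots, x_N\}$, the sample variance is

$$\langle (x - \langle x \rangle)^2 \rangle = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \langle x \rangle)^2$$

It is a measure of the scatter/spread of the data around its mean value.

Homework: show that $\langle (x - \langle x \rangle)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2$.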
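The homework identity can be checked numerically. The sketch below is in Python rather than the slides' MATLAB, and the helper names and Gaussian test data are illustrative assumptions. It treats $\langle \cdot \rangle$ as a plain $1/N$ average, for which the identity is exact. The $1/(N-1)$ sample variance then differs from it only by the factor $N/(N-1)$.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the 1/(N-1) normalization."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

random.seed(1)
xs = [random.gauss(0.0, 2.0) for _ in range(1000)]
n = len(xs)
m = mean(xs)

# With <.> as a 1/N average, <(x - <x>)^2> equals <x^2> - <x>^2.
lhs = mean([(x - m) ** 2 for x in xs])
rhs = mean([x * x for x in xs]) - m ** 2

# The 1/(N-1) variance is the same quantity scaled by N/(N-1).
var_unbiased = sample_variance(xs)
print(lhs, rhs, var_unbiased)
```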

Standard deviation

For samples $\{x_1, \dots, x_N\}$, the standard deviation is

$$\sqrt{\langle (x - \langle x \rangle)^2 \rangle}$$

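Covariance

Given $N$ samples each of variables $x$ and $y$, $\{x_1, \dots, x_N\}$ and $\{y_1, \dots, y_N\}$, the sample covariance is

$$C(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)$$

($C(x, x)$ is simply the sample variance of $x$.)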

Covariance: what does it measure?

$$C(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)$$

If x, y both deviate from their means together (both up, then both down), then the terms in the sum are positive and C(x, y) > 0.

If x, y deviate from their means independently of each other, then the terms in the sum are randomly positive and negative and C(x, y) ≈ 0.

If x, y deviate from their means in opposite directions, then the terms in the sum are negative and C(x, y) < 0.

Literally, covariance is a measure of co-variation.
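The three cases can be illustrated with a small sketch in Python (not the slides' MATLAB). The five-point data sets are invented for illustration, and the "unrelated" series was chosen by hand so that its covariance with x is exactly zero.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance C(x, y) with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
together = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises when x rises
opposite = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls when x rises
unrelated = [1.0, 0.0, 2.0, 0.0, 1.0]   # no linear trend with x

print(cov(x, together))    # positive
print(cov(x, opposite))    # negative
print(cov(x, unrelated))   # zero up to rounding
print(cov(x, x))           # sample variance of x
```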

Covariance example I

x, y independent:

x = randn(1000, 1)
y = randn(1000, 1)

C(x, y) = 0.009; C(x, x) = 1.069

[Figure: scatter plot of y versus x. For x > 0, y is scattered around 0 without bias.]

Covariance example II

x, y independent:

x = 0.2 * randn(1000, 1)
y = 0.2 * randn(1000, 1)

C(x, y) = 0.001; C(x, x) = 0.0407

[Figure: scatter plot of y versus x.]
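The two independent examples can be re-created along the following lines. This is a Python sketch standing in for the MATLAB calls on the slides; `random.gauss` plays the role of `randn`, and the seed and helper names are assumptions. The printed numbers will not match the slide values exactly because the random draws differ.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def independent_pair(scale, n=1000, seed=0):
    """Two independent Gaussian samples, analogous to scale * randn(n, 1)."""
    rng = random.Random(seed)
    xs = [scale * rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [scale * rng.gauss(0.0, 1.0) for _ in range(n)]
    return xs, ys

for scale in (1.0, 0.2):
    xs, ys = independent_pair(scale)
    print(f"scale={scale}: C(x,y)={cov(xs, ys):.4f}, C(x,x)={cov(xs, xs):.4f}")
```

In both cases C(x, y) should come out close to zero, while C(x, x) should be close to the square of the scale factor (1 and 0.04), in line with the values quoted on the slides.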

Covariance example III

x, y not independent:

x = randn(1000, 1)
y is generated from 0.5 x and randn(1000, 1), so it depends on x

C(x, x) = 0.907; C(x, y) = 0.464; C(y, y) = 0.469

[Figure: scatter plot of y versus x, annotated "x>0, y > x".]

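Alternative notation

Mean: $\langle x \rangle$, $\bar{x}$, $\mu_x$, E(x)

Variance: $\langle x^2 \rangle - \langle x \rangle^2$, $\overline{x^2} - \bar{x}^2$, $\sigma_x^2$, var(x), C(x, x)

Covariance: $\langle xy \rangle - \langle x \rangle \langle y \rangle$, $\overline{xy} - \bar{x}\bar{y}$, $\sigma_{xy}^2$, cov(x), C(x, y)

Standard deviation: $\sqrt{\langle x^2 \rangle - \langle x \rangle^2}$, $\sqrt{\overline{x^2} - \bar{x}^2}$, $\sigma_x$, std(x)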

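Pearson's correlation coefficient

$$\rho(x, y) = \frac{\langle (x - \langle x \rangle)(y - \langle y \rangle) \rangle}{\sqrt{\langle (x - \langle x \rangle)^2 \rangle \, \langle (y - \langle y \rangle)^2 \rangle}}$$

In shorter-form notation:

$$\rho(x, y) = \frac{C(x, y)}{\sigma_x \sigma_y}$$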
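A short Python sketch of the correlation coefficient (the slides themselves use MATLAB; the helper names and the small data set are illustrative assumptions). It also shows that rescaling or shifting either variable with a positive scale factor leaves the coefficient unchanged, because the normalization by the two standard deviations cancels the scale.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson(xs, ys):
    """rho(x, y) = C(x, y) / (sigma_x * sigma_y)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

rho = pearson(x, y)
# Positive rescaling and shifting of either variable does not change rho.
rho_rescaled = pearson([10 * v for v in x], [0.5 * v + 3 for v in y])
print(rho, rho_rescaled)
```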

Pearson's correlation coefficient and covariance only measure linear dependency.

From: https://en.wikipedia.org/wiki/correlation_and_dependence
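A small example makes the limitation concrete. In the Python sketch below (an illustration that is not taken from the slides), y is completely determined by x through y = x², yet the correlation coefficient vanishes because the dependence has no linear component on a grid that is symmetric about zero.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson(xs, ys):
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # symmetric about zero
y = [v * v for v in x]                        # fully dependent on x

rho = pearson(x, y)
print(rho)   # zero up to rounding, despite the exact dependence
```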

Robust statistics?

Mean and variance are easy to compute and widely used, but they are not robust: they are sensitive to outliers.

A more robust alternative to the mean is the median.
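The difference in robustness shows up even on a tiny data set. The following Python sketch uses the standard-library `statistics` module; the numbers are invented for illustration.

```python
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 1000.0]   # one corrupted sample

# How far does each statistic move when a single sample is corrupted?
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
print(mean_shift, median_shift)
```

A single outlier drags the mean far from the bulk of the data, while the median of this sample does not move at all.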

Application: LINEAR REGRESSION IN TERMS OF SAMPLE STATISTICS

Regression: curve-fitting

Scalar explanatory variable (X) and response variable (Y); $N$ samples $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$.

$$\tilde{y}(x) = w_0 + w_1 x + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

Free parameters: $(w_0, w_1, \dots, w_M)$.

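Linear least-squares regression

$$E = \frac{1}{2} \sum_{n=1}^{N} \left[ \tilde{y}(x_n; w) - y_n \right]^2 = \frac{1}{2} \sum_{n=1}^{N} \Big[ \sum_{j=0}^{M} w_j x_n^j - y_n \Big]^2 = \frac{1}{2} \sum_{n=1}^{N} \left[ w_0 + w_1 x_n - y_n \right]^2$$

The last form uses $M = 1$ for linear regression.

To solve for the best $w_0$, $w_1$:

$$\frac{dE}{dw_0} = 0, \qquad \frac{dE}{dw_1} = 0$$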
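To make the error function concrete, here is a minimal Python sketch (the slides use MATLAB; `model`, `squared_error`, and the toy data are hypothetical names and values chosen for illustration). It evaluates E for the general polynomial model and compares the line that generated the data with a perturbed one.

```python
def model(x, w):
    """Polynomial model y~(x) = sum_j w[j] * x**j."""
    return sum(wj * x ** j for j, wj in enumerate(w))

def squared_error(w, xs, ys):
    """E = 1/2 * sum_n (y~(x_n; w) - y_n)**2."""
    return 0.5 * sum((model(x, w) - y) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                     # generated by y = 1 + 2x

E_exact = squared_error([1.0, 2.0], xs, ys)   # the generating line
E_off = squared_error([1.0, 2.5], xs, ys)     # slope perturbed by 0.5
print(E_exact, E_off)
```

The generating line gives zero error on this noise-free data, and any other choice of (w_0, w_1) gives a strictly larger value, which is what setting the two derivatives to zero locates.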

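Linear least-squares regression

$$E = \frac{1}{2} \sum_{n=1}^{N} \left[ w_0 + w_1 x_n - y_n \right]^2$$

$$\frac{dE}{dw_0} = \sum_{n=1}^{N} \left[ w_0 + w_1 x_n - y_n \right] = N w_0 + N w_1 \langle x \rangle - N \langle y \rangle = 0$$

$$w_0 + w_1 \langle x \rangle - \langle y \rangle = 0 \qquad (1)$$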

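Linear least-squares regression

$$E = \frac{1}{2} \sum_{n=1}^{N} \left[ w_0 + w_1 x_n - y_n \right]^2$$

$$\frac{dE}{dw_1} = \sum_{n=1}^{N} \left[ w_0 + w_1 x_n - y_n \right] x_n = N w_0 \langle x \rangle + N w_1 \langle x^2 \rangle - N \langle xy \rangle = 0$$

$$w_0 \langle x \rangle + w_1 \langle x^2 \rangle - \langle xy \rangle = 0 \qquad (2)$$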
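The step from conditions (1) and (2) to a closed-form solution can be written out as follows. This is a sketch of the algebra in which $\langle \cdot \rangle$ denotes a $1/N$ average; the $1/(N-1)$ normalization of $C$ multiplies numerator and denominator by the same factor, so the ratio is unaffected.

```latex
\begin{align*}
\text{From (1):}\quad & w_0 = \langle y \rangle - w_1 \langle x \rangle \\
\text{Substitute into (2):}\quad
  & \left( \langle y \rangle - w_1 \langle x \rangle \right) \langle x \rangle
    + w_1 \langle x^2 \rangle - \langle xy \rangle = 0 \\
  & w_1 \left( \langle x^2 \rangle - \langle x \rangle^2 \right)
    = \langle xy \rangle - \langle x \rangle \langle y \rangle \\
  & w_1 = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}
               {\langle x^2 \rangle - \langle x \rangle^2}
        = \frac{C(x, y)}{C(x, x)}
\end{align*}
```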

Linear least-squares regression

Slope:

$$w_1 = \frac{C(x, y)}{C(x, x)}$$

y-intercept:

$$w_0 = \langle y \rangle - w_1 \langle x \rangle$$

In homework: check MATLAB's polyfit against this optimal expression for linear least-squares fitting.
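The closed-form fit is short enough to sketch directly. The version below is in Python rather than MATLAB, and `fit_line` and the sample data are hypothetical names and values used only for illustration. It also confirms that both gradient conditions, equations (1) and (2), vanish at the fitted parameters.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def fit_line(xs, ys):
    """Least-squares line: w1 = C(x, y) / C(x, x), w0 = <y> - w1 * <x>."""
    w1 = cov(xs, ys) / cov(xs, xs)
    w0 = mean(ys) - w1 * mean(xs)
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
w0, w1 = fit_line(xs, ys)

# Gradient of E at the fitted parameters: both components should be ~0.
residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
grad_w0 = sum(residuals)
grad_w1 = sum(r * x for r, x in zip(residuals, xs))
print(w0, w1, grad_w0, grad_w1)
```

For the homework comparison, note that MATLAB's `polyfit(x, y, 1)` returns the coefficients in descending powers, that is, the slope first and the intercept second.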

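Linear least-squares regression

Slope:

$$w_1 = \frac{C(x, y)}{C(x, x)}$$

y-intercept:

$$w_0 = \langle y \rangle - w_1 \langle x \rangle$$

Contrast $w_1$ with Pearson's correlation:

$$\rho(x, y) = \frac{C(x, y)}{\sigma_x \sigma_y}$$

The two use different normalizations: the slope divides the covariance by $\sigma_x^2$, the correlation by $\sigma_x \sigma_y$.

Different correlation coefficients are possible for the same slope but different amounts of x, y scatter.

The same correlation is possible for different slopes and different x, y scatter.

Correlation: more strongly penalizes y-scatter, more weakly penalizes x-scatter.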
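Both situations can be reproduced with a hand-built data set. In this Python sketch (not from the slides), the scatter pattern `e` is chosen to have zero mean and zero covariance with `x`, so the fitted slope of y = k (x + s e) is exactly k while the correlation depends only on the relative scatter s. The names `k`, `s`, `e`, and `line_with_scatter` are assumptions introduced for the example.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

def slope(xs, ys):
    return cov(xs, ys) / cov(xs, xs)

def pearson(xs, ys):
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

x = [-1.0, -1.0, 1.0, 1.0]
e = [1.0, -1.0, 1.0, -1.0]   # zero mean and zero covariance with x

def line_with_scatter(k, s):
    """y = k * (x + s * e): k sets the slope, s the relative scatter."""
    return [k * (xi + s * ei) for xi, ei in zip(x, e)]

# Same slope, different rho: only the scatter changes.
for s in (0.5, 2.0):
    y = line_with_scatter(1.0, s)
    print("k=1, s=%.1f: slope=%.3f rho=%.3f" % (s, slope(x, y), pearson(x, y)))

# Different slope, same rho: y and its scatter are scaled together.
for k in (1.0, 3.0):
    y = line_with_scatter(k, 0.5)
    print("k=%.0f, s=0.5: slope=%.3f rho=%.3f" % (k, slope(x, y), pearson(x, y)))
```

For this construction the correlation works out to 1 / sqrt(1 + s²), independent of k.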

Slope versus Pearson's correlation coefficient

[Figure panels: same slope, different ρ; different slope, same ρ.]

From: https://en.wikipedia.org/wiki/correlation_and_dependence

Application: BACK TO SAMPLE STATISTICS: MULTIVARIATE

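Multiple variables: covariance matrix

$\{x^\alpha_1, \dots, x^\alpha_N\}$: $N$ samples of the $\alpha$th variable $x^\alpha$. There are $K$ different variables $x^\alpha$, labeled by $\alpha, \beta \in \{1, \dots, K\}$. The sample covariance matrix is

$$C_{\alpha\beta} = \frac{1}{N-1} \sum_{i=1}^{N} (x^\alpha_i - \langle x^\alpha \rangle)(x^\beta_i - \langle x^\beta \rangle) = \mathrm{cov}(x^\alpha, x^\beta)$$

It is $K \times K$ dimensional since there are $K$ variables.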

Covariance matrix

The $(\alpha, \beta)$ element is the covariance between $x^\alpha$ and $x^\beta$.

The diagonal of the covariance matrix is the variance of each variable: var($x^\alpha$) or $C(x^\alpha, x^\alpha)$.

There are $K^2$ entries in total, but only half of the off-diagonal terms are independent because of symmetry ($C(x^\beta, x^\alpha) = C(x^\alpha, x^\beta)$). Thus there are only $(K^2 - K)/2 + K = K(K+1)/2$ independent terms.

Questions: How do we do linear regression in the multivariate case? Will it involve the covariance matrix?
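The structure of the covariance matrix can be checked with a short Python sketch (the slides use MATLAB; the function name `covariance_matrix` and the three small variables are made up for illustration). It builds the K x K matrix entry by entry and counts the independent entries.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with the 1/(N-1) normalization."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(variables):
    """K x K matrix whose (a, b) entry is the covariance of variables a and b."""
    return [[cov(va, vb) for vb in variables] for va in variables]

variables = [
    [1.0, 2.0, 3.0, 4.0],   # variable 1, N = 4 samples
    [2.0, 1.0, 4.0, 3.0],   # variable 2
    [0.5, 0.5, 1.5, 2.5],   # variable 3
]
C = covariance_matrix(variables)
K = len(C)

# Independent entries: the diagonal plus one triangle of off-diagonal terms.
unique_entries = sum(1 for a in range(K) for b in range(a, K))
for row in C:
    print(row)
print(unique_entries)
```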

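Covariance example I

x, y independent:

x = randn(1000, 1)
y = randn(1000, 1)

[Figure: scatter plot of y versus x, shown with the 2 × 2 sample covariance matrix C of x and y; the matrix entries are not preserved here.]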

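Covariance example III

x, y not independent:

x = randn(1000, 1)
y is generated from 0.5 x and randn(1000, 1), so it depends on x

[Figure: scatter plot of y versus x, shown with the 2 × 2 sample covariance matrix C of x and y; the matrix entries are not preserved here.]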

Summary

Defined the sample mean and variance of a variable.

Defined the covariance between a pair of variables.

Solved optimal (least-squares) linear regression between two variables in terms of mean and covariance.

Covariance matrix: covariance between all K(K+1)/2 unique pairs of K variables.


More information

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2 Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution Defn: Z R 1 N(0,1) iff f Z (z) = 1 2π e z2 /2 Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) (a column

More information

MR. YATES. Vocabulary. Quadratic Cubic Monomial Binomial Trinomial Term Leading Term Leading Coefficient

MR. YATES. Vocabulary. Quadratic Cubic Monomial Binomial Trinomial Term Leading Term Leading Coefficient ALGEBRA II WITH TRIGONOMETRY COURSE OUTLINE SPRING 2009. MR. YATES Vocabulary Unit 1: Polynomials Scientific Notation Exponent Base Polynomial Degree (of a polynomial) Constant Linear Quadratic Cubic Monomial

More information

REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b)

REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b) REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b) Explain/Obtain the variance-covariance matrix of Both in the bivariate case (two regressors) and

More information

Stephen Scott.

Stephen Scott. 1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

CS70: Jean Walrand: Lecture 22.

CS70: Jean Walrand: Lecture 22. CS70: Jean Walrand: Lecture 22. Confidence Intervals; Linear Regression 1. Review 2. Confidence Intervals 3. Motivation for LR 4. History of LR 5. Linear Regression 6. Derivation 7. More examples Review:

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Chapter 1: Precalculus Review

Chapter 1: Precalculus Review : Precalculus Review Math 115 17 January 2018 Overview 1 Important Notation 2 Exponents 3 Polynomials 4 Rational Functions 5 Cartesian Coordinates 6 Lines Notation Intervals: Interval Notation (a, b) (a,

More information

Correlation and Regression Theory 1) Multivariate Statistics

Correlation and Regression Theory 1) Multivariate Statistics Correlation and Regression Theory 1) Multivariate Statistics What is a multivariate data set? How to statistically analyze this data set? Is there any kind of relationship between different variables in

More information

Honors Advanced Algebra Unit 3: Polynomial Functions November 9, 2016 Task 11: Characteristics of Polynomial Functions

Honors Advanced Algebra Unit 3: Polynomial Functions November 9, 2016 Task 11: Characteristics of Polynomial Functions Honors Advanced Algebra Name Unit 3: Polynomial Functions November 9, 2016 Task 11: Characteristics of Polynomial Functions MGSE9 12.F.IF.7 Graph functions expressed symbolically and show key features

More information

ENGG2430A-Homework 2

ENGG2430A-Homework 2 ENGG3A-Homework Due on Feb 9th,. Independence vs correlation a For each of the following cases, compute the marginal pmfs from the joint pmfs. Explain whether the random variables X and Y are independent,

More information

Statistics Assignment 2 HET551 Design and Development Project 1

Statistics Assignment 2 HET551 Design and Development Project 1 Statistics Assignment HET Design and Development Project Michael Allwright - 74634 Haddon O Neill 7396 Monday, 3 June Simple Stochastic Processes Mean, Variance and Covariance Derivation The following

More information

Preliminary Statistics. Lecture 3: Probability Models and Distributions

Preliminary Statistics. Lecture 3: Probability Models and Distributions Preliminary Statistics Lecture 3: Probability Models and Distributions Rory Macqueen (rm43@soas.ac.uk), September 2015 Outline Revision of Lecture 2 Probability Density Functions Cumulative Distribution

More information

Section Linear Correlation and Regression. Copyright 2013, 2010, 2007, Pearson, Education, Inc.

Section Linear Correlation and Regression. Copyright 2013, 2010, 2007, Pearson, Education, Inc. Section 13.7 Linear Correlation and Regression What You Will Learn Linear Correlation Scatter Diagram Linear Regression Least Squares Line 13.7-2 Linear Correlation Linear correlation is used to determine

More information

Outline Properties of Covariance Quantifying Dependence Models for Joint Distributions Lab 4. Week 8 Jointly Distributed Random Variables Part II

Outline Properties of Covariance Quantifying Dependence Models for Joint Distributions Lab 4. Week 8 Jointly Distributed Random Variables Part II Week 8 Jointly Distributed Random Variables Part II Week 8 Objectives 1 The connection between the covariance of two variables and the nature of their dependence is given. 2 Pearson s correlation coefficient

More information

Lecture 16 - Correlation and Regression

Lecture 16 - Correlation and Regression Lecture 16 - Correlation and Regression Statistics 102 Colin Rundel April 1, 2013 Modeling numerical variables Modeling numerical variables So far we have worked with single numerical and categorical variables,

More information

Department of Mathematics, University of Wisconsin-Madison Math 114 Worksheet Sections 3.1, 3.3, and 3.5

Department of Mathematics, University of Wisconsin-Madison Math 114 Worksheet Sections 3.1, 3.3, and 3.5 Department of Mathematics, University of Wisconsin-Madison Math 11 Worksheet Sections 3.1, 3.3, and 3.5 1. For f(x) = 5x + (a) Determine the slope and the y-intercept. f(x) = 5x + is of the form y = mx

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

In this course, we will be examining large data sets coming from different sources. There are two basic types of data:

In this course, we will be examining large data sets coming from different sources. There are two basic types of data: Chapter 4 Statistics In this course, we will be examining large data sets coming from different sources. There are two basic types of data: Deterministic Data: Data coming from a specific function- each

More information

COMP 562: Introduction to Machine Learning

COMP 562: Introduction to Machine Learning COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu

More information