Random variables
Measures of central tendency and variability (means and variances)
Joint density functions and independence
Measures of association (covariance and correlation)
Interesting result
Conditional distributions
Law of iterated expectations
The Normal distribution
Appendix I: Correlation, independence and linear relationships
Appendix II: Covariance and independence

1. Random variables
a. Random variables: X, Y, Z take on different values with different probabilities; the convention is to use capital letters for random variables and lower-case letters for realized values. So, for instance, X is a random variable, and x or x1, x2, and x3 would be specific realized values of X.
b. (Probability) density functions (pdfs): describe the distribution of the random variable (the probability that the random variable takes on different values) and are used to determine probabilities.
i. Discrete random variable (e.g. Binomial distribution): takes on a finite or countably infinite set of values with positive probability.
1. density function: f(xj) = P(X = xj) >= 0 and Σj f(xj) = 1 (note sigma notation)
ii. Continuous random variable (e.g. Normal distribution)
1. density function: f(x) >= 0 and ∫ f(x) dx = 1
c. Use the density functions to determine the probabilities:
i. Discrete: P(a < X <= b) = Σ_{a<x<=b} P(X = x) = Σ_{a<x<=b} f(x)
ii. Continuous: P(a < X <= b) = ∫_a^b f(x) dx
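The defining properties of a density and the interval-probability formulas above can be checked numerically. A minimal Python sketch; the Binomial(3, 0.5) pmf below is a hypothetical example chosen for illustration, not taken from the notes:

```python
import math

# Hypothetical discrete example: Binomial(3, 0.5) pmf, f(x_j) = P(X = x_j).
f = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

# Defining properties: f(x_j) >= 0 and the probabilities sum to 1.
assert all(p >= 0 for p in f.values())
assert math.isclose(sum(f.values()), 1.0)

# Discrete case: P(a < X <= b) sums f(x) over the values in (a, b].
def prob_interval(pmf, a, b):
    return sum(p for x, p in pmf.items() if a < x <= b)

p_discrete = prob_interval(f, 0, 2)  # 3/8 + 3/8 = 0.75

# Continuous case: P(a < X <= b) is an integral; here a crude Riemann sum
# shows the standard Normal density integrates to approximately 1.
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
dx = 0.001
total_mass = sum(phi(-6 + k * dx) * dx for k in range(12000))
```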

c. Examples of random variables
i. Uniform [a,b]: f(x) = 1/(b − a) for x in [a, b] and is 0 otherwise
ii. Standard Normal N(0,1): f(x) = (1/√(2π)) e^(−x²/2)

2. Measures of central tendency and variability
a. Expectation/Mean (measure of central tendency): E(X) = μX
i. The average value of X (observed with a large number of random samples from the distribution)
ii. A weighted average of the different values of X (weight the values by their respective probabilities)
1. Discrete: E(X) = μX = Σ x·P(X = x) = Σ x·f(x)
2. Continuous: E(X) = μX = ∫ x·f(x) dx
iii. Properties
1. Linear operator: E(aX + b) = aE(X) + b
a. Extends to many random variables: E(Σ aiXi) = Σ E(aiXi) = Σ aiE(Xi) = Σ aiμi
2. And for some function g(.), E(g(X)) = Σ g(x)·f(x), or ∫ g(x)·f(x) dx for a continuous distribution
b. Variance (measure of variability or dispersion around the mean): Var(X) = σX²
i. The average squared deviation of X from its mean (observed with a large number of random samples from the distribution)
ii. A weighted average of the different squared deviations of X from its mean (weight the squared deviations by their respective probabilities)
1. Discrete: Var(X) = σX² = E[(X − μX)²] = Σ (x − μX)²·P(X = x) = Σ (x − μX)²·f(x)
2. Continuous: Var(X) = σX² = E[(X − μX)²] = ∫ (x − μX)²·f(x) dx
3. E[(X − μX)²] = E(X²) − μX²
iii. Properties:
1. Not a linear operator: Var(aX + b) = a²·Var(X)
iv. Standard deviation (StdDev): σX = +√(σX²) (positive square root)
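The weighted-average definitions of the mean and variance, the shortcut Var(X) = E(X²) − μ², and the linear-operator properties can all be verified directly. A sketch on a hypothetical three-point pmf (the values and probabilities are chosen only for illustration):

```python
# Hypothetical pmf: X takes the values 1, 2, 3 with probabilities .2, .5, .3.
f = {1: 0.2, 2: 0.5, 3: 0.3}

mu = sum(x * p for x, p in f.items())                         # E(X): weighted average
var = sum((x - mu) ** 2 * p for x, p in f.items())            # E[(X - mu)^2]
var_shortcut = sum(x * x * p for x, p in f.items()) - mu**2   # E(X^2) - mu^2

# E(aX + b) = a*E(X) + b, but Var(aX + b) = a^2 * Var(X): the shift b drops out.
a, b = 3.0, -1.0
mu_lin = sum((a * x + b) * p for x, p in f.items())
var_lin = sum((a * x + b - mu_lin) ** 2 * p for x, p in f.items())
```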

1. Linear operator: if a > 0, then StdDev(aX + b) = √(Var(aX)) = √(a²·Var(X)) = a·StdDev(X)
c. Standardizing random variables (z-scores): Z = (X − μX)/σX (has mean zero and unit variance)
i. Mean: E(Z) = (1/σX)·(E(X) − μX) = 0
ii. Variance: Var(Z) = E(Z²) = Var(X)/σX² = 1

3. Joint density functions
a. Consider X and Y, two random variables (e.g. people are randomly drawn from a population and their heights and weights are recorded)
b. If discrete, then the joint density is defined by fX,Y(x,y) = P(X = x & Y = y)
c. Note that P(X = x) = fX(x) = Σy P(X = x & Y = y) = Σy fX,Y(x,y)
i. So, the marginal density P(X = x) = fX(x) is the sum over the joint densities fX,Y(x,y)
d. Here is an example.
i. In the following table, the random variable X takes on three values (x1, x2 and x3), and Y takes on two (y1 and y2). The figures in the box are the joint probabilities, fX,Y(x,y) = P(X = x & Y = y). And so, for example, fX,Y(x1,y1) = P(X = x1 & Y = y1) = 0.2.
ii. And the marginal probabilities can be recovered from the joint probabilities by just summing across the rows and columns. So, for example, P(X = x1) = fX(x1) = Σj P(X = x1 & Y = yj) = fX,Y(x1,y1) + fX,Y(x1,y2) = 0.2 + 0.2 = 0.4.

          y1      y2
x1       0.2     0.2     P(x1) = 0.4
x2       0.1     0.3     P(x2) = 0.4
x3       0.1     0.1     P(x3) = 0.2
       P(y1)   P(y2)
       = 0.4   = 0.6
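The marginal-from-joint computation in the table above can be written out directly; a small Python sketch using the joint probabilities from the table:

```python
# Joint pmf from the table: rows x1..x3, columns y1, y2.
joint = {
    ("x1", "y1"): 0.2, ("x1", "y2"): 0.2,
    ("x2", "y1"): 0.1, ("x2", "y2"): 0.3,
    ("x3", "y1"): 0.1, ("x3", "y2"): 0.1,
}

# Marginal density: sum the joint density over the other variable.
def marginal_x(joint, x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(joint, y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

px1 = marginal_x(joint, "x1")   # 0.2 + 0.2 = 0.4
py2 = marginal_y(joint, "y2")   # 0.2 + 0.3 + 0.1 = 0.6
```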

e. Independence
i. fX,Y(x,y) = P(X = x)·P(Y = y) = fX(x)·fY(y) for all values (x,y) of X and Y; that is, the joint density function is the product of the marginal densities (applies to discrete and continuous distributions)
ii. X and Y in the previous example are not independent, since, for example: fX,Y(x1,y1) = 0.2, while P(X = x1)·P(Y = y1) = fX(x1)·fY(y1) = (0.4)(0.4) = 0.16
iii. We can extend to many independent random variables: fX1,X2,...,Xn(x1, x2, ..., xn) = P(X1 = x1, X2 = x2, ..., Xn = xn) = fX1(x1)·fX2(x2)·...·fXn(xn)
iv. Not independent means dependent

4. Measures of association
a. Consider two random variables, X and Y.
b. Covariance: Cov(X,Y) = σXY = E[(X − μX)(Y − μY)] = Σ (x − μX)(y − μY)·fX,Y(x,y)
c. Some examples: X and Y both have mean 0 in the following examples. On the left, most of the data are in quadrants I and III, where (x − μX)(y − μY) > 0, and so when you sum those products you get a positive covariance. Most of the action on the right is in quadrants II and IV, where (x − μX)(y − μY) < 0, and so those products sum to a negative covariance.
d. Properties:
i. Cov(X,Y) = E(XY) − μXμY
ii. Note that Cov(X,X) = E[(X − μX)(X − μX)] = Var(X)
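The independence condition above is a cell-by-cell check, and running it on the joint table from section 3 confirms that those X and Y are not independent; a minimal sketch:

```python
# Joint pmf from the table in section 3; X and Y are independent iff
# f(x, y) = fX(x) * fY(y) holds in every cell.
joint = {
    ("x1", "y1"): 0.2, ("x1", "y2"): 0.2,
    ("x2", "y1"): 0.1, ("x2", "y2"): 0.3,
    ("x3", "y1"): 0.1, ("x3", "y2"): 0.1,
}
fx = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in ("x1", "x2", "x3")}
fy = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in ("y1", "y2")}

def is_independent(joint, fx, fy, tol=1e-12):
    # True only if every joint probability equals the product of its marginals.
    return all(abs(p - fx[x] * fy[y]) < tol for (x, y), p in joint.items())

indep = is_independent(joint, fx, fy)   # False: f(x1,y1) = 0.2 but 0.4 * 0.4 = 0.16
```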

iii. Measures the extent to which there is a linear relationship between X and Y
iv. If Cov(X,Y) > 0 then, as illustrated above, X and Y tend to move together in a positive direction, so that increases in X are on average associated with increases in Y; and if the covariance is negative, then they tend to move in opposite directions
v. If X and Y are independent, then Cov(X,Y) = 0. The opposite need not hold: σXY = 0 does not necessarily imply independence; it could just mean that there is a highly non-linear relationship between X and Y.
1. Here is an example of X & Y having zero covariance, but not being independent (here Y = X², with X equal to −1, 0, or 1, each with probability 0.33):

Joint & Marginal Densities and Cov Contributions
  x     y    f(x,y)   x − μX   y − μY   product
 −1     1     0.33      −1      0.33    −0.33
  0     0     0.33       0     −0.67     0
  1     1     0.33       1      0.33     0.33
E(X) = 0, E(Y) = 0.67, Cov(X,Y) = 0.0000

vi. Cov(a + bX, c + dY) = bd·Cov(X,Y)
vii. |Cov(X,Y)| <= StdDev(X)·StdDev(Y): the magnitude of the covariance is never greater than the product of the magnitudes of the standard deviations (this is an instance of the Cauchy-Schwarz Inequality)
e. Variances of sums of random variables
i. Var(X + Y) = σX² + 2Cov(X,Y) + σY²
ii. More generally: Var(a1X1 + a2X2) = a1²σ1² + 2a1a2·Cov(X1,X2) + a2²σ2²
iii. So if Cov(X,Y) = 0 (so that X and Y are uncorrelated), then Var(X + Y) = Var(X) + Var(Y) (the variance of the sum is the sum of the variances)
iv. And even more generally:
1. Var(Σi aiXi) = Σi Σj ai·aj·Cov(Xi,Xj); note that when i = j, the term is ai·ai·Cov(Xi,Xi) = ai²σi²
2. If the Xi's are pairwise uncorrelated, then Cov(Xi,Xj) = 0 when i ≠ j, and so in this case, Var(Σi aiXi) = Σi Σj ai·aj·Cov(Xi,Xj) = Σi ai²·Cov(Xi,Xi) = Σi ai²σi²
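The variance-of-sums formula above can be verified numerically on any joint pmf by computing both sides; a sketch with hypothetical values (the joint probabilities and the weights a, b below are not from the notes):

```python
# Check Var(aX + bY) = a^2*Var(X) + 2ab*Cov(X,Y) + b^2*Var(Y)
# on a small hypothetical joint pmf over (x, y) pairs.
joint = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.5}

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mx) ** 2)
var_y = E(lambda x, y: (y - my) ** 2)
cov_xy = E(lambda x, y: (x - mx) * (y - my))

a, b = 2.0, -3.0
lhs = E(lambda x, y: (a * x + b * y - (a * mx + b * my)) ** 2)   # Var(aX + bY)
rhs = a * a * var_x + 2 * a * b * cov_xy + b * b * var_y
```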

3. If the Xi's are pairwise uncorrelated, then the variance of the sum is the sum of the variances.
f. Correlation: Corr(X,Y) = ρXY = Cov(X,Y) / (StdDev(X)·StdDev(Y))
i. Properties:
1. −1 <= ρXY <= 1
2. And similar to above:
a. If Cov(X,Y) = 0, then ρXY = 0
b. If X and Y are independent, then they are uncorrelated and ρXY = 0
3. ρXY captures the extent to which there is a linear relationship between X and Y, which is similar to, though not the same as, the extent to which they move together
4. If Y = aX + b, then Cov(X,Y) = a·Var(X) and StdDev(Y) = √(a²·Var(X)) = |a|·StdDev(X), so Corr(X,Y) = ρXY = a·Var(X) / (StdDev(X)·|a|·StdDev(X)) = a/|a| = +1 or −1, and so if X and Y are linearly related they have a correlation of +1 or −1.
5. Corr(a1X + b1, a2Y + b2) = Corr(X,Y) if a1a2 > 0, and Corr(a1X + b1, a2Y + b2) = −Corr(X,Y) if a1a2 < 0. So linear transformations of random variables may affect the sign of the correlation, but not the magnitude.

5. Interesting result
a. Suppose that the random variable Y is a linear function of another random variable X plus an additive random error U, which is uncorrelated with X. Then:
i. Y = a + bX + U, where X, Y, and U are all random variables and Cov(X,U) = 0
ii. Cov(X,Y) = Cov(X, a + bX + U) = Cov(X,a) + b·Cov(X,X) + Cov(X,U)
iii. Since Cov(X,a) = Cov(X,U) = 0, Cov(X,Y) = b·Cov(X,X) = b·Var(X)
iv. So b = Cov(X,Y)/Var(X), or b = Corr(X,Y)·StdDev(Y)/StdDev(X) = ρXY·StdDev(Y)/StdDev(X)
v. This is a relationship that will haunt you throughout the semester.
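The "interesting result" above can be confirmed exactly on finite distributions: build X and U as independent random variables (so Cov(X,U) = 0 by construction), form Y = a + bX + U, and recover b as Cov(X,Y)/Var(X). The pmfs and the values a = 1.5, b = 0.8 below are hypothetical:

```python
# Recover the slope b from Cov(X, Y) / Var(X) when Y = a + b*X + U and Cov(X, U) = 0.
px = {-1: 0.25, 0: 0.5, 1: 0.25}   # hypothetical pmf of X
pu = {-2: 0.5, 2: 0.5}             # hypothetical pmf of U (mean zero)
a_true, b_true = 1.5, 0.8

# The product pmf makes X and U independent, hence uncorrelated.
joint = {(x, u): p * q for x, p in px.items() for u, q in pu.items()}

def E(g):
    return sum(g(x, u) * p for (x, u), p in joint.items())

y = lambda x, u: a_true + b_true * x + u   # Y as a function of (X, U)
mx, my = E(lambda x, u: x), E(lambda x, u: y(x, u))
cov_xy = E(lambda x, u: (x - mx) * (y(x, u) - my))
var_x = E(lambda x, u: (x - mx) ** 2)
b_recovered = cov_xy / var_x               # equals b_true
```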

6. Conditional distributions
a. Recall the definition of conditional probabilities: P(A|B) = P(A ∩ B)/P(B), which might suggest that P(Y = y | X = x) = P(Y = y & X = x)/P(X = x)
b. If discrete, then fY|X(y|x) = P(Y = y | X = x) = fX,Y(x,y)/fX(x); the same formula applies to continuous distributions
i. Dividing by fX(x) effectively scales up the marginal densities and ensures that you have a valid density function, since ∫ fY|X(y|x) dy = ∫ [fX,Y(x,y)/fX(x)] dy = [∫ fX,Y(x,y) dy]/fX(x) = fX(x)/fX(x) = 1
c. If X and Y are independent then the conditional distributions and marginal distributions are the same: fY|X(y|x) = fY(y) and fX|Y(x|y) = fX(x)
i. In words: if X and Y are independent, then knowing the particular value of Y, y, tells you nothing new about X, and vice-versa
d. Conditional expectations and variances
i. The expected value of Y conditional on X being a certain value; as the value of X changes, the conditional expectation of Y given x may also change
ii. EY|X(Y|x) = E(Y|x) = Σj yj·P(Y = yj | x) = Σj yj·fY|X(yj|x)
iii. If X and Y are independent, then E(Y|x) = E(Y): knowing the value of X doesn't change the expected value of Y
iv. Conditional variances are similarly defined as the expected squared deviation from the conditional mean:
1. Var(Y|x) = E([Y − E(Y|x)]² | x) = E(Y²|x) − (E(Y|x))²
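The scaling argument above, dividing the joint density by fX(x) to get a valid conditional density, can be seen concretely with the joint table from section 3; a short Python sketch:

```python
# Conditional pmf f(y | x) = f(x, y) / fX(x), using the joint table from section 3.
joint = {
    ("x1", "y1"): 0.2, ("x1", "y2"): 0.2,
    ("x2", "y1"): 0.1, ("x2", "y2"): 0.3,
    ("x3", "y1"): 0.1, ("x3", "y2"): 0.1,
}

def conditional_y_given_x(joint, x):
    fx = sum(p for (xi, _), p in joint.items() if xi == x)   # marginal fX(x)
    return {y: p / fx for (xi, y), p in joint.items() if xi == x}

cond = conditional_y_given_x(joint, "x2")   # f(y1|x2) = 0.1/0.4, f(y2|x2) = 0.3/0.4
total = sum(cond.values())                  # scaling by 1/fX(x) makes this equal 1
```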

7. Law of Iterated Expectations
a. E[g(X,Y)] = EX{EY|X[g(x,Y)|x]}, since E[g(X,Y)] = Σx Σyj g(x,yj)·P(X = x & Y = yj) and
i. EX{EY|X[g(x,Y)|x]} = Σx [Σyj g(x,yj)·P(yj|x)]·P(x)
ii. = Σx Σyj g(x,yj)·P(yj|x)·P(x) = Σx Σyj g(x,yj)·P(X = x & Y = yj)
b. This obviously holds for continuous random variables as well.
c. Why is this so useful? In many cases, we will show that EY|X[g(x,Y)|x] = k for some constant k, so that conditional on x (or the x's), the expected value of g(x,Y) is some constant k. And because that expectation is always k, for any x, the overall expectation of g(X,Y) must be k as well: E[g(X,Y)] = k.
d. For example: We will show that under certain assumptions, and conditional on the x's, the OLS estimator is an unbiased estimator, so that its expectation, conditional on the x's, is in fact the true parameter value. But since this holds for any set of x's, it must also be true overall. And so in this case, we can just say that the OLS estimator is an unbiased estimator, and drop the "conditional on the x's."

8. The Normal distribution
a. Standard Normal (Gaussian): N(0,1)
b. If X is N(μX, σX²), that is, X has mean μX and variance σX², then Z = (X − μX)/σX is N(0,1) (the Standard Normal distribution)
c. Properties:
i. If X is N(μX, σX²), then aX + b is N(aμX + b, a²σX²)
ii. If X1 and X2 are independent with the same distribution N(μ, σ²), then X1 + X2 is N(2μ, 2σ²)
iii. More generally, assume that the n random variables (X1, ..., Xn) are independently and identically distributed N(μ, σ²); then Σ Xi is N(nμ, nσ²) and X̄ = (1/n)·Σ Xi is N(μ, σ²/n)
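The Law of Iterated Expectations can be checked on any finite joint pmf by computing both sides of the identity; a sketch with a hypothetical pmf and a hypothetical g(x,y) (neither taken from the notes):

```python
# Law of Iterated Expectations: E[g(X,Y)] equals the X-average of E[g(X,Y) | X = x].
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}
g = lambda x, y: x * y + y ** 2

# Left side: direct expectation over the joint pmf.
lhs = sum(g(x, y) * p for (x, y), p in joint.items())

# Right side: inner conditional expectation, then average over the marginal of X.
rhs = 0.0
for x0 in {x for (x, _) in joint}:
    fx = sum(p for (x, _), p in joint.items() if x == x0)
    inner = sum(g(x, y) * p / fx for (x, y), p in joint.items() if x == x0)
    rhs += inner * fx
```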

iv. X̄ is a specific form of the more general weighted average Σ αiXi, where 0 <= αi <= 1 for all i and Σ αi = 1.
1. Σ αiXi will have mean Σ αiμ = μ·Σ αi = μ and variance σ²·Σ αi², and will be Normally distributed.

9. Appendix I - Correlation and Linear Relationships: ρXY and Y = β0 + β1X
a. Linear implies a correlation of +1 or −1. Suppose that Y = β0 + β1X and β1 ≠ 0.
i. Then cov(X,Y) = cov(X, β0 + β1X) = E((X − μX)(β0 + β1X − β0 − β1μX)) = β1·E((X − μX)²) = β1·var(X).
ii. And since var(Y) = E((β0 + β1X − β0 − β1μX)²) = β1²·E((X − μX)²) = β1²·var(X), the correlation of X and Y is: ρXY = cov(X,Y)/√(var(X)·var(Y)) = β1·var(X)/√(var(X)·β1²·var(X)) = β1/|β1| = +1 or −1, depending on the sign of β1.
b. Non-linear implies correlation not +1 or −1; here is an example:
i. Suppose that Y = β0 + β1X + U, where μU = 0 and cov(X,U) = 0, but var(U) = σU² ≠ 0 (so we don't have a perfectly linear relationship between X and Y).
ii. Then cov(X,Y) = cov(X, β0 + β1X + U) = E((X − μX)(β0 + β1X + U − β0 − β1μX)) = β1·E((X − μX)²) + cov(X,U) = β1·var(X).
iii. And since var(Y) = E((β0 + β1X + U − β0 − β1μX)²) = β1²·var(X) + σU², the correlation of X and Y is: ρXY = cov(X,Y)/√(var(X)·var(Y)) = β1·var(X)/√(var(X)·(β1²·var(X) + σU²))
iv. Since σU² ≠ 0, the denominator will be larger in magnitude than the numerator and so |ρXY| < 1.
v. Notice that if σU² = 0, then we have a linear relationship, and as above ρXY = +1 or −1.
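The Appendix I formula can be evaluated numerically to see the attenuation: with no error term the correlation is exactly +1 (for β1 > 0), and any var(U) > 0 pulls it strictly inside (−1, 1). The numbers β1 = 2 and var(X) = 1.5 below are hypothetical:

```python
import math

# rho_XY = b1*var(X) / sqrt(var(X) * (b1^2 * var(X) + var(U)))  (Appendix I)
b1, var_x = 2.0, 1.5   # hypothetical slope and variance of X

def rho(var_u):
    return b1 * var_x / math.sqrt(var_x * (b1 ** 2 * var_x + var_u))

rho_exact = rho(0.0)   # no error term: perfectly linear, rho = +1
rho_noisy = rho(4.0)   # var(U) > 0: |rho| strictly less than 1
```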

10. Appendix II: Covariance and independence

Not Independent! (here Y = |X|, so Y is a deterministic function of X, yet Cov(X,Y) = 0)

              Y = 0   Y = 0.5   Y = 1   marginal for X
X = −1          0%       0%      20%        20%
X = −0.5        0%      20%       0%        20%
X = 0          20%       0%       0%        20%
X = 0.5         0%      20%       0%        20%
X = 1           0%       0%      20%        20%
marginal for Y 20%      40%      40%

Independent! (every cell is the product of the marginals, e.g. 8% = 20% × 40%)

              Y = 0   Y = 0.5   Y = 1   marginal for X
X = −1          4%       8%       8%        20%
X = −0.5        4%       8%       8%        20%
X = 0           4%       8%       8%        20%
X = 0.5         4%       8%       8%        20%
X = 1           4%       8%       8%        20%
marginal for Y 20%      40%      40%

Covariance calculation for the Not Independent table: since μX = 0, Cov(X,Y) = E(XY) − μXμY = E(XY), and the contributions x·y·f(x,y) are
(−1)(1)(0.2) = −0.2, (−0.5)(0.5)(0.2) = −0.05, (0)(0)(0.2) = 0, (0.5)(0.5)(0.2) = 0.05, (1)(1)(0.2) = 0.2,
which sum to a covariance of exactly 0, even though X and Y are as dependent as possible.
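The Appendix II point in code: with X symmetric about zero and Y = |X|, the covariance is exactly zero even though Y is a deterministic function of X. A minimal sketch:

```python
# Zero covariance without independence: X uniform on {-1, -0.5, 0, 0.5, 1}, Y = |X|.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]             # each value has probability 0.2
pairs = [(x, abs(x)) for x in xs]

mx = sum(x for x, _ in pairs) / len(pairs)   # E(X) = 0 by symmetry
my = sum(y for _, y in pairs) / len(pairs)   # E(Y) = 0.6
cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)

# Dependence is obvious: P(Y = 1 | X = 1) = 1, while the marginal P(Y = 1) is 0.4.
p_y1 = sum(1 for _, y in pairs if y == 1.0) / len(pairs)
```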