ECON 7335 INFORMATION, LEARNING AND EXPECTATIONS IN MACRO
LECTURE 1: BASICS

KRISTOFFER P. NIMARK

1. Bayes Rule

Definition 1. (Bayes Rule) The probability of event A occurring conditional on the event B having occurred is given by

p(A | B) = p(B | A) p(A) / p(B)    (1.1)

as long as p(B) ≠ 0. Bayes Rule can be derived from the definition of conditional probability

p(A | B) ≡ p(A ∩ B) / p(B)    (1.2)

and the fact that the probability of A and B occurring is the same as the probability of B and A occurring, which implies

p(A | B) p(B) = p(B | A) p(A)    (1.3)

and which can be rearranged to yield Bayes Rule. Bayes Rule is general and applies whether A and B are discrete or continuous random variables. It can be used to update a prior belief p(x) about the latent variable x conditional on the signal y: Bayes Rule then gives the posterior belief

p(x | y) = p(y | x) p(x) / p(y)    (1.4)

Date: January 7, 2016.
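As a concrete illustration of (1.4), the short Python sketch below (not part of the original notes; the function name and the two-state example values are chosen for illustration) updates a discrete prior over a latent variable x after observing a signal y, given the likelihood p(y | x) for each value of x.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Posterior p(x | y) over a discrete latent variable x.

    prior:      array of p(x) for each value of x
    likelihood: array of p(y | x) for each value of x,
                evaluated at the observed signal y
    """
    joint = likelihood * prior      # p(y | x) p(x)
    return joint / joint.sum()      # divide by p(y) = sum_x p(y | x) p(x)

# Two-state example with a uniform prior over {low, high}
prior = np.array([0.5, 0.5])
likelihood = np.array([0.75, 0.25])     # p(y | x) at the observed signal
print(bayes_update(prior, likelihood))  # posterior [0.75, 0.25]
```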

1.1. Example with binary variables and signals (adapted from O'Hara 1995). Consider a market maker that holds inventory of an asset that can have either a High or a Low value. The market maker's prior belief that the value of the asset is high is denoted p(H) = 1/2. There are two types of traders, informed and uninformed, and the market maker is equally likely to meet either type. An informed trader always buys when the value is high and always sells when the value is low. An uninformed trader is equally likely to buy or sell. What is the posterior probability that the value is high if the first trade is a sale (S)? Bayes Rule states that

p(H | S) = p(S | H) p(H) / p(S)

Start by finding the probability of a sale conditional on the value being high

p(S | H) = (1/2) · 0 + (1/2) · (1/2) = 1/4    (1.5)

and low

p(S | L) = (1/2) · 1 + (1/2) · (1/2) = 3/4    (1.6)

We also need to find the (unconditional) probability of a sale

p(S) = p(S | H) p(H) + p(S | L) p(L)    (1.7)

Plug into Bayes Rule to get the posterior conditional on a sale

p(H | S) = p(S | H) p(H) / [p(S | H) p(H) + p(S | L) p(L)]    (1.8)
         = (1/4 · 1/2) / (1/4 · 1/2 + 3/4 · 1/2) = 1/4    (1.9)
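The arithmetic in (1.5)–(1.9) can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original notes; the variable names are chosen here.

```python
# Market-maker example: posterior that the asset value is High after a Sale.
p_H, p_L = 0.5, 0.5                   # prior: p(H) = p(L) = 1/2

# Probability of a sale conditional on the value:
# informed traders (met with prob 1/2) sell only when the value is low,
# uninformed traders (prob 1/2) sell with probability 1/2 either way.
p_S_given_H = 0.5 * 0.0 + 0.5 * 0.5   # = 1/4, as in (1.5)
p_S_given_L = 0.5 * 1.0 + 0.5 * 0.5   # = 3/4, as in (1.6)

p_S = p_S_given_H * p_H + p_S_given_L * p_L   # unconditional p(S) = 1/2, as in (1.7)

p_H_given_S = p_S_given_H * p_H / p_S
print(p_H_given_S)                    # 0.25, matching (1.9)
```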

The same steps can be used to find the posterior probability that the value is high if the first transaction is a buy (B)

p(H | B) = p(B | H) p(H) / [p(B | H) p(H) + p(B | L) p(L)]    (1.10)
         = (3/4 · 1/2) / (3/4 · 1/2 + 1/4 · 1/2) = 3/4    (1.11)

1.1.1. Updating beliefs when new information arrives. If the arrival of traders is independent over time, it is straightforward to update the posterior after a second transaction takes place by simply using the posterior after the first observation as the prior in the update. If both the first and the second transaction are sales, the posterior probability is then given by

p(H | S, S) = (1/4 · 1/4) / (1/4 · 1/4 + 3/4 · 3/4) = 1/10    (1.12)

However, if the first transaction is a sale and the second a buy, we get

p(H | S, B) = (3/4 · 1/4) / (3/4 · 1/4 + 1/4 · 3/4) = 1/2    (1.13)

i.e. the second signal cancels the first and the posterior equals the original prior belief.

Useful fact: Under quite general conditions, beliefs that are updated using Bayes Rule converge almost surely to the truth.

1.2. Bayes Rule and jointly normally distributed variables. To illustrate the usefulness of Bayes Rule, we can apply it to a simple bivariate setting. Let the prior about the latent variable x be normally distributed

x ~ N(0, σ_x²)    (1.14)

and let the signal y be the sum of the true x plus a normally distributed noise term ε, so that

y = x + ε,  ε ~ N(0, σ_ε²)    (1.15)

We then have all the ingredients needed to apply Bayes Rule to find the posterior p(x | y):

p(x) = (1 / (√(2π) σ_x)) exp(−x² / (2σ_x²))    (1.16)

p(y) = (1 / (√(2π) √(σ_x² + σ_ε²))) exp(−y² / (2(σ_x² + σ_ε²)))    (1.17)

p(y | x) = (1 / (√(2π) σ_ε)) exp(−(y − x)² / (2σ_ε²))    (1.18)

Plugging these expressions into Bayes Rule gives

p(x | y) = [ (1 / (√(2π) σ_ε)) exp(−(y − x)² / (2σ_ε²)) · (1 / (√(2π) σ_x)) exp(−x² / (2σ_x²)) ] / [ (1 / (√(2π) √(σ_x² + σ_ε²))) exp(−y² / (2(σ_x² + σ_ε²))) ]    (1.19)

which can be simplified to

p(x | y) = (1 / √(2π)) √((σ_x² + σ_ε²) / (σ_x² σ_ε²)) exp( −((σ_x² + σ_ε²) / (2 σ_x² σ_ε²)) (x − (σ_x² / (σ_x² + σ_ε²)) y)² )    (1.20)

The posterior distribution p(x | y) is thus normally distributed with mean (σ_x² / (σ_x² + σ_ε²)) y and variance σ_x² σ_ε² / (σ_x² + σ_ε²). This illustrates two points. First, normally distributed priors combined with normally distributed noise in the signal result in normally distributed posteriors; this is an extremely useful property of normal distributions. Second, the posterior variance of x is lower than the prior variance, i.e.

σ_x² σ_ε² / (σ_x² + σ_ε²) < σ_x²    (1.21)

More information thus reduces uncertainty about x.
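The closed-form posterior in (1.20) is easy to verify numerically. The sketch below is only an illustration (the parameter values σ_x = 1, σ_ε = 0.5 and the signal y = 0.8 are arbitrary choices, not from the notes): it computes the posterior mean and variance from the formulas and checks them against a brute-force application of Bayes Rule on a grid, confirming the variance reduction in (1.21).

```python
import numpy as np

sigma_x, sigma_e = 1.0, 0.5          # prior std of x and std of the signal noise
y = 0.8                              # observed signal y = x + eps

# Posterior moments implied by (1.20)
post_mean = sigma_x**2 / (sigma_x**2 + sigma_e**2) * y
post_var = sigma_x**2 * sigma_e**2 / (sigma_x**2 + sigma_e**2)
print(post_mean, post_var)           # 0.64 and 0.2; the prior variance was 1.0

# Brute-force check: apply Bayes Rule on a grid of x values
x, dx = np.linspace(-6.0, 6.0, 20001, retstep=True)
prior = np.exp(-x**2 / (2 * sigma_x**2))
likelihood = np.exp(-(y - x)**2 / (2 * sigma_e**2))
post = prior * likelihood
post /= (post * dx).sum()            # normalize the density on the grid
print((x * post * dx).sum())                   # ~0.64, the posterior mean
print(((x - post_mean)**2 * post * dx).sum())  # ~0.2, the posterior variance
```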

2. The Projection Theorem

This section explains how orthogonal projections can be used to find least squares predictions of random variables. We start by defining some concepts needed for stating the projection theorem. For more details about the projection theorem, see for instance Chapter 2 of Brockwell and Davis (2006) or Chapter 3 in Luenberger (1969).

Definition 2. (Inner Product Space) A real vector space H is said to be an inner product space if for each pair of elements x and y in H there is a number ⟨x, y⟩, called the inner product of x and y, such that

⟨x, y⟩ = ⟨y, x⟩    (2.1)
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for all x, y, z ∈ H    (2.2)
⟨αx, y⟩ = α⟨x, y⟩ for all x, y ∈ H and α ∈ ℝ    (2.3)
⟨x, x⟩ ≥ 0 for all x ∈ H    (2.4)
⟨x, x⟩ = 0 if and only if x = 0    (2.5)

Definition 3. (Norm) The norm of an element x of an inner product space is defined to be

‖x‖ = √⟨x, x⟩    (2.6)

Definition 4. (Cauchy Sequence) A sequence {x_n, n = 1, 2, ...} of elements of an inner product space is said to be a Cauchy sequence if ‖x_n − x_m‖ → 0 as m, n → ∞, i.e. if for every ε > 0 there exists a positive integer N(ε) such that ‖x_n − x_m‖ < ε for all m, n > N(ε).
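As a concrete (entirely standard) instance of Definitions 2–4, ℝ⁴ equipped with the usual dot product is an inner product space. The snippet below is a small illustration, not part of the original notes; it checks the axioms (2.1)–(2.4) numerically for randomly drawn vectors and computes the norm (2.6).

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4))    # three vectors in R^4
a = 1.7                              # a real scalar

def inner(u, v):
    """The Euclidean inner product <u, v>."""
    return float(u @ v)

print(np.isclose(inner(x, y), inner(y, x)))                    # (2.1) symmetry
print(np.isclose(inner(x + y, z), inner(x, z) + inner(y, z)))  # (2.2) additivity
print(np.isclose(inner(a * x, y), a * inner(x, y)))            # (2.3) homogeneity
print(inner(x, x) >= 0)                                        # (2.4) positivity
print(np.sqrt(inner(x, x)), np.linalg.norm(x))                 # (2.6) the norm, two ways
```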

Definition 5. (Hilbert Space) A Hilbert space H is an inner product space which is complete, i.e. every Cauchy sequence {x_n} converges in norm to some element x ∈ H.

Theorem 1. (The Projection Theorem) If M is a closed subspace of the Hilbert space H and x ∈ H, then (i) there is a unique element x̂ ∈ M such that ‖x − x̂‖ = inf_{y ∈ M} ‖x − y‖, and (ii) x̂ ∈ M and ‖x − x̂‖ = inf_{y ∈ M} ‖x − y‖ if and only if x̂ ∈ M and (x − x̂) ∈ M⊥, where M⊥ is the orthogonal complement to M in H. The element x̂ is called the orthogonal projection of x onto M.

Proof. We first show that if x̂ is a minimizing vector, then x − x̂ must be orthogonal to M. Suppose to the contrary that there is an element m ∈ M which is not orthogonal to the error x − x̂. Without loss of generality we may assume that ‖m‖ = 1 and that ⟨x − x̂, m⟩ = α ≠ 0. Define the vector m₁ ∈ M

m₁ ≡ x̂ + αm    (2.7)

We then have that

‖x − m₁‖² = ‖x − x̂ − αm‖²    (2.8)
          = ‖x − x̂‖² − α⟨x − x̂, m⟩ − α⟨m, x − x̂⟩ + |α|²    (2.9)
          = ‖x − x̂‖² − |α|²    (2.10)
          < ‖x − x̂‖²    (2.11)

where the second line follows from (2.2) and the definition of the norm, and the third line comes from the fact that α⟨x − x̂, m⟩ = α⟨m, x − x̂⟩ = |α|².

The inequality on the last line follows from the fact that |α|² > 0. We then have a contradiction: x̂ cannot be the element in M that minimizes the norm of the error if α ≠ 0, since ‖x − m₁‖ then is smaller than ‖x − x̂‖.

We now show that if x − x̂ is orthogonal to M, then x̂ is the unique minimizing vector. For any m ∈ M we have that

‖x − m‖² = ‖x − x̂ + x̂ − m‖²    (2.12)
         = ‖x − x̂‖² + ‖x̂ − m‖²    (2.13)
         > ‖x − x̂‖² for x̂ ≠ m    (2.14)

where the second line uses that x − x̂ is orthogonal to x̂ − m ∈ M. ∎

Properties of Projection Mappings. Let H be a Hilbert space and let P_M be the projection mapping onto a closed subspace M. Then

(i) each x ∈ H has a unique representation as a sum of an element in M and an element in M⊥, i.e.

x = P_M x + (I − P_M) x    (2.15)

(ii) x ∈ M if and only if P_M x = x
(iii) x ∈ M⊥ if and only if P_M x = 0
(iv) M₁ ⊆ M₂ if and only if P_{M₁} P_{M₂} x = P_{M₁} x
(v) ‖x‖² = ‖P_M x‖² + ‖(I − P_M) x‖²

The definitions and the proofs above refer to Hilbert spaces in general. We now define the space relevant for most of time series analysis.

Definition 6. (The space L²(Ω, F, P)) We define the space L²(Ω, F, P) as the collection C of all random variables X defined on the probability space (Ω, F, P) satisfying the condition

E X² = ∫ X(ω)² P(dω) < ∞    (2.16)

and define the inner product of this space as

⟨X, Y⟩ = E(XY) for any X, Y ∈ C    (2.17)

Least squares estimation via the projection theorem. The inner product space L² satisfies all of the axioms above. Noting that the inner product definition (2.17) corresponds to a covariance means that we can use the projection theorem to find the minimum variance estimate of a vector of random variables with finite variances as a function of some other random variables with finite variances. That is, both the information set and the variables we are trying to predict must be elements of the relevant space. Since ⟨X, Y⟩ = E(XY), an estimate x̂ that minimizes the norm of the estimation error ‖x − x̂‖ also minimizes the variance, since

‖x − x̂‖ = √( E[(x − x̂)(x − x̂)′] )    (2.18)

To find the estimate x̂ as a linear function βy of y, simply use that

⟨x − βy, y⟩ = E[(x − βy) y′] = 0    (2.19)

and solve for β:

β = E(x y′) [E(y y′)]⁻¹    (2.20)

The advantage of this approach is that once you have made sure that the variables y and x are in a well-defined inner product space, there is no need to minimize the variance directly. The projection theorem ensures that an estimate with orthogonal errors is the (linear) minimum variance estimate.
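A minimal numerical sketch of (2.19)–(2.20), not from the original notes: with simulated data (the data-generating coefficients 0.6 and −0.3 and the variable names are arbitrary choices here), the projection coefficient β = E(xy′)[E(yy′)]⁻¹ is approximated by sample moments, and the estimation error is then verified to be orthogonal to the conditioning variables, as the projection theorem requires.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000                                   # large sample to approximate expectations

# Simulate a 2-dimensional signal y that is correlated with the scalar target x
y = rng.normal(size=(T, 2))
x = y @ np.array([0.6, -0.3]) + 0.5 * rng.normal(size=T)

# beta = E(xy')[E(yy')]^{-1}, approximated by sample moments
Exy = x @ y / T                               # cross moments E(x y')
Eyy = y.T @ y / T                             # second moment matrix E(y y')
beta = Exy @ np.linalg.inv(Eyy)

x_hat = y @ beta                              # the orthogonal projection of x onto y
error = x - x_hat

print(beta)                                   # close to [0.6, -0.3]
print(error @ y / T)                          # ~0: <x - beta*y, y> = 0, as in (2.19)
print(np.var(x), np.var(x_hat) + np.var(error))  # Pythagorean split, cf. property (v)
```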

Two useful properties of linear projections. If two random variables X and Y are Gaussian, then the projection of Y onto X coincides with the conditional expectation E(Y | X). If X and Y are not Gaussian, the linear projection of Y onto X is the minimum variance linear prediction of Y given X.

References

[1] Brockwell, P.J. and R.A. Davis, 2006, Time Series: Theory and Methods, Springer-Verlag.
[2] Luenberger, D., 1969, Optimization by Vector Space Methods, John Wiley and Sons, Inc., New York.
[3] O'Hara, M., 1995, Market Microstructure Theory, Blackwell, Cambridge, MA.