ECON 7335 INFORMATION, LEARNING AND EXPECTATIONS IN MACRO
LECTURE 1: BASICS

KRISTOFFER P. NIMARK

Date: January 7, 2016.

1. Bayes Rule

Definition 1 (Bayes Rule). The probability of event A occurring conditional on the event B having occurred is given by

$$p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)} \qquad (1.1)$$

as long as $p(B) \neq 0$.

Bayes Rule can be derived from the definition of conditional probability

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)} \qquad (1.2)$$

and the fact that the probability of A and B occurring is the same as the probability of B and A occurring, implying

$$p(A \mid B)\,p(B) = p(B \mid A)\,p(A) \qquad (1.3)$$

which can be rearranged to yield Bayes Rule. Bayes Rule is general and applies whether A and B are discrete or continuous random variables. It can be used to update a prior belief $p(x)$ about the latent variable $x$ conditional on the signal $y$: Bayes Rule then gives the posterior belief

$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)} \qquad (1.4)$$
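To see the rule at work on a concrete discrete distribution, the short sketch below (plain Python; the joint probabilities are illustrative numbers, not from the notes) builds a joint distribution over two binary events and checks that the conditional probability computed via Bayes Rule matches the one computed directly from the joint distribution:

```python
# Joint distribution over two binary events (A, B); illustrative numbers summing to 1.
p_joint = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

# Marginals and the joint probability of A and B
p_A = sum(v for (a, b), v in p_joint.items() if a)   # 0.5
p_B = sum(v for (a, b), v in p_joint.items() if b)   # 0.3
p_A_and_B = p_joint[(True, True)]                    # 0.2

# Conditional probability computed directly from the definition
p_A_given_B = p_A_and_B / p_B

# The same quantity via Bayes Rule: p(A|B) = p(B|A) p(A) / p(B)
p_B_given_A = p_A_and_B / p_A
bayes = p_B_given_A * p_A / p_B

print(abs(bayes - p_A_given_B) < 1e-12)   # True
```

The same check goes through for any joint distribution with $p(B) > 0$.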
1.1. Example with binary variables and signals (adapted from O'Hara 1995). Consider a market maker that holds inventory of an asset that can have either a High or a Low value. The market maker's prior belief that the value of the asset is high is denoted $p(H) = 1/2$. There are two types of traders, informed and uninformed, and the market maker is equally likely to meet either type. An informed trader always buys when the value is high and always sells when the value is low. An uninformed trader is equally likely to buy or sell. What is the posterior probability that the value is high if the first trade is a sale (S)? Bayes Rule states that

$$p(H \mid S) = \frac{p(S \mid H)\,p(H)}{p(S)}$$

Start by finding the probability of a sale conditional on the value being high

$$p(S \mid H) = \frac{1}{2} \cdot 0 + \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} \qquad (1.5)$$

and low

$$p(S \mid L) = \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot \frac{1}{2} = \frac{3}{4} \qquad (1.6)$$

We also need to find the (unconditional) probability of a sale

$$p(S) = p(S \mid H)\,p(H) + p(S \mid L)\,p(L) \qquad (1.7)$$

Plug into Bayes Rule to get the posterior conditional on a sale

$$p(H \mid S) = \frac{p(S \mid H)\,p(H)}{p(S \mid H)\,p(H) + p(S \mid L)\,p(L)} \qquad (1.8)$$

$$= \frac{\frac{1}{4} \cdot \frac{1}{2}}{\frac{1}{4} \cdot \frac{1}{2} + \frac{3}{4} \cdot \frac{1}{2}} = \frac{1}{4} \qquad (1.9)$$
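The arithmetic above, and the repeated-trade updates discussed in the next subsection, can be reproduced in a few lines. The sketch below is a minimal Python illustration of the example (the helper name `update` is chosen here); it treats each posterior as the prior for the next trade:

```python
def update(prior_high, trade):
    """One Bayesian update of p(High) after observing a sale 'S' or a buy 'B'.

    Trade probabilities follow the market maker example: an informed trader
    (met with probability 1/2) buys iff the value is High, while an
    uninformed trader buys or sells with probability 1/2 each.
    """
    p_trade_high = 0.25 if trade == "S" else 0.75   # p(trade | High)
    p_trade_low = 0.75 if trade == "S" else 0.25    # p(trade | Low)
    p_trade = p_trade_high * prior_high + p_trade_low * (1 - prior_high)
    return p_trade_high * prior_high / p_trade

# Posterior after a single sale, matching the 1/4 derived above
p_after_S = update(0.5, "S")
print(p_after_S)        # 0.25

# Sequential updating: yesterday's posterior is today's prior
p_after_SS = update(update(0.5, "S"), "S")
p_after_SB = update(update(0.5, "S"), "B")
print(p_after_SS)       # 0.1  -- two sales
print(p_after_SB)       # 0.5  -- a buy cancels a sale, restoring the prior
```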
The same steps can be used to find the posterior probability that the value is high if the first transaction is a buy (B)

$$p(H \mid B) = \frac{p(B \mid H)\,p(H)}{p(B \mid H)\,p(H) + p(B \mid L)\,p(L)} \qquad (1.10)$$

$$= \frac{\frac{3}{4} \cdot \frac{1}{2}}{\frac{3}{4} \cdot \frac{1}{2} + \frac{1}{4} \cdot \frac{1}{2}} = \frac{3}{4} \qquad (1.11)$$

1.2. Updating beliefs when new information arrives. If the arrival of traders is independent over time, it is straightforward to update the posterior after a second transaction takes place by simply using the posterior after the first observation as the prior in the update. If both the first and second transactions are sales, the posterior probability is then given by

$$p(H \mid S, S) = \frac{\frac{1}{4} \cdot \frac{1}{4}}{\frac{1}{4} \cdot \frac{1}{4} + \frac{3}{4} \cdot \frac{3}{4}} = \frac{1}{10} \qquad (1.12)$$

However, if the first transaction is a sale and the second a buy, we get

$$p(H \mid S, B) = \frac{\frac{3}{4} \cdot \frac{1}{4}}{\frac{3}{4} \cdot \frac{1}{4} + \frac{1}{4} \cdot \frac{3}{4}} = \frac{1}{2} \qquad (1.13)$$

i.e. the second signal cancels the first and the posterior equals the original prior belief.

Useful fact: Under quite general conditions, beliefs that are updated using Bayes Rule converge almost surely to the truth.

1.3. Bayes Rule and jointly normally distributed variables. To illustrate the usefulness of Bayes Rule, we can apply it to a simple bivariate setting. Let the prior about the latent variable x be normally distributed

$$x \sim N(0, \sigma_x^2) \qquad (1.14)$$

and the signal y be the sum of the true x plus a normally distributed noise term $\varepsilon$ so that

$$y = x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2) \qquad (1.15)$$
We then have all the ingredients needed to apply Bayes Rule to find the posterior $p(x \mid y)$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-\frac{x^2}{2\sigma_x^2}} \qquad (1.16)$$

$$p(y) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_x^2 + \sigma_\varepsilon^2}} e^{-\frac{y^2}{2(\sigma_x^2 + \sigma_\varepsilon^2)}} \qquad (1.17)$$

$$p(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} e^{-\frac{(y - x)^2}{2\sigma_\varepsilon^2}} \qquad (1.18)$$

Plugging these expressions into Bayes Rule gives

$$p(x \mid y) = \frac{\frac{1}{\sqrt{2\pi}\,\sigma_\varepsilon} e^{-\frac{(y - x)^2}{2\sigma_\varepsilon^2}} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-\frac{x^2}{2\sigma_x^2}}}{\frac{1}{\sqrt{2\pi}\sqrt{\sigma_x^2 + \sigma_\varepsilon^2}} e^{-\frac{y^2}{2(\sigma_x^2 + \sigma_\varepsilon^2)}}} \qquad (1.19)$$

which can be simplified to

$$p(x \mid y) = \frac{1}{\sqrt{2\pi \frac{\sigma_x^2 \sigma_\varepsilon^2}{\sigma_x^2 + \sigma_\varepsilon^2}}} \exp\left( -\frac{\left( x - \frac{\sigma_x^2}{\sigma_x^2 + \sigma_\varepsilon^2}\, y \right)^2}{2 \frac{\sigma_x^2 \sigma_\varepsilon^2}{\sigma_x^2 + \sigma_\varepsilon^2}} \right) \qquad (1.20)$$

The posterior distribution $p(x \mid y)$ is thus normally distributed with mean $\frac{\sigma_x^2}{\sigma_x^2 + \sigma_\varepsilon^2}\, y$ and variance $\frac{\sigma_x^2 \sigma_\varepsilon^2}{\sigma_x^2 + \sigma_\varepsilon^2}$. This illustrates two points: First, normally distributed priors combined with normally distributed noise in the signal result in normally distributed posteriors. This is an extremely useful property of normal distributions. Second, the posterior variance of x is lower than the prior variance, i.e.

$$\frac{\sigma_x^2 \sigma_\varepsilon^2}{\sigma_x^2 + \sigma_\varepsilon^2} < \sigma_x^2 \qquad (1.21)$$

More information thus reduces uncertainty about x.
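The closed-form posterior moments can be checked by brute force. The sketch below (assuming illustrative values $\sigma_x = 1$, $\sigma_\varepsilon = 0.5$ and an observed signal $y = 1$, none of which come from the notes) integrates $p(y \mid x)\,p(x)$ numerically on a grid and compares the resulting posterior mean and variance with the formulas above:

```python
import math

# Illustrative parameter values (assumptions for this check only)
sigma_x, sigma_e, y = 1.0, 0.5, 1.0

def normal_pdf(z, sd):
    """Density of N(0, sd^2) evaluated at z."""
    return math.exp(-z ** 2 / (2 * sd ** 2)) / (math.sqrt(2 * math.pi) * sd)

# Brute-force posterior moments: weight each grid point by p(y | x) p(x).
dx = 1e-3
grid = [-8.0 + i * dx for i in range(16000)]   # covers [-8, 8)
weights = [normal_pdf(y - x, sigma_e) * normal_pdf(x, sigma_x) for x in grid]
total = sum(weights)
mean_num = sum(x * w for x, w in zip(grid, weights)) / total
var_num = sum((x - mean_num) ** 2 * w for x, w in zip(grid, weights)) / total

# Closed-form posterior moments: mean k*y and variance from the formula above
k = sigma_x ** 2 / (sigma_x ** 2 + sigma_e ** 2)
mean_cf = k * y                                                        # 0.8
var_cf = sigma_x ** 2 * sigma_e ** 2 / (sigma_x ** 2 + sigma_e ** 2)   # 0.2

print(abs(mean_num - mean_cf) < 1e-6)   # True
print(abs(var_num - var_cf) < 1e-6)     # True
```

Note that the normalizing constant $p(y)$ never needs to be computed: dividing by the sum of the weights normalizes the posterior automatically.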
2. The Projection Theorem

This section explains how orthogonal projections can be used to find least squares predictions of random variables. We start by defining some concepts needed for stating the projection theorem. For more details about the projection theorem, see for instance Chapter 2 of Brockwell and Davis (2006) or Chapter 3 in Luenberger (1969).

Definition 2 (Inner Product Space). A real vector space H is said to be an inner product space if for each pair of elements x and y in H there is a number $\langle x, y \rangle$ called the inner product of x and y such that

$$\langle x, y \rangle = \langle y, x \rangle \qquad (2.1)$$

$$\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle \text{ for all } x, y, z \in H \qquad (2.2)$$

$$\langle \alpha x, y \rangle = \alpha \langle x, y \rangle \text{ for all } x, y \in H \text{ and } \alpha \in \mathbb{R} \qquad (2.3)$$

$$\langle x, x \rangle \geq 0 \text{ for all } x \in H \qquad (2.4)$$

$$\langle x, x \rangle = 0 \text{ if and only if } x = 0 \qquad (2.5)$$

Definition 3 (Norm). The norm of an element x of an inner product space is defined to be

$$\|x\| = \sqrt{\langle x, x \rangle} \qquad (2.6)$$

Definition 4 (Cauchy Sequence). A sequence $\{x_n, n = 1, 2, \ldots\}$ of elements of an inner product space is said to be a Cauchy sequence if $\|x_n - x_m\| \to 0$ as $m, n \to \infty$, i.e. for every $\varepsilon > 0$ there exists a positive integer $N(\varepsilon)$ such that $\|x_n - x_m\| < \varepsilon$ for all $m, n > N(\varepsilon)$.
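As a concrete instance of these axioms, the Euclidean dot product on $\mathbb{R}^n$ is an inner product. The quick numerical check below (illustrative Python with randomly drawn vectors) verifies each of the five conditions and the induced norm of Definition 3:

```python
import random

random.seed(1)
n = 4
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
alpha = -2.5  # an arbitrary real scalar

def inner(u, v):
    """Euclidean inner product <u, v> = sum_i u_i v_i on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Norm induced by the inner product."""
    return inner(u, u) ** 0.5

tol = 1e-12
checks = [
    abs(inner(x, y) - inner(y, x)) < tol,                               # symmetry
    abs(inner([a + b for a, b in zip(x, y)], z)
        - (inner(x, z) + inner(y, z))) < tol,                           # additivity
    abs(inner([alpha * a for a in x], y) - alpha * inner(x, y)) < tol,  # homogeneity
    inner(x, x) >= 0,                                                   # non-negativity
    inner([0.0] * n, [0.0] * n) == 0.0,                                 # definiteness at 0
]
print(all(checks))   # True
```

The inner product used later for random variables, $\langle X, Y \rangle = E(XY)$, satisfies the same axioms with expectations in place of finite sums.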
Definition 5 (Hilbert Space). A Hilbert space H is an inner product space which is complete, i.e. every Cauchy sequence $\{x_n\}$ converges in norm to some element $x \in H$.

Theorem 1 (The Projection Theorem). If M is a closed subspace of the Hilbert space H and $x \in H$, then (i) there is a unique element $\hat{x} \in M$ such that

$$\|x - \hat{x}\| = \inf_{y \in M} \|x - y\|$$

and (ii) $\hat{x} \in M$ with $\|x - \hat{x}\| = \inf_{y \in M} \|x - y\|$ if and only if $\hat{x} \in M$ and $(x - \hat{x}) \in M^\perp$, where $M^\perp$ is the orthogonal complement to M in H. The element $\hat{x}$ is called the orthogonal projection of x onto M.

Proof. We first show that if $\hat{x}$ is a minimizing vector then $x - \hat{x}$ must be orthogonal to M. Suppose to the contrary that there is an element $m \in M$ which is not orthogonal to the error $x - \hat{x}$. Without loss of generality we may assume that $\|m\| = 1$ and that $\langle x - \hat{x}, m \rangle = \alpha \neq 0$. Define the vector $m_1 \in M$ as

$$m_1 = \hat{x} + \alpha m \qquad (2.7)$$

We then have that

$$\|x - m_1\|^2 = \|x - \hat{x} - \alpha m\|^2 \qquad (2.8)$$

$$= \|x - \hat{x}\|^2 - \alpha \langle x - \hat{x}, m \rangle - \alpha \langle m, x - \hat{x} \rangle + |\alpha|^2 \qquad (2.9)$$

$$= \|x - \hat{x}\|^2 - |\alpha|^2 \qquad (2.10)$$

$$< \|x - \hat{x}\|^2 \qquad (2.11)$$

where the second line follows from (2.2) and the definition of the norm, and the third line comes from the fact that $\langle x - \hat{x}, m \rangle = \langle m, x - \hat{x} \rangle = \alpha$. The inequality on the last line follows
from the fact that $|\alpha|^2 > 0$. We then have a contradiction: $\hat{x}$ cannot be the element in M that minimizes the norm of the error if $\alpha \neq 0$, since $\|x - m_1\|$ is then smaller than $\|x - \hat{x}\|$.

We now show that if $x - \hat{x}$ is orthogonal to M then it is the unique minimizing vector. For any $m \in M$ we have that

$$\|x - m\|^2 = \|x - \hat{x} + \hat{x} - m\|^2 \qquad (2.12)$$

$$= \|x - \hat{x}\|^2 + \|\hat{x} - m\|^2 \qquad (2.13)$$

$$> \|x - \hat{x}\|^2 \text{ for } \hat{x} \neq m \qquad (2.14)$$

where the cross terms in (2.13) vanish because $x - \hat{x}$ is orthogonal to $\hat{x} - m \in M$.

Properties of Projection Mappings. Let H be a Hilbert space and let $P_M$ be a projection mapping onto a closed subspace M. Then

(i) each $x \in H$ has a unique representation as a sum of an element in M and an element in $M^\perp$, i.e.

$$x = P_M x + (I - P_M) x \qquad (2.15)$$

(ii) $x \in M$ if and only if $P_M x = x$

(iii) $x \in M^\perp$ if and only if $P_M x = 0$

(iv) $M_1 \subseteq M_2$ if and only if $P_{M_1} P_{M_2} x = P_{M_1} x$

(v) $\|x\|^2 = \|P_M x\|^2 + \|(I - P_M) x\|^2$

The definitions and the proofs above refer to Hilbert spaces in general. We now define the space relevant for most of time series analysis.

Definition 6 (The space $L^2(\Omega, \mathcal{F}, P)$). We can define the space $L^2(\Omega, \mathcal{F}, P)$ as the space consisting of the collection C of all random variables X defined on the probability space $(\Omega, \mathcal{F}, P)$ satisfying the condition

$$E X^2 = \int X(\omega)^2 P(d\omega) < \infty \qquad (2.16)$$
and define the inner product of this space as

$$\langle X, Y \rangle = E(XY) \text{ for any } X, Y \in C \qquad (2.17)$$

Least squares estimation via the projection theorem. The inner product space $L^2$ satisfies all of the axioms above. Noting that the inner product definition (2.17) corresponds to a covariance means that we can use the projection theorem to find the minimum variance estimate of a vector of random variables with finite variances as a function of some other random variables with finite variances. That is, both the information set and the variables we are trying to predict must be elements of the relevant space. Since $\langle X, Y \rangle = E(XY)$, an estimate $\hat{x}$ that minimizes the norm of the estimation error $\|x - \hat{x}\|$ also minimizes the variance, since

$$\|x - \hat{x}\| = \sqrt{E\left[(x - \hat{x})(x - \hat{x})'\right]} \qquad (2.18)$$

To find the estimate $\hat{x} = \beta y$ as a linear function of y, simply use that

$$\langle x - \beta y, y \rangle = E\left[(x - \beta y)\, y'\right] = 0 \qquad (2.19)$$

and solve for $\beta$

$$\beta = E(xy')\left[E(yy')\right]^{-1} \qquad (2.20)$$

The advantage of this approach is that once you have made sure that the variables y and x are in a well defined inner product space, there is no need to minimize the variance directly. The projection theorem ensures that an estimate with orthogonal errors is the (linear) minimum variance estimate.

Two useful properties of linear projections.
- If two random variables X and Y are Gaussian, then the projection of Y onto X coincides with the conditional expectation $E(Y \mid X)$.
- If X and Y are not Gaussian, the linear projection of Y onto X is the minimum variance linear prediction of Y given X.

References

[1] Brockwell, P.J. and R.A. Davis, 2006, Time Series: Theory and Methods, Springer-Verlag.
[2] Luenberger, D., 1969, Optimization by Vector Space Methods, John Wiley and Sons, Inc., New York.
[3] O'Hara, M., 1995, Market Microstructure Theory, Blackwell, Cambridge, MA.