MSc Quantitative Techniques
Mathematics, Weeks 1 and 2

DEPARTMENT OF ECONOMICS, MATHEMATICS AND STATISTICS
LONDON WC1E 7HX

September 2017

For MSc Programmes in Economics and Finance

Who is this course for?

This part of the MSc Quantitative Techniques course reviews basic mathematical techniques for most of our MSc programmes in economics and finance. The course is taught three days a week (Monday, Tuesday and Thursday evenings) over September.

Course Aims and Objectives

On completing the course successfully, you should be able to:

- understand the basics of sets and functions, including standard mathematical notation;
- understand the basics of linear algebra and the use of matrices;
- find the constrained optima of multivariate functions;
- compute definite and indefinite integrals;
- solve simple difference and differential equations;
- use these techniques to solve simple problems.

Assessment

Performance in this course is assessed through a sequence of in-class tests.

Textbooks

We do not recommend any particular text, but in the past students have found the following useful.

Chiang, Alpha and Kevin Wainwright, Fundamental Methods of Mathematical Economics, McGraw Hill, 4th ed., 2005.
Hoy, M., J. Livernois, C. McKenna, R. Rees and T. Stengos, Mathematics for Economics, 2nd edition, MIT Press, 2001.
Simon, C. and L. Blume, Mathematics for Economists, W. W. Norton, 1994.

Contents

1 Preliminaries
1.1 Sets
1.2 The set of real numbers
1.3 Sequences
1.4 Series
1.5 Functions
1.6 Properties of functions
1.7 Probability

2 Matrix algebra
2.1 Vectors
2.2 Matrices
2.3 Determinants
2.4 Linear independence and rank
2.5 Systems of linear equations
2.6 Inverse matrix
2.7 Characteristic roots and vectors
2.8 Matrix representation of quadratic forms

Problems

Chapter 1 Preliminaries

1.1 Sets

A set is any well-specified collection of elements. A set can be specified by listing its elements. For example

A = {London, Paris, Athens}

Or consider

B = {1, 3, 5, 7, 9}

Alternatively, we can define a rule that determines what is in the set and what is not:

B = {x is an odd number : 0 < x < 10}

If x is an element of a set S, we say that x belongs to S, written as x ∈ S. If z does not belong to S, we write z ∉ S.

Cardinality of a set

The cardinality of a set refers to the number of elements it has. For example, the set B = {1, 3, 5, 7, 9} has cardinality 5. A set may contain finitely many or infinitely many elements. A set with no elements is called the empty set (or the null set) and is denoted by the symbol ∅.
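As a small aside, these definitions translate directly into Python's built-in set type; the following is a minimal sketch (the particular sets are my own illustrations):

```python
# Specifying a set by listing its elements
B = {1, 3, 5, 7, 9}

# Specifying a set by a rule: odd numbers x with 0 < x < 10
B_rule = {x for x in range(1, 10) if x % 2 == 1}
print(B == B_rule)   # True: both rules specify the same set

print(3 in B)        # membership test, 3 ∈ B -> True
print(4 not in B)    # 4 ∉ B -> True
print(len(B))        # cardinality -> 5
print(set())         # the empty set (printed as set())
```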

Subsets, unions and intersection

Given sets S and T, we say S is a subset of T if every element of S is also an element of T. This is denoted as S ⊆ T.

Given sets A and B, their union A ∪ B is the set of elements that are in either A or B:

A ∪ B = {x : x ∈ A or x ∈ B}

Given sets A and B, their intersection A ∩ B is the set of elements that are in both A and B:

A ∩ B = {x : x ∈ A and x ∈ B}

1.2 The set of real numbers

Numbers such as 1, 2, 3, ... are called natural numbers. Integers include zero and negative numbers too: ..., -2, -1, 0, 1, 2, 3, .... Numbers that can be expressed as a ratio of two integers, that is, of the form a/b where a and b are integers and b ≠ 0, are said to be rational. Numbers such as √2, π, e cannot be expressed as a ratio of integers: they are said to be irrational. The set of real numbers includes both rational and irrational numbers. It is sometimes helpful to think of real numbers as points on a number line. The set of real numbers is usually denoted by R. It is common to use R+ to denote the set of non-negative real numbers, and R++ for strictly positive real numbers.

1.2.1 Inequalities

Given any two real numbers a and b, there are three mutually exclusive possibilities:

a > b (a is greater than b)
a < b (a is less than b)
a = b (a is equal to b)

The inequality in the first two cases is said to be strict. The case where a is greater than b or a is equal to b is denoted as a ≥ b.

The case where a is less than b or a is equal to b is denoted as a ≤ b. In these cases, the inequalities are said to be weak.

Some simple but useful relations:

If a > b and b > c then a > c.
If a > b then a + c > b + c for any c.
If a > b then ac > bc for any positive c.
If a > b then ac < bc for any negative c.

Note that multiplying through by a negative number reverses the inequality.

1.2.2 Absolute value

The distance between any two real numbers a and b is given by

|a - b| = a - b if a ≥ b, and b - a if a < b.

The expression |a|, also called the absolute value of a, denotes the distance of a from 0:

|a| = a if a ≥ 0, and -a if a < 0.

1.2.3 Bounded sets

A set S of real numbers is bounded above if there exists a real number H that is greater than or equal to every element of the set. That is, for some H we have x ≤ H for all x ∈ S. Such a number H, if it exists, is called an upper bound of the set S.

A set of real numbers is bounded below if there exists a real number h that is less than or equal to every element of the set. That is, x ≥ h for all x ∈ S. Such a number h, if it exists, is called a lower bound of the set S.

A set that is bounded below and bounded above is called a bounded set.

1.2.4 Maximum and minimum

If a set S has a largest element M, we call M the maximum element of the set: M = max S. If a set S has a smallest element m, we call m the minimum element of the set: m = min S.

It is easy to see that if a set has a maximum element, then it is bounded above. The converse is not true. For instance, the set {x : x < 2} has an upper bound but no maximum.

1.2.5 Intervals

If a and b are two real numbers, the set of all numbers that lie between a and b is called an interval. An open interval does not contain its boundary points. The following is an open interval:

(a, b) = {x ∈ R : a < x < b}

A closed interval contains its boundary points. The following is a closed interval:

[a, b] = {x ∈ R : a ≤ x ≤ b}

We may also have a half-open interval:

[a, b) = {x ∈ R : a ≤ x < b}

Bounded intervals

The intervals listed above, [a, b], (a, b) and [a, b), are all bounded. The following intervals are unbounded:

(a, ∞) = {x ∈ R : a < x}
[a, ∞) = {x ∈ R : a ≤ x}
(-∞, b) = {x ∈ R : x < b}
(-∞, b] = {x ∈ R : x ≤ b}

1.2.6 Space and distance

We can generalize the idea of an interval to that of a space. The n-dimensional Euclidean space R^n is given by

R^n = {(x_1, x_2, ..., x_n) : x_i ∈ R}

To work with any space we need some notion of distance. For example, the Euclidean distance between any two points in R^n, call them a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n), is given by

‖a - b‖ = √( (a_1 - b_1)^2 + (a_2 - b_2)^2 + ... + (a_n - b_n)^2 ).

The ideas of closed and open sets, and bounded and unbounded sets, can be extended to any space with a well-defined notion of distance.

1.3 Sequences

A sequence is an ordered list of numbers, a_1, a_2, a_3, .... The notation {a_n} is often used to denote a sequence whose n-th term is a_n. Formally, a sequence is a function that assigns a number a_n to each natural number n = 1, 2, .... The following are examples of sequences:

{2^n} = 2, 4, 8, 16, ...
{1} = 1, 1, 1, 1, ...
{(-1)^n} = -1, 1, -1, 1, ...

Convergent sequences. A sequence {a_n} is said to converge to a limit L if a_n is arbitrarily close to L for all n sufficiently large. Formally, for any ε > 0, however small, there is some value N such that |a_n - L| < ε for all n > N. We denote this as a_n → L as n → ∞, or as

lim (n→∞) a_n = L.

The following sequence is convergent:

{1/n} = 1, 1/2, 1/3, ...

but this one is not:

{n} = 1, 2, 3, ...

Subsequence. If we drop some terms from a sequence (while preserving the order of the remaining terms) we obtain a subsequence. So given the sequence {a_n} = a_1, a_2, a_3, a_4, a_5, ..., we can construct subsequences, say by dropping every alternate term,

{a_2n} = a_2, a_4, a_6, a_8, ...

or even arbitrarily,

a_2, a_4, a_5, ...

The summation operator. For a sequence {x_n}, the summation operator defines the sum of specified terms of that sequence. For instance,

Σ_{i=1}^{m} x_i = x_1 + x_2 + ... + x_m

denotes the sum of the first m terms.

The summation operator allows compact expression of various constructs in economics and finance. Suppose an investor buys n different shares: x_1 units of the first share at price p_1, x_2 units of the second share at price p_2, and so on, up to x_n units of the n-th share at price p_n. The investment in share i is worth p_i x_i, and the total investment across the n shares is written as

Σ_{i=1}^{n} p_i x_i.

The following properties are useful:

Σ_{i=1}^{n} (a_i + b_i) = Σ_{i=1}^{n} a_i + Σ_{i=1}^{n} b_i

Σ_{i=1}^{n} c a_i = c Σ_{i=1}^{n} a_i

That is, a term c that does not vary with the index can be moved outside the summation operator without affecting the value.

Double summation. More complicated operations can be expressed as double summations:

Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij = Σ_{i=1}^{m} (x_i1 + x_i2 + ... + x_in)
= (x_11 + x_12 + ... + x_1n) + (x_21 + x_22 + ... + x_2n) + ... + (x_m1 + x_m2 + ... + x_mn).

1.4 Series

A series refers to the sum of terms in a sequence. For an infinite sequence a_1, a_2, a_3, a_4, a_5, ..., this refers to

Σ_{i=1}^{∞} a_i = a_1 + a_2 + a_3 + ...

Consider the partial sum

s_N = a_1 + a_2 + ... + a_N = Σ_{i=1}^{N} a_i.

This is the sum of the first N terms of the original sequence. Note that {s_1, s_2, ...} is itself a sequence of partial sums.

1.4.1 Convergent series

The series Σ_{i=1}^{∞} a_i converges to a limit s if and only if the associated sequence of partial sums {s_N} converges to s.

Arithmetic series. If successive terms in the original sequence differ by a constant, as in a, a + c, a + 2c, ..., the associated series is said to be an arithmetic series.

Geometric series. If successive terms in the sequence are related by a constant multiplicative factor c, that is, the sequence is a, ac, ac^2, ac^3, ..., the associated series is said to be a geometric series.

Harmonic series have the form

1 + 1/2 + 1/3 + ... + 1/n + ...

Evaluating a convergent geometric series

The following describes a standard technique for evaluating a convergent geometric series

s_N = Σ_{n=1}^{N} a c^{n-1},

assuming that |c| < 1. Note we can write

s_N = a + ac + ac^2 + ac^3 + ... + ac^{N-1}

Multiplying both sides by c, we get

c s_N = ac + ac^2 + ... + ac^{N-1} + ac^N

Subtracting the second equation from the first, we get

(1 - c) s_N = a - ac^N

or

s_N = a(1 - c^N) / (1 - c).

If |c| < 1, then c^N → 0 as N → ∞, so that

s = a / (1 - c).

1.5 Functions

Loosely speaking, a function is a rule that associates a unique value with any element of a set. A function from a set A to a set B defines a rule that assigns to each x ∈ A a unique element y ∈ B. The set of all values that it maps from is called the domain. The set of values it maps into is called the range. The mapping can be denoted as f : A → B.

Functions of a single variable

In the simplest class of functions, both A and B are the set of real numbers. For instance, consider the rule that maps temperature measurements from the Centigrade (metric) scale to the Fahrenheit scale:

y = 1.8x + 32,

where x is the Centigrade measurement and y the associated value in Fahrenheit. We could denote this as y = f(x).

Multivariate functions

Multivariate functions map from the n-dimensional space to the set of real numbers. For instance, if beer costs 2 euros a pint and chips cost 1 euro per packet, consider the rule that evaluates the total expenditure on any bundle (x_b, x_c) of beers and chips:

e = 2 x_b + x_c.

In general, we can have a mapping f : R^n → R, which assigns a real number to n-dimensional variables (x_1, x_2, x_3, ..., x_n).
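Both rules above translate directly into code; the following is a minimal sketch (the function names are my own):

```python
def fahrenheit(x):
    """Map a Centigrade temperature x to Fahrenheit: y = 1.8x + 32."""
    return 1.8 * x + 32

def expenditure(x_b, x_c):
    """Total spend on x_b beers (2 euros each) and x_c packets of chips (1 euro each)."""
    return 2 * x_b + x_c

print(fahrenheit(100))    # approx 212.0: a function of a single variable
print(expenditure(3, 4))  # 10: a multivariate function, R^2 -> R
```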

Some standard classes of functions

Polynomials: These are functions of the sort

f(x) = a + bx + cx^2

f(x_1, x_2, x_3) = a x_1 x_2 + b x_1 x_3 + c x_3^2.

Linear functions: A function f from A to B is said to be linear if, for all x and y in A and r ∈ R,

f(x + y) = f(x) + f(y) and f(rx) = r f(x).

Quadratic functions: A quadratic function on R^n is a real-valued function of the form

Q(x_1, x_2, ..., x_n) = Σ_{i,j=1}^{n} a_ij x_i x_j

Power function: f(x) = x^a

1.5.1 Inverse of a function

A function f, defined on domain A, is one-to-one if f never takes the same value at two distinct points in A. A one-to-one function has an inverse: for each y = f(x) we can recover the unique x in A that produced it, written x = f^(-1)(y). Functions such as y = x^2 do not possess an inverse since there are two values of x associated with each y.

1.5.2 Monotonicity of functions

A function is weakly increasing if f(x′) ≥ f(x″) for x′ > x″ in its domain.
A function is strictly increasing if f(x′) > f(x″) for x′ > x″ in its domain.
A function is weakly decreasing if f(x′) ≤ f(x″) for x′ > x″ in its domain.
A function is strictly decreasing if f(x′) < f(x″) for x′ > x″ in its domain.

Functions that are either increasing or decreasing over their domain are said to be monotonic. Strictly monotonic functions are one-to-one, and so have inverse functions.

1.5.3 Exponents and logarithms

Consider any positive real number a. We define the exponential function as follows.

Exponential function to base a: f(x) = a^x.

Since the exponential function is monotonic, it has an inverse.

Logarithmic function to base a: if a^x = y then x = log_a y. By construction,

a^(log_a z) = z.

Sometimes a particular irrational number e is used as the base for the exponential function. Let

e = lim (n→∞) (1 + 1/n)^n

Exponential function to natural base e: f(x) = e^x. This function has an inverse too.

Natural logarithmic function: if e^x = y then x = ln y.

Properties of exponents

a^0 = 1
a^m a^n = a^(m+n)
a^m / a^n = a^(m-n)
a^(-m) = 1 / a^m
(a^m)^n = a^(mn)

Properties of logarithms

log 1 = 0
log(mn) = log m + log n
log(m/n) = log m - log n
log(1/m) = -log m
log(m^n) = n log m
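A quick numerical sanity check of these rules, as a minimal sketch using Python's standard math module (the particular values of a, m and n are arbitrary choices):

```python
import math

a, m, n = 2.0, 3.0, 5.0

# Exponent rules
print(math.isclose(a**m * a**n, a**(m + n)))   # a^m a^n = a^(m+n) -> True
print(math.isclose((a**m)**n, a**(m * n)))     # (a^m)^n = a^(mn)  -> True

# Logarithm rules (natural base)
print(math.isclose(math.log(m * n), math.log(m) + math.log(n)))  # True
print(math.isclose(math.log(m**n), n * math.log(m)))             # True

# e as the limit of (1 + 1/n)^n
print((1 + 1/1_000_000) ** 1_000_000)  # approx 2.71828, close to math.e
```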

1.6 Properties of functions

1.6.1 Concavity and convexity of functions

Put simply, concavity and convexity of a function refer to the curvature of its graph. For instance, the function y = ln x is concave; the function y = x^2 is convex. We can describe concavity and convexity of functions more formally, by first defining convex combinations and convex sets.

Convex combination. Consider two distinct points x′ and x″ in n-dimensional space. Their convex combination is given by

x = λx′ + (1 - λ)x″ for λ ∈ (0, 1).

For example, let x′ = (1, 2) and x″ = (3, 4): we find that (2, 3) is a convex combination of x′ and x″ using λ = 0.5.

Convex sets. A set X is convex if x′ ∈ X and x″ ∈ X implies λx′ + (1 - λ)x″ ∈ X for all λ ∈ [0, 1]. That is, a set is convex if any convex combination of elements of the set lies in the set.

Convex function. A function of a single variable is convex if the line joining two points on the graph of the function lies above the graph. This idea can be generalized to multivariate functions. Consider a function f(x) = f(x_1, x_2, ..., x_n), defined on a convex set X. For any two points x′ and x″ in X, consider their convex combinations. The function f is said to be convex if

f(λx′ + (1 - λ)x″) ≤ λ f(x′) + (1 - λ) f(x″).

Concave function. A function of a single variable is concave if the line joining two points on the graph of the function lies below the graph. Consider a function f(x) = f(x_1, x_2, ..., x_n), defined on a convex set X. For any two points x′ and x″ in X, consider their convex combinations. The function f is said to be concave if

f(λx′ + (1 - λ)x″) ≥ λ f(x′) + (1 - λ) f(x″).

Note that if a function f is convex, its negative -f is concave.

Homogeneous functions

A function y = f(x_1, x_2, ..., x_n) is said to be homogeneous of degree λ if

f(kx_1, kx_2, ..., kx_n) = k^λ y.

For instance, y = x_1 x_2 is homogeneous of degree 2, while y = x_1^0.25 x_2^0.75 is homogeneous of degree 1.

1.6.2 Continuity of functions

Graphically, the notion of continuity is easy to understand. All polynomial functions are continuous.

Definition 1 (Continuous functions) Let f be a function from R^k to R. Let x_0 be a vector in R^k and let y = f(x_0) be its image. The function f is continuous at x_0 if whenever {x_n} is a sequence in R^k which converges to x_0, the sequence {f(x_n)} converges to f(x_0). The function f is said to be continuous if it is continuous at each point in its domain.

As an example of a function that is not continuous, consider

f(x) = 1 if x > 0, and 0 if x ≤ 0.

Continuity of composite functions: if both g and f are continuous functions, then g(f(x)) is continuous.

1.7 Probability

Consider a process whose outcome cannot be predicted. The set Ω of all possible outcomes is called the sample space. For instance, the sample space for a flip of a coin is {heads, tails}. The sample space for the roll of a die is {1, 2, 3, 4, 5, 6}.

An event A is a subset of Ω. For the latter example above, consider the event "the roll of the die produces a number 3 or lower". The set of all events is given by the set of all subsets of Ω.

For any event A, we can define its complement A^c as the set of elements of Ω that are not in A.

For any two events A_i and A_j, their union, denoted A_i ∪ A_j, defines the event that either A_i or A_j or both happen.

For any two events A_i and A_j, their intersection, denoted A_i ∩ A_j, defines the event that both A_i and A_j happen.

If the intersection of two events is the empty set, that is, if A_i ∩ A_j = ∅, they are said to be disjoint or mutually exclusive.

Definition 2 (Probability function) A probability function P(·) assigns, to every event A, a value P(A) ∈ [0, 1], so that the following axioms hold.

1. P(A) ≥ 0
2. P(Ω) = 1
3. For any disjoint events A_1, A_2, ..., we have P(A_1 ∪ A_2 ∪ ...) = Σ_i P(A_i).

The following results follow from the axioms.

If P(A) is the probability that event A occurs, the probability of event A not happening is 1 - P(A).

We know, from the axiom above, that if A and B are disjoint events, we have P(A ∪ B) = P(A) + P(B). When A and B are not disjoint,

P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

1.7.1 Independent events

Two events A and B are said to be independent if the probability of both happening is the product of their individual probabilities:

P(A ∩ B) = P(A) · P(B).   (1.1)

1.7.2 Conditional probability

The probability of event A conditional on event B is defined as

P(A|B) = P(A ∩ B) / P(B).   (1.2)

Note that if A and B are independent we have P(A|B) = P(A).

1.7.3 Bayes' rule

Similarly, the probability of B happening given that A happens is

P(B|A) = P(A ∩ B) / P(A).   (1.3)

Multiply both sides of equation (1.2) by P(B) and both sides of (1.3) by P(A). Rearranging, we get

P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A).

The joint probability is the product of the conditional probability and the marginal probability. The last equality above leads to Bayes' rule:

P(A|B) = P(B|A) P(A) / P(B).

Bayes' rule can be rewritten in different forms. Note that we can write

P(B) = P(B|A) P(A) + P(B|A^c) P(A^c),

so that

P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|A^c) P(A^c) ].

1.7.4 Example

A patient goes to see a doctor. The doctor performs a test with 95 percent reliability; that is, 95 percent of people who are sick test positive and 95 percent of healthy people test negative. The doctor knows that only 1 percent of the people in the country are sick. Now the question is: if the patient tests positive, what are the chances the patient is sick?

We reason as follows.

Let A be the event that the patient is sick, and A^c the event that the patient is healthy.
Let B be the event that the patient's test is positive.
We must compute P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)].
95 percent of sick people test positive, so P(B|A) = 0.95.
5 percent of healthy people test positive too, so P(B|A^c) = 0.05.
99 percent of the population is healthy, so P(A) = 0.01 and P(A^c) = 0.99.

Using these values in the expression above, we get

P(A|B) = (0.95)(0.01) / [(0.95)(0.01) + (0.05)(0.99)] = 16.1%

An alternative way to reason through: in a population of 1 million, we would expect 10,000 sick people and 990,000 healthy people. If all the sick people were tested, 95% of them, 9,500 in total, would test positive. Of the 990,000 healthy people, 5%, a total of 49,500, would test positive. Among the people who test positive, 9,500 + 49,500 = 59,000 in all, only 9,500 are actually sick, so the probability that someone who tests positive is sick is 9,500/59,000, which is about 16%.
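The computation above is easy to verify in code; a minimal sketch:

```python
# Bayes' rule for the diagnostic-test example.
p_sick = 0.01               # P(A): prior probability of being sick
p_pos_given_sick = 0.95     # P(B|A): sick people who test positive
p_pos_given_healthy = 0.05  # P(B|A^c): healthy people who test positive

# Total probability of a positive test, P(B)
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Posterior probability of being sick given a positive test, P(A|B)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(round(p_sick_given_pos, 3))  # 0.161, i.e. about 16%
```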

1.7.6 Expected value The expected value (or mean) of a random variable is { i=1 n E(x) = p(x i)x i, if discrete x f (x)dx, if continuous. For example, suppose we have a sample of n equally-likely observations x 1, x 2,..., x n. Using the fact that p(x i ) = 1/n, we have E(x) = n 1 n x i. i=1 Expected Value of functions of random variables Consider a random variable X with density function f (x). For any function g(x), we have E(g(x)) = Note that, in general, E(g(x)) = g(e(x)). g(x) f (x)dx. Jensen s Inequality If g(x) is a concave function, we have E(g(x)) g(e(x)). If g(x) is a linear function, we have E(g(x)) = g(e(x)). Specifically, let g(x) = a + bx. Here E(a + bx) = a + be(x). Joint distributions P(X = x 1, Y = y 1 ) = p(x 1, y 1 ) for discrete P(X x, Y y) = F(x, y) for continuous with joint pdf f (x, y) and f (x, y)dxdy = 1. Marginal distribution p(x i ) = p(x i, y j ) f (x) = j f (x, y)dy 18

Conditional distribution

p(x_i | y_j) = p(x_i, y_j) / p(y_j)
f(x | y) = f(x, y) / f(y)

Notice this implies f(x, y) = f(x|y) f(y) = f(y|x) f(x): the joint distribution can be written as the product of the conditional and the marginal.

We can define expectations for marginal and conditional distributions. For instance, the conditional expectation (regression) function gives the expected value of y as a function of x:

E(y | x_i) = Σ_j y_j p(y_j | x_i) in the discrete case,
E(y | x) = ∫ y f(y|x) dy in the continuous case.

1.7.7 Some common probability distributions

A uniform distribution defined over the interval [a, b] has density function

f(x) = 1/(b - a) if a ≤ x < b, and 0 otherwise.

A normal distribution with mean µ and variance σ^2, written x ~ N(µ, σ^2), has density function

f(x) = (1/√(2πσ^2)) exp( -(1/2) ((x - µ)/σ)^2 ).

Linear functions of normal variates are also normal. For instance, if x is normally distributed with mean µ and variance σ^2, then

z = (x - µ)/σ

is normally distributed with mean 0 and variance 1. This standard normal distribution is usually denoted N(0, 1).
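Jensen's inequality from the previous subsection is easy to see by simulation; a minimal sketch (the uniform distribution and the concave function ln x are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=1_000_000)  # a positive random variable

# For the concave function g(x) = ln(x), Jensen gives E(g(x)) <= g(E(x)).
print(np.log(x).mean())   # E(ln x), approx 1.01
print(np.log(x.mean()))   # ln E(x), approx ln(3) = 1.10

# For a linear function g(x) = a + b x, the two sides coincide.
a, b = 2.0, 3.0
print((a + b * x).mean(), a + b * x.mean())  # equal up to float rounding
```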

Problems

For the following questions, identify the best answer.

1. If |3x - 4| = 1 then x equals
(a) x = 1
(b) x = 4/3
(c) x = 1 or x = 5/3
(d) x = -1 or x = 5/3

2. For what values of x is |x - 2| ≤ 1?
(a) x lies between 0 and 1
(b) x lies between -1 and 1
(c) x lies between 1 and 3
(d) x lies between -3 and 3

3. The Euclidean distance between a = (1, 2) and b = (3, 3) is
(a) (2, 1)
(b) 3
(c) √5
(d) 1

4. The function f(x, y) = 2x - 3y is
(a) strictly increasing in x
(b) strictly decreasing in x
(c) increasing in x and y
(d) increasing in y

5. The function f(x) = x^2 is
(a) strictly increasing in x
(b) strictly decreasing in x
(c) neither increasing nor decreasing in x
(d) weakly increasing in x

6. The function f(x, y) = 2x - 3y is
(a) concave in x
(b) convex in x
(c) neither concave nor convex in x
(d) both concave and convex in x

7. The function f(x) = x^2 is
(a) concave in x
(b) convex in x
(c) neither concave nor convex in x
(d) both concave and convex in x

8. The function f(x) = x^3 is
(a) concave in x
(b) convex in x
(c) neither concave nor convex in x
(d) both concave and convex in x

9. Consider the geometric series s = 1 + 0.25 + 0.25^2 + ....
(a) s = 2
(b) s = 4
(c) s = 4/3
(d) s = 3/4

10. The value of an asset that pays dividends D_t is given by

V_0 = Σ_{t=1}^{∞} D_t / (1 + r)^t

where r is the interest rate and D_t = D(1 + g)^t, where g is the growth rate of dividends. Under what conditions does this series converge?
(a) g < r
(b) r < g
(c) r = g
(d) t > 1

Solve the following.

11. If x and y are positive, prove that x < y if and only if x^2 < y^2.

12. Evaluate the series: x + 2x^2 + 3x^3 + ... for 0 < x < 1.

13. Evaluate the series: x - x^2 + x^3 - x^4 + ... for 0 < x < 1.

14. Consider a random variable X that can take one of the values 1, 2, 3 or 4 with equal probability. Evaluate E(x), the expected value of X. Then evaluate E(y), the expected value of Y = 2X^2. Use your findings to verify Jensen's inequality.

15. Simplify the following:
(a) x^(1-b) x^(a-1)
(b) [x^(1-a-b)] [x^(a-b-0.5)]^2
(c) (xy^6 · 12x) / (6x^2 y^2 · 24x^4 y^2)

Chapter 2 Matrix algebra

2.1 Vectors

A vector is an ordered set of numbers. These could be expressed as a row, such as

[3 2 ... 0],

or as a column,

[ 3 ]
[ 2 ]
[...]
[ 0 ]

Vectors are a commonly used construct in economics and finance. For instance, a consumption vector can represent the consumption levels of distinct commodities; a portfolio may be represented as a vector of asset holdings. The number of elements in a vector is referred to as its dimension. An n-dimensional vector can be represented as a row vector

a = [a_1 a_2 ... a_n],

or as a column vector

[ a_1 ]
[ a_2 ]
[ ... ]
[ a_n ]

2.1.1 Operations on vectors

In what follows, let a, b, and c be n-dimensional vectors.

Equality of vectors

Two n-dimensional vectors a and b are equal if and only if their corresponding components are equal: that is, if a_i = b_i for all i.

Addition of vectors

The sum of two n-dimensional vectors a = [a_i] and b = [b_i] is defined as an n-dimensional vector c whose typical component is c_i = a_i + b_i. For instance,

[3 2 0] + [1 1 1] = [4 3 1]

Properties of vector addition. Addition of vectors is commutative, a + b = b + a, and associative, (a + b) + c = a + (b + c).

Scalar multiplication

Multiplying a vector by a scalar involves multiplying each element by that scalar. For any vector a = [a_i] and real number α, we have

αa = [αa_i]

For instance,

3 [1 3 2] = [3 9 6]

Difference between vectors

The difference a - b can be written as a + (-1)b. For instance,

[3 2 0] - [1 1 1] = [2 1 1]

Two special vectors

A null vector is a vector whose elements are all zero: 0 = [0 0 ... 0]. The difference between any vector and itself yields the null vector.

A unit vector is a vector whose elements are all 1: i = [1 1 ... 1]

Linear combination of vectors

Given two n-vectors a and b and scalars γ and δ, the vector γa + δb is said to be a linear combination of a and b. Specifically, for column vectors we have

γa + δb = γ [a_1 a_2 ... a_n]′ + δ [b_1 b_2 ... b_n]′ = [γa_1 + δb_1, γa_2 + δb_2, ..., γa_n + δb_n]′

Inner product of two vectors

Given two n-vectors a = [a_1 a_2 ... a_n]′ and b = [b_1 b_2 ... b_n]′, their inner product (sometimes called the dot product) is given by

a · b = a_1 b_1 + a_2 b_2 + ... + a_n b_n = Σ_{i=1}^{n} a_i b_i.

Note that a · b = b · a, so that the operation is commutative.

Inner product of a vector and a unit vector

Let i be an n-dimensional unit vector, that is, a vector whose elements are all 1. The inner product of any n-dimensional vector a with the n-dimensional unit vector equals the sum of the components of a. That is,

i · a = Σ_{i=1}^{n} a_i.

Orthogonality of vectors

Two vectors are said to be orthogonal if their inner product is zero. The following are examples of orthogonal pairs.

Example 1: [1 1] and [1 -1], as 1(1) + 1(-1) = 1 - 1 = 0.

Example 2: [1 2 1] and [3 1 -5], as 1(3) + 2(1) + 1(-5) = 0.
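These vector operations map directly onto NumPy arrays; a minimal sketch reproducing the examples above:

```python
import numpy as np

a = np.array([3, 2, 0])
b = np.array([1, 1, 1])

print(a + b)                    # vector addition -> [4 3 1]
print(3 * np.array([1, 3, 2]))  # scalar multiplication -> [3 9 6]
print(a - b)                    # difference -> [2 1 1]

# Inner products: orthogonal vectors have inner product zero.
u = np.array([1, 2, 1])
v = np.array([3, 1, -5])
print(np.dot(u, v))           # 3 + 2 - 5 = 0, so u and v are orthogonal
print(np.dot(np.ones(3), a))  # inner product with the unit vector = sum of a = 5
```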

2.2 Matrices

A matrix is a rectangular array of numbers:

A = [a_ij] =
[ a_11 a_12 ... a_1n ]
[ a_21 a_22 ... a_2n ]
[ ...               ]
[ a_m1 a_m2 ... a_mn ]

The notational subscripts in the typical element a_ij refer to its row and column location in the array: specifically, a_ij is the element in the i-th row and the j-th column. This matrix has m rows and n columns, so it is said to be of dimension m × n (commonly read "m by n").

A matrix can be viewed as a set of column vectors, or alternatively as a set of row vectors. Conversely, a vector can be viewed as a matrix with only one row or column.

Some special matrices

A matrix with the same number of rows as columns is said to be a square matrix. Matrices that are not square are said to be rectangular matrices.

A null matrix is composed of all 0s and can be of any dimension. Consider the following null matrix:

[ 0 0 0 0 ]
[ 0 0 0 0 ]

Identity matrix

An identity matrix is a square matrix with 1s on the main diagonal and all other elements equal to 0. Formally, we have a_ii = 1 for all i and a_ij = 0 for all i ≠ j. Identity matrices are often denoted by the symbol I (or sometimes I_n, where n denotes the dimension). The two-dimensional identity matrix is

I_2 =
[ 1 0 ]
[ 0 1 ]

More generally, the identity matrix of dimension n is

I_n =
[ 1 0 ... 0 ]
[ 0 1 ... 0 ]
[ ...       ]
[ 0 0 ... 1 ]

Symmetric matrix

A square matrix A = [a_ij] is said to be symmetric if a_ij = a_ji. For example:

[ 1 2 5 ]
[ 2 1 0 ]
[ 5 0 3 ]

Diagonal matrix

A diagonal matrix is a square matrix A = [a_ij] whose non-diagonal entries are all zero. That is, a_ij = 0 for i ≠ j. For example:

[ 1 0 0 ]
[ 0 3 0 ]
[ 0 0 2 ]

Upper-triangular matrix

An upper-triangular matrix is a (usually square) matrix in which all entries below the diagonal are 0. For A = [a_ij] we have a_ij = 0 for i > j. For example:

[ 1 4 1 ]
[ 0 3 0 ]
[ 0 0 2 ]

Lower-triangular matrix

A lower-triangular matrix is a (usually square) matrix in which all entries above the diagonal are 0. For A = [a_ij] we have a_ij = 0 for i < j. For example:

[ 1 0 0 ]
[ 3 3 0 ]
[ 1 2 2 ]

2.2.1 Matrix operations

An algebra is a system of sets and operations on these sets, where the sets satisfy certain conditions and the operations satisfy certain rules.

Equality of matrices

Matrices A and B are equal if and only if they have the same dimensions and each element of A equals the corresponding element of B. That is, if a_ij = b_ij for all i and j.

Transpose of a matrix

For any matrix A, the transpose, denoted by A′ (or sometimes A^T), is obtained by interchanging rows and columns: if A = [a_ij], then A′ = [a_ji]. That is, the i-th row of the original matrix forms the i-th column of the transpose. For example, if

A =
[ 2 3 1 ]
[ 4 1 2 ]

then

A′ =
[ 2 4 ]
[ 3 1 ]
[ 1 2 ]

Note that if A is of dimension m × n, its transpose is of dimension n × m.

Transpose of a symmetric matrix. If A is symmetric, A′ = A.

Transpose of a transpose. The transpose of a transpose of a matrix yields the original matrix: (A′)′ = A.

Matrix addition

We can add two matrices as long as they are of the same dimension. Consider A = [a_ij] and B = [b_ij], both of dimension m × n. Their sum is defined as the m × n matrix C = A + B = [a_ij + b_ij]. For instance,

[ a_11 a_12 ]   [ b_11 b_12 ]   [ a_11 + b_11  a_12 + b_12 ]
[ a_21 a_22 ] + [ b_21 b_22 ] = [ a_21 + b_21  a_22 + b_22 ]

Properties of matrix addition. Matrix addition is commutative, A + B = B + A, and associative, (A + B) + C = A + (B + C).

Addition of the null matrix. Adding the null matrix to any matrix leaves the original matrix unchanged: for any B, B + 0 = B, where 0 is the null matrix of the same dimension as B.

Transpose of a sum. The transpose of a sum of matrices is the sum of the transposed matrices:

(A + B)′ = A′ + B′.

Scalar multiplication

Multiplying a matrix by a scalar involves multiplying each element by that scalar. If A = [a_ij], then for any real number λ we have λA = [λa_ij]. For instance,

    [ 1 3 2 ]   [ 3 9 6 ]
3 × [ 1 0 1 ] = [ 3 0 3 ]

Matrix multiplication

Matrix multiplication is an operation on pairs of matrices that satisfy a certain restriction: the first matrix must have the same number of columns as the number of rows in the second matrix. When this condition holds, the matrices are said to be conformable under multiplication. Let A = [a_ij] be an m × n matrix and B = [b_ij] be an n × p matrix. As the number of columns in the first matrix and the number of rows in the second both equal n, the matrices are conformable.

It is important to distinguish between pre-multiplication and post-multiplication. In the product AB, the matrix A is post-multiplied by B (or, equivalently, B is pre-multiplied by A).

The product matrix C = AB is an m × p matrix whose ij-th element equals the inner product of the i-th row vector of matrix A and the j-th column vector of matrix B. Formally,

c_ij = Σ_k a_ik b_kj.

For instance,

                [ 1 3 ]
[ 2 1 2 ]   ×   [ 2 1 ]   =   [ 2(1) + 1(2) + 2(1)  2(3) + 1(1) + 2(0) ]   =   [ 6 7 ]
[ 1 2 3 ]       [ 1 0 ]       [ 1(1) + 2(2) + 3(1)  1(3) + 2(1) + 3(0) ]       [ 8 5 ]

Properties of matrix multiplication

Matrix multiplication is not commutative. Here is why.

Even when matrices A and B are conformable, so that AB exists, BA may not exist. For instance, if A is 3 × 2 and B is 2 × 2, AB exists but BA is not defined.

Even when both product matrices exist, they may not have the same dimensions. For instance, if A is 2 × 3 and B is 3 × 2, AB is of order 2 × 2 while BA is of order 3 × 3. Consider the numerical example above, for instance.

Even when both product matrices are of the same dimension, they may not be equal. For instance,

[ 1 2 ]   [ 1 0 ]   [ 1 4 ]
[ 3 1 ] × [ 0 2 ] = [ 3 2 ]

while

[ 1 0 ]   [ 1 2 ]   [ 1 2 ]
[ 0 2 ] × [ 3 1 ] = [ 6 2 ]

Further, AB = 0 does not imply either A = 0 or B = 0. Consider

[ 1 1 ]   [  1 0 ]   [ 0 0 ]
[ 2 2 ] × [ -1 0 ] = [ 0 0 ]

Also, AB = AC with A ≠ 0 does not imply B = C. Consider A = [1 1; 1 1], B = [1 2; 3 5] and C = [2 4; 2 3], for which AB = AC = [4 7; 4 7].

However, matrix multiplication is associative,

(AB)C = A(BC),

and is distributive across sums of matrices:

A(B + C) = AB + AC
(B + C)A = BA + CA

Transpose of a product. The transpose of a product is the product of the transposes in reverse order:

(AB)′ = B′A′

Multiplication by the identity matrix. Pre-multiplying or post-multiplying any matrix A by the identity matrix (of conformable dimension) yields the original matrix:

IA = AI = A.

Multiplication by the null matrix. Multiplying a matrix by a conformable null matrix produces a null matrix.

Multiplying a matrix by itself

Notation: for a square matrix A, we write A^2 = AA and A^n = AA...A (n times).

Idempotent matrix. A square matrix A is said to be idempotent if AA = A. Consider

[ 3 -6 ]
[ 1 -2 ]

Elementary operations on a matrix

The following operations on a matrix are described as elementary row operations:

1. interchanging two rows;
2. changing a row by adding to it a multiple of another row;
3. multiplying each element of a row by the same non-zero number.
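Before moving on, here is a minimal NumPy sketch of the multiplication facts above, including the idempotent example (the matrices are those used in the text):

```python
import numpy as np

A = np.array([[1, 2], [3, 1]])
B = np.array([[1, 0], [0, 2]])

print(A @ B)   # [[1 4], [3 2]]
print(B @ A)   # [[1 2], [6 2]]  -- multiplication is not commutative

C = np.array([[2, 2], [1, 1]])
# Associativity holds: (AB)C = A(BC)
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True

# The matrix [3 -6; 1 -2] is idempotent: M @ M = M
M = np.array([[3, -6], [1, -2]])
print(np.array_equal(M @ M, M))  # True
```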

Row-echelon form. A matrix is in row-echelon form if each row has more leading zeros than the one preceding it. Consider

[ 3 6 6 ]
[ 0 0 2 ]
[ 0 0 0 ]

2.3 Determinants

The determinant is an operation defined on square matrices. It maps the set of square matrices to the set of real numbers.

2.3.1 Determinants of order 2

The determinant of a 2 × 2 matrix A, usually denoted |A|, is defined as

|A| = | a_11 a_12; a_21 a_22 | = a_11 a_22 - a_12 a_21

2.3.2 Determinants of order 3

For a 3 × 3 matrix, the determinant is defined as

|A| = a_11 (a_22 a_33 - a_23 a_32) - a_12 (a_21 a_33 - a_23 a_31) + a_13 (a_21 a_32 - a_22 a_31)

2.3.3 Higher-order determinants

These operations can be represented more conveniently using the notion of minors.

Minors. For any square matrix A, consider the sub-matrix A_(ij) formed by deleting the i-th row and j-th column of A. The determinant of the sub-matrix A_(ij) is called the (i, j)-th minor of the matrix (or sometimes the minor of element a_ij). We denote this as M_ij. For instance, the minors associated with the first row of a 3 × 3 matrix are

M_11 = | a_22 a_23; a_32 a_33 |,  M_12 = | a_21 a_23; a_31 a_33 |,  M_13 = | a_21 a_22; a_31 a_32 |.

Recalling how we specified determinants of order 2 and 3, we see that

|A| = a_11 M_11 - a_12 M_12 + a_13 M_13.

Note the alternating positive and negative signs. To express this, we sometimes use cofactors rather than minors.

Cofactors. The cofactor associated with element a_ij, denoted C_ij, is the minor with a prescribed algebraic sign (-1)^(i+j). Put simply, the sign is positive for elements whose row and column indices add up to an even number, and negative otherwise. Thus C_ij ≡ (-1)^(i+j) M_ij, so that

C_11 = (-1)^2 M_11, C_12 = (-1)^3 M_12, C_13 = (-1)^4 M_13.

In terms of cofactors, |A| can be written as

|A| = a_11 C_11 + a_12 C_12 + a_13 C_13

These recursive operations can be used to define the determinant of any order, and are generally referred to as the Laplace expansion.

2.3.4 Properties of determinants

1. The transpose operation (interchanging rows with columns) does not affect the value of the determinant. Hence |A′| = |A|.
2. The interchange of two rows or two columns changes the sign of the determinant but not its numerical value.
3. Multiplying one row or one column of A by a scalar k changes the value of the determinant to k|A|.
4. Adding (subtracting) a multiple of any row to (from) another row leaves the value of the determinant unchanged. The same applies to columns.
5. If one row (column) is a multiple of another row (column), the value of the determinant is zero.
6. If A is a triangular matrix (or a diagonal matrix), then |A| = a_11 a_22 ... a_nn.
7. |AB| = |A| |B|

2.4 Linear independence and rank

2.4.1 Linear independence

A set of vectors is linearly dependent if any of the vectors in the set can be written as a linear combination of the others.

Consider the vectors [a b] and [2a 2b]. Here the second vector is a multiple of the first, so the vectors are linearly dependent. Another way to express this: a linear combination of the vectors, in particular, minus two times the first vector added to the second, equals the null vector [0 0]. This suggests a definition of linear dependence.

Linear dependence. Vectors v_1, v_2, ..., v_n are linearly dependent if and only if there exist scalars k_1, k_2, ..., k_n, not all zero, such that

k_1 v_1 + k_2 v_2 + ... + k_n v_n = 0.

Alternatively, we have an equivalent definition of linear independence.

Linear independence. Vectors v_1, v_2, ..., v_n are linearly independent if and only if k_1 v_1 + k_2 v_2 + ... + k_n v_n = 0 for scalars k_1, k_2, ..., k_n implies k_1 = k_2 = ... = k_n = 0.

Example. To check whether the row vectors of the matrix

A =
[ 3 4 5  ]
[ 0 1 2  ]
[ 6 8 10 ]
=
[ v_1 ]
[ v_2 ]
[ v_3 ]

are linearly dependent, where the v_i denote the row vectors, note that v_3 = 2v_1. If we take k_1 = -2, k_2 = 0 and k_3 = 1, we get

-2v_1 + 0·v_2 + v_3 = 0

As we have found coefficients, not all zero, such that the linear combination is the null vector, the vectors are linearly dependent.

Determinant and linear dependence of rows of a square matrix

Consider a 2 × 2 matrix whose rows are linearly dependent:

| a  b  |
| ka kb | = kab - kab = 0

Linear dependence turns out to be equivalent to the determinant of the matrix being equal to zero. This holds more generally, for any square matrix.

2.4.2 Rank of a matrix

Rank is defined as the order of the largest non-zero determinant that can be obtained from the elements of a matrix. This definition applies to both square and rectangular matrices.

Thus a non-zero matrix A has rank r if at least one of its r-square minors is different from zero while every (r + 1)-square or larger minor, if any, is equal to zero.

The rank of an m-rowed matrix A can be found by starting with the largest determinants of order m and evaluating them to ascertain whether one of them is non-zero. If so, rank(A) = m. If all the determinants of order m are equal to zero, we start evaluating determinants of order m - 1. Continuing in this fashion, we eventually find the rank r of the matrix, being the order of the largest non-zero determinant.

Example 1. Find rank(A) where A = [6 2; 3 1]. Note |A| = 0. Then rank(A) = 1, since the largest non-zero minor of A is of order 1. (In this example there are four non-zero minors of order 1.)

Example 2. Find rank(A) where

A =
[ 6 2 3 ]
[ 3 1 3 ]

Consider the minor obtained by deleting the second column. We have |6 3; 3 3| = 9 ≠ 0, so that rank(A) = 2 in this case.

Clearly, if A is n × m, then rank(A) ≤ min(n, m).
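A minimal NumPy sketch tying these ideas together: a recursive Laplace-expansion determinant (the function name and the 3 × 3 test matrix are my own), checked against np.linalg.det, plus the ranks of the two examples above:

```python
import numpy as np

def det_laplace(A):
    """Determinant via the Laplace expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]  # delete row 0, column j
        total += (-1) ** j * A[0][j] * det_laplace(minor)  # cofactor sign
    return total

M = [[1, 2, 5], [2, 1, 0], [5, 0, 3]]
print(det_laplace(M))              # -34.0
print(np.linalg.det(np.array(M)))  # -34.0 (up to rounding)

# Rank examples: Example 1 has |A| = 0, so rank 1; Example 2 has rank 2.
print(np.linalg.matrix_rank(np.array([[6, 2], [3, 1]])))       # 1
print(np.linalg.matrix_rank(np.array([[6, 2, 3], [3, 1, 3]])))  # 2
```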

of equations may have no solution. On the other hand there may be multiple solutions (including, possibly, an infinite number of solutions). The system of equations is said to be homogenous if d = 0. Otherwise it is said to be non-homogeneous. The issue of existence can be discussed using simple examples. We will confine attention to the case where the number of equations is the same as the number of variables. Example 1: Consider, first, 2x 1 + 4x 2 = 0 3x 1 + x 2 = 0 This system has a unique solution, x1 = 0; x 2 = 0. Example 2: Next, consider 2x 1 + 4x 2 = 0 x 1 + 2x 2 = 0 This system has an infinite number of solutions. Example 3: Consider, next, 2x 1 + 4x 2 = 8 3x 1 + x 2 = 7 This system has a unique solution, x1 = 2; x 2 = 1. Example 4: Consider, next, 2x 1 + 4x 2 = 8 x 1 + 2x 2 = 7 This system has no solution: the equations are inconsistent. Example 5: Last, consider 2x 1 + 4x 2 = 8 x 1 + 2x 2 = 4 This system has an infinite number of solutions. 36

2.5.1 The coefficient matrix and existence of solutions Analysis of the coefficient matrix helps us to discover general principles about the existence of solutions. Consider a system with two linear equations in two unknowns. a 11 x 1 + a 12 x 2 = d 1 a 21 x 1 + a 22 x 2 = d 2 which can be written as Ax = d where [ a11 a A = 12 a 21 a 22 x = [ x1 x 2 d = [ d1 d 2 We can write vector d as a linear combination of the columns of matrix A [ [ [ a11 a12 d1 x 1 + x a 2 = 21 a 22 d 2 If d = 0, then linear independence of the column vectors implies that the only solution is the trivial one: that x 1 = x 2 = 0. 2.5.2 Existence of a solution When does a solution exist? We distinguish between two cases. A homogeneous system, where d = 0. For this case x = 0 is an obvious (or trivial) solution: see Example 1 above. A non-trivial solution exists only if A has less than full rank: is, if A = 0, which generally leads to an infinite number of solutions, as in Example 2. A non-homogeneous system, where d = 0. This has a unique non-trivial solution only if A has full rank: that is, if A = 0: see Example 3. If A has less than full rank, we may have no solutions (Example 4) or an infinite number of solutions (Example 5). 2.6 Inverse matrix For a square matrix A, there may exist a matrix B such that AB = BA = I An inverse, if it exists is usually denoted as A 1, so that the above definition can be written as AA 1 = A 1 A = I If an inverse does not exist for a matrix, the matrix is said to be singular. 37

If an inverse exists, the matrix is said to be non-singular. Singularity, rank, and determinant. Singularity of the matrix is closely tied to the value of the determinant. We can show that the following statements are equivalent A = 0 all rows or columns in A are linearly independent, A is non-singular there exists a unique inverse A 1. Properties of inverse matrices. As long as the defined inverses exist, 1. (A 1 ) 1 = A. The inverse of an inverse recovers the original matrix. 2. (AB) 1 = B 1 A 1. The inverse of a product is the product of inverses with order switched. 3. (A ) 1 = (A 1 ). The inverse of a transpose is the transpose of the inverse. 4. If A is a diagonal matrix, then A 1 is also diagonal, with diagonal elements 1/a ii. 2.6.1 Using the inverse matrix to solve a system of equations Consider a system of n linear equations in n unknowns, which can be written as Ax = d so where A is a square matrix with dimension n n, and x and d are n 1 vectors. If an inverse exists for square matrix A then pre-multiplying the previous expression with A 1 we get A 1 Ax = A 1 d or x = A 1 d. 2.6.2 Computing the inverse matrix We now describe a method for finding the inverse matrix. We begin by setting up some further constructs. 38

The cofactor matrix For any element a ij of a square matrix A, the cofactor is given by C ij = ( 1) i+j M ij. The cofactor matrix C is obtained by replacing each element the matrix A by its corresponding cofactor, C ij. Example: Find the cofactor matrix for A = [ 3 2 4 1. The co-factors are C 11 = ( 1) 1+1 M 11 = 1 C 12 = ( 1) 1+2 M 12 = 4 C 21 = ( 1) 2+1 M 21 = 2 C 22 = ( 1) 2+2 M 22 = 3 [ C11 C C = 12 = C 21 C 22 [ 1 4 2 3 The adjoint matrix For any square matrix A, the adjoint of A is given by the transpose of the co-factor matrix. Denoting the associated co-factor matrix as C, we have adj A = C. In the previous example, the adjoint is given by [ C11 C adj A = 21 = C 12 C 22 [ 1 2 4 3 The inverse For any square matrix A, the inverse A 1 is given by which is defined as long as A = 0. A 1 = 1 adj A, A 39

Why do these steps work? (only if you really want to know!) To see why, we multiply an arbitrary 2 2 matrix A by its adjoint matrix, adj A. Let the product matrix be given by B. Thus [ a11 a 12 a 21 a 22 [ C11 C 21 C 12 C 22 [ b11 b = 12 b 21 b 22 But then Hence A adja = b 11 = a 11 C 11 + a 12 C 12 b 12 = a 11 C 21 + a 12 C 22 b 21 = a 21 C 11 + a 22 C 12 b 22 = a 21 C 21 + a 22 C 22 [ C = adj A = a 22 a 12 a 21 a 11 b 11 = a 11 a 22 a 12 a 21 = A b 12 = a 11 ( a 12 ) + a 12 a 11 = 0 b 21 = a 21 a 22 + a 22 ( a 21 ) = 0 b 22 = a 21 ( a 12 ) + a 22 a 11 = A [ A 0 0 A [ 1 0 = A 0 1 = A I. For the general n n matrix, if the elements of a row are multiplied by the co-factors of a different row and the products are summed, the result is zero. This ensures all the off-diagonal elements of the product of A and adj A are zero. Also, the elements on the principal diagonal are equal to A. Thus for an n n matrix A we have A adj A = A 0 0 0 A 0. 0 A. 0 0 A = A I. Then, as long as A = 0, This yields: A adj A A adj A A = I = A 1 40

2.6.3 Cramer s rule This method of matrix inversion enables us to describe a convenient procedure for solving a system of linear equation. Consider a system of n linear equations in n unknowns Ax = d, where A is an n n matrix, and x and d are n 1 vectors. As long as an inverse exists (that is, as long as A is non-singular This can be written as x 1 x 2.. x n = 1 A where C ij = ( 1) i+j M ij. Rewrite this as x 1 x 2.. x n Compare the i-th element x = A 1 d = adj A d. A = 1 A C 11 C 21.. C n1 C 12 C 22.. C n2.......... C 1n C 2n.. C nn d 1 d 2.. d n C 11 d 1 + C 21 d 2 +... + C n1 d n C 12 d 1 + C 22 d 2 +... + C n2 d n...... C 1n d 1 + C 2n d 2 +... + C nn d n x i = 1 A (C 1id 1 + C 2i d 2 +... + C ni d n ) of this expression, with the Laplace expansion for the evaluation of A : A = (C 1i a 1i + C 2i a 2i +... + C ni a ni ) We can see that in compared to the previous equation the elements a 1i, a 2i,..., a ni have been replaced by d 1, d 2,..., d n. So (C 1i d 1 + C 2i d 2 +... + C ni d n ) is the determinant, expanded down the i-th column, of the following matrix, which we will call D i : D i = a 11 a 12.. d 1.. a 1n a 21 a 22.. d 2.. a 2n................ a n1 a n2.. d n.. a nn 41

(note: the i-th column has been replaced by b. To summarize, we can write x i = D i A. This procedure is referred to as Cramer s rule for solving the system of equations Ax=d. Example: Find x 1, using Cramer s rule, where [ 6 3 [ x1 2 6 x 2 = [ 50 35 Now, A = 36 6 = 30 = 0. Then x 1 = D 50 3 1 35 6 = A 30 = 50(6) ( 3)35 30 = 13.5 2.7 Characteristic roots and vectors Characteristic roots (sometimes called latent roots or eigenvalues) and characteristic vectors (latent vectors or eigenvectors) are used for stability analysis of dynamic economic models and in econometrics. Definition: If A is a n n square matrix, and if a scalar λ and a (n 1) vector x = 0 satisfy Ax = λx, then λ is a characteristic root of A and x is the associated characteristic vector. If x = 0, then any λ would give Ax = λx and the problem is trivial. Hence we exclude x = 0. Note also that characteristic vectors are not unique: if x is an characteristic vector, then µ(ax) = µ(λx) for any µ = 0. Thus A(µx) = λ(µx), so that µx is also an characteristic vector. For this reason the characteristic vector is said to be only determined up to a scalar multiple. 2.7.1 Finding the characteristic roots Rewrite equation as or Ax = λx Ax λx = 0 42

[A λix = 0 which is a homogeneous system of equations. If there is a non-trivial solution to a homogeneous system, then the matrix must be singular, i.e. A λi = 0 Thus we must choose values for λ such that the determinant a 11 λ a 12.. a 1n a 21 a 22 λ.. a 2n A λi =..... = 0...... a n1 a n2.. a nn λ If [A λi is a 2 2 matrix, the value of the determinant is a polynomial of degree 2 A λi = [(a 11 λ)(a 22 λ) a 12 a 21 A λi = (a 11 a 22 a 12 a 21 ) (a 11 + a 22 )λ + λ 2 This characteristic equation sets the value of the polynomial to zero, and the characteristic roots are the solutions to this equation: that is, values of λ that, substituted into the last equation yield 0. Example: find the characteristic roots of the (2 2) matrix G = [ 2 2 2 1 The characteristic polynomial is G λi = 2 λ 2 2 1 λ = (1 + λ)(2 λ) 4 so that the characteristic equation is λ 2 λ 6 = 0 with characteristic roots λ 1 = 3 and λ 2 = 2. More generally, for an n n matrix, the determinant will be a polynomial of degree n b 0 + b 1 λ + b 2 λ 2 +... + b n 1 λ n 1 + b n λ n An n-th order polynomial has up to n different solutions. However, two or more roots may coincide, so that we get fewer than n distinct values; some roots may involve imaginary square roots of negative numbers, giving complex roots. 43