Mathematics notation and review


Appendix A Mathematics notation and review

Roger Levy, Probabilistic Models in the Study of Language, draft, November 6, 2012

This appendix gives a brief coverage of the mathematical notation and concepts that you'll encounter in this book. In the space of a few pages it is of course impossible to do justice to topics such as integration and matrix algebra. Readers interested in strengthening their fundamentals in these areas are encouraged to consult XXX [calculus] and Healy (2000).

A.1 Sets ($\{\}, \cup, \cap, \emptyset$)

The notation $\{a, b, c\}$ should be read as "the set containing the elements a, b, and c". With sets, it's sometimes a convention that lower-case letters are used as names for elements, and upper-case letters as names for sets, though this is a weak convention (after all, sets can contain anything, even other sets!). $A \cup B$ is read as "the union of A and B", and its value is the set containing exactly those elements that are present in A, in B, or in both. $A \cap B$ is read as "the intersection of A and B", and its value is the set containing only those elements present in both A and B. $\emptyset$, or equivalently $\{\}$, denotes the empty set: the set containing nothing. Note that $\{\emptyset\}$ isn't the empty set; it's the set containing only the empty set, and since it contains something, it isn't empty! [introduce set complementation if necessary]

A.1.1 Countability of sets

[briefly describe]

A.2 Summation ($\sum$)

Many times we'll want to express a complex sum of systematically related parts, such as $1 + 2 + 3 + 4 + 5$ or $x_1 + x_2 + x_3 + x_4 + x_5$, more compactly. We use summation notation for this:

\[ \sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 \qquad \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 \]

In these cases, i is sometimes called an index variable, linking the range of the sum (1 to 5 in both of these cases) to its contents. Sums can be nested:

\[ \sum_{i=1}^{2} \sum_{j=1}^{2} x_{ij} = x_{11} + x_{12} + x_{21} + x_{22} \]
\[ \sum_{i=1}^{3} \sum_{j=1}^{i} x_{ij} = x_{11} + x_{21} + x_{22} + x_{31} + x_{32} + x_{33} \]

Sums can also be infinite:

\[ \sum_{i=1}^{\infty} i = 1 + 2 + 3 + \dots \]

Frequently, the range of the sum can be understood from context, and will be left out; or we want to be vague about the precise range of the sum. For example, suppose that there are n variables, $x_1$ through $x_n$. In order to say that the sum of all n variables is equal to 1, we might simply write

\[ \sum_i x_i = 1 \]

A.3 Product of a sequence ($\prod$)

Just as we often want to express a complex sum of systematically related parts, we often want to express a product of systematically related parts as well. We use product notation to do this:

\[ \prod_{i=1}^{5} i = 1 \times 2 \times 3 \times 4 \times 5 \qquad \prod_{i=1}^{5} x_i = x_1 x_2 x_3 x_4 x_5 \]

Usage of product notation is completely analogous to summation notation as described in Section A.2.

A.4 Cases notation ($\{$)

Some types of equations, especially those describing probability functions, are often best expressed in the form of one or more conditional statements. As an example, consider a six-sided die that is weighted such that when it is rolled, 50% of the time the outcome is a six, with the other five outcomes all being equally likely (i.e. 10% each). If we define a discrete random variable X representing the outcome of a roll of this die, then the clearest way of specifying the probability mass function for X is by splitting up the real numbers into three groups, such that all numbers in a given group are equally probable: (a) 6 has probability 0.5; (b) 1, 2, 3, 4, and 5 each have probability 0.1; (c) all other numbers have probability zero. Groupings of this type are often expressed using cases notation in an equation, with each of the cases expressed on a different row:

\[ P(X = x) = \begin{cases} 0.5 & x = 6 \\ 0.1 & x \in \{1, 2, 3, 4, 5\} \\ 0 & \text{otherwise} \end{cases} \]

A.5 Logarithms and exponents

The log in base b of a number x is expressed as $\log_b x$; when no base is given, as in $\log x$, the base should be assumed to be the mathematical constant e. The expression $\exp[x]$ is equivalent to the expression $e^x$. Among other things, logarithms are useful in probability theory because they allow one to translate between sums and products: $\sum_i \log x_i = \log \prod_i x_i$. Derivatives of logarithmic and exponential functions are as follows:

\[ \frac{d}{dx} \log_b x = \frac{1}{x \log b} \qquad \frac{d}{dx} y^x = y^x \log y \]

A.6 Integration ($\int$)

Sums are always over countable (finite or countably infinite) sets. The analogue over a continuum is integration. Correspondingly, you need to know a bit about integration in order to understand continuous random variables. In particular, a basic grasp of integration is essential to understanding how Bayesian statistical inference works.

One simple view of integration is as computing "area under the curve". In the case of integrating a function f over some range [a, b] of a one-dimensional variable x in which f(x) > 0, this view is literally correct. Imagine plotting the curve f(x) against x, extending straight lines from points a and b on the x-axis up to the curve, and then laying the plot down on a table. The area on the table enclosed on four sides by the curve, the x-axis, and the two additional straight lines is the integral

\[ \int_a^b f(x)\,dx \]
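As a rough numerical illustration of the area view (a sketch, not part of the original text), the code below sums the areas of thin rectangles under a positive curve. The function f(x) = x^2 on [0, 1] is an arbitrary choice for illustration, and the helper names `riemann_sum` and `square` are hypothetical. Each approximation is an ordinary finite sum; as the rectangles get thinner, the sum approaches the integral, whose exact value here is 1/3.

```python
def riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    # Each rectangle's height is f evaluated at the midpoint of its base.
    return sum(f(a + (i + 0.5) * width) * width for i in range(n))


def square(x):
    # Illustrative positive function; not taken from the source text.
    return x * x


for n in (10, 100, 1000):
    # The printed values approach 1/3 as n grows.
    print(n, riemann_sum(square, 0.0, 1.0, n))
```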

[Figure A.1: Integration. Panels (a) and (b) plot f(x) against x; panel (c) plots f(x, y) against x and y.]

This is depicted graphically in Figure A.1a. The situation is perhaps slightly less intuitive, but really no more complicated, when f(x) crosses the x-axis. In this case, area under the x-axis counts as "negative" area. An example is given in Figure A.1b; the function here is $f(x) = \frac{1}{2}(2.5 - x)$. Since the area of a triangle with height h and length l is $\frac{lh}{2}$, we can compute the integral in this case by subtracting the area of the smaller triangle from the larger triangle:

\[ \int_1^3 f(x)\,dx = \frac{1}{2} \times 1.5 \times 0.75 - \frac{1}{2} \times 0.5 \times 0.25 = 0.5 \]

Integration also generalizes to multiple dimensions. For instance, the integral of a function f over an area in two dimensions x and y, where f(x, y) > 0, can be thought of as the volume enclosed by projecting the area's boundary from the x,y plane up to the f(x, y) surface. A specific example is depicted in Figure A.1c, where the area in this case is the square bounded by 1/4 and 3/4 in both the x and y directions:

\[ \int_{1/4}^{3/4} \int_{1/4}^{3/4} f(x, y)\,dx\,dy \]

An integral can also be over the entire range of a variable or set of variables. For instance, one would write an integral over the entire range of x as $\int_{-\infty}^{\infty} f(x)\,dx$. Finally, in this book and in the literature on probabilistic inference you will see the abbreviated notation $\int_\theta f(\theta)\,d\theta$, where $\theta$ is typically an ensemble (collection) of variables. In this book, the proper interpretation of this notation is as the integral over the entire range of all variables in the ensemble $\theta$.
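The signed-area computation for Figure A.1b can be checked with exact arithmetic. The sketch below assumes the function is f(x) = (2.5 - x)/2 on the range [1, 3], as in the text; the names `f`, `F` and `triangle_area` are illustrative, and the antiderivative `F` is supplied by hand rather than taken from the source.

```python
from fractions import Fraction


def f(x):
    """The function from Figure A.1b: f(x) = (2.5 - x) / 2."""
    return (Fraction(5, 2) - x) / 2


def triangle_area(length, height):
    """Area of a triangle with base `length` and height `height`."""
    return Fraction(length) * Fraction(height) / 2


def F(x):
    """An antiderivative of f: F(x) = (2.5 x - x^2 / 2) / 2."""
    return (Fraction(5, 2) * x - Fraction(x) ** 2 / 2) / 2


a, zero, b = Fraction(1), Fraction(5, 2), Fraction(3)

# Area above the x-axis on [1, 2.5]: base 1.5, height f(1) = 0.75.
larger = triangle_area(zero - a, f(a))
# Area below the x-axis on [2.5, 3]: base 0.5, height |f(3)| = 0.25.
smaller = triangle_area(b - zero, abs(f(b)))

print(larger - smaller)  # signed area from the two triangles
print(F(b) - F(a))       # the same value from the antiderivative
```

Both routes give the same signed area because the part of the curve below the x-axis is subtracted rather than added.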

A.6.1 Analytic integration tricks

Computing an integral analytically means finding an exact form for the value of the integral. There are entire books devoted to analytic integration, but for the contents of this book you'll get pretty far with just a few tricks.

1. Multiplication by constants. The integral of a function times a constant C is the product of the constant and the integral of the function:

\[ \int_a^b C f(x)\,dx = C \int_a^b f(x)\,dx \]

2. Sum rule. The integral of a sum is the sum of the integrals of the parts:

\[ \int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \]

3. Expressing one integral as the difference between two other integrals: For $c < a, b$,

\[ \int_a^b f(x)\,dx = \int_c^b f(x)\,dx - \int_c^a f(x)\,dx \]

This is an extremely important technique when asking whether the outcome of a continuous random variable falls within a range [a, b], because it allows you to answer this question in terms of cumulative distribution functions (Section 2.6); in these cases you'll choose $c = -\infty$.

4. Polynomials. For any $n \neq -1$:

\[ \int_a^b x^n\,dx = \frac{1}{n+1}\left(b^{n+1} - a^{n+1}\right) \]

And the special case for $n = -1$ is:

\[ \int_a^b x^{-1}\,dx = \log b - \log a \]

Note that this generalization holds for n = 0, so that integration of a constant is easy:

\[ \int_a^b C\,dx = C(b - a) \]

5. Normalizing constants. If the function inside an integral looks the same as the probability density function for a known probability distribution, then its value is related to the normalizing constant of the probability distribution. [Examples: normal distribution; beta distribution; others?] For example, consider the integral

\[ \int_{-\infty}^{\infty} \exp\left[-\frac{x^2}{18}\right] dx \]

This may look hopelessly complicated, but by comparison with the equation for the normal density in Section 2.10 you will see that it looks just like the probability density function of a normally distributed random variable with mean $\mu = 0$ and variance $\sigma^2 = 9$, except that it doesn't have the normalizing constant $\frac{1}{\sqrt{2\pi\sigma^2}}$. In order to determine the value of this integral, we can start by noting that any probability density function integrates to 1:

\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx = 1 \]

Substituting in $\mu = 0$, $\sigma^2 = 9$ we get

\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{18\pi}} \exp\left[-\frac{x^2}{18}\right] dx = 1 \]

By the rule of multiplication by constants we get

\[ \frac{1}{\sqrt{18\pi}} \int_{-\infty}^{\infty} \exp\left[-\frac{x^2}{18}\right] dx = 1 \]

or equivalently

\[ \int_{-\infty}^{\infty} \exp\left[-\frac{x^2}{18}\right] dx = \sqrt{18\pi} \]

giving us the solution to the original problem.

A.6.2 Numeric integration

The alternative to analytic integration is numeric integration, which means approximating the value of an integral by explicit numeric computation. There are many ways to do this; one common way is by breaking up the range of integration into many small pieces, approximating the size of each piece, and summing the approximate sizes. A graphical example of how this might be done is shown in Figure A.2, where each piece of the area under the curve is approximated as a rectangle whose height is the average of the distances from the x-axis to the curve at the left and right edges of the rectangle. There are many techniques for numeric integration, and we shall have occasional use for some of them in this book.

[Figure A.2: Numeric integration. Plot of p(x) against x over the range a to b.]

A.7 Precedence ($\prec$)

The $\prec$ operator is used occasionally in this book to denote linear precedence. In the syntax of English, for example, the information that "a verb phrase (VP) can consist of a verb (V) followed by a noun phrase (NP) object" is most often written as:

\[ \text{VP} \rightarrow \text{V NP} \]

This statement combines two pieces of information: (1) a VP can be composed of a V and an NP; and (2) in the VP, the V should precede the NP. In a syntactic tradition stemming from Generalized Phrase Structure Grammar (Gazdar et al., 1985), these pieces of information can be separated:

(1) $\text{VP} \rightarrow \text{V}, \text{NP}$
(2) $\text{V} \prec \text{NP}$

where V, NP means "the unordered set of categories V and NP", and $\text{V} \prec \text{NP}$ reads as "V precedes NP".

A.8 Combinatorics ($\binom{n}{r}$)

The notation $\binom{n}{r}$ is read as "n choose r" and is defined as the number of possible ways of selecting r elements from a larger collection of n elements, allowing each element to be chosen a maximum of once and ignoring order of selection. The following equality holds generally:

\[ \binom{n}{r} = \frac{n!}{r!\,(n - r)!} \tag{A.1} \]

The solution to the closely related problem of creating m classes from n elements by selecting $r_i$ for the i-th class and discarding the leftover elements is written as $\binom{n}{r_1 \dots r_m}$. When the class sizes sum to n, so that nothing is left over, its value is

\[ \binom{n}{r_1 \dots r_m} = \frac{n!}{r_1! \dots r_m!} \tag{A.2} \]

With leftovers, the denominator gains a further factor of $(n - r_1 - \dots - r_m)!$, which for m = 1 recovers Equation A.1. Terms of this form appear in this book in the binomial and multinomial probability mass functions, and as normalizing constants for the beta and Dirichlet distributions.

A.9 Basic matrix algebra

There are a number of situations in probabilistic modeling, many of which are covered in this book, where the computations needing to be performed can be simplified, both conceptually and notationally, by casting them in terms of matrix operations. A matrix X of dimensions $m \times n$ is a set of mn entries arranged rectangularly into m rows and n columns, with its entries indexed as $x_{ij}$:

\[ X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} \]

For example, the matrix $A = \begin{bmatrix} 3 & 4 & a_{13} \\ 0 & a_{22} & a_{23} \end{bmatrix}$ has values $a_{11} = 3$, $a_{12} = 4$, and $a_{21} = 0$ [the numeric values of $a_{13}$, $a_{22}$, and $a_{23}$ are not recoverable from the source]. For a matrix X, the entry $x_{ij}$ is often called the i,j-th entry of X. If a matrix has the same number of rows and columns, it is often called a square matrix. Square matrices are often divided into the diagonal entries $\{x_{ii}\}$ and the off-diagonal entries $\{x_{ij}\}$ where $i \neq j$. A matrix of dimension $m \times 1$, that is, a single-column matrix, is often called a vector.

Symmetric matrices: a square matrix A is symmetric if $A^T = A$. [The source gives a 3 × 3 symmetric matrix as an example here; its entries are not recoverable.] You will generally encounter symmetric matrices in this book as variance-covariance matrices (e.g., of the multivariate normal distribution, Section 3.5). Note that a symmetric $n \times n$ matrix has $\frac{n(n+1)}{2}$ free entries: one can choose the entries on and above the diagonal, but the entries below the diagonal are fully determined by the entries above it.

Diagonal and Identity matrices: For a square matrix X, the entries $x_{ii}$, that is, those whose column and row numbers are the same, are called the diagonal entries. A square matrix whose non-diagonal entries are all zero is called a diagonal matrix. A diagonal matrix of size $n \times n$ whose diagonal entries are all 1 is called the size-n identity matrix. Hence A below is a diagonal matrix, and B below is the size-3 identity matrix [the second and third diagonal entries of A are not recoverable from the source]:

\[ A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

The $n \times n$ identity matrix is sometimes notated as $I_n$; when the dimension is clear from context, sometimes the simpler notation I is used.

Transposition: For any matrix X of dimension $m \times n$, the transpose of X, or $X^T$, is an $n \times m$-dimensional matrix such that the i,j-th entry of $X^T$ is the j,i-th entry of X. For the $2 \times 3$ matrix A above, for example, we have

\[ A^T = \begin{bmatrix} 3 & 0 \\ 4 & a_{22} \\ a_{13} & a_{23} \end{bmatrix} \tag{A.3} \]
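The definitions of transposition, symmetry and the identity matrix can be sketched in a few lines of plain Python, with matrices stored as lists of rows. The matrices `M` and `S` below are hypothetical examples chosen for illustration (they are not the matrices of the text), and the function names are assumptions of this sketch.

```python
def transpose(X):
    """Return X^T: entry (i, j) of the result is entry (j, i) of X."""
    return [list(column) for column in zip(*X)]


def is_symmetric(X):
    """A square matrix is symmetric when it equals its own transpose."""
    return transpose(X) == X


def identity(n):
    """The size-n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical 2 x 3 matrix chosen for illustration.
M = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(M))                   # a 3 x 2 matrix
print(transpose(transpose(M)) == M)   # transposing twice gives M back

# Hypothetical symmetric matrix: entries below the diagonal mirror those above.
S = [[2, 7, 1],
     [7, 3, 5],
     [1, 5, 4]]
n = len(S)
print(is_symmetric(S), is_symmetric(identity(n)))
print(n * (n + 1) // 2)  # number of free entries on and above the diagonal
```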

Addition: Matrices of like dimension can be added. If X and Y are both $m \times n$ matrices, then X + Y is the $m \times n$ matrix whose i,j-th entry is $x_{ij} + y_{ij}$. [The source gives a worked $3 \times 2$ example here, Equation (A.4); its entries are not recoverable.]

Multiplication: If X is an $l \times m$ matrix and Y is an $m \times n$ matrix, then X and Y can be multiplied together; the resulting matrix XY is an $l \times n$ matrix. If Z = XY, the i,j-th entry of Z is:

\[ z_{ij} = \sum_{k=1}^{m} x_{ik} y_{kj} \]

For example, if $A = \begin{bmatrix} 1 & 2 \\ -1 & 0 \\ 3 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 4 & -1 & 6 \\ 0 & -5 & 2 & -2 \end{bmatrix}$, we have

\[ AB = \begin{bmatrix} 1 \times 3 + 2 \times 0 & 1 \times 4 + 2 \times (-5) & 1 \times (-1) + 2 \times 2 & 1 \times 6 + 2 \times (-2) \\ (-1) \times 3 + 0 \times 0 & (-1) \times 4 + 0 \times (-5) & (-1) \times (-1) + 0 \times 2 & (-1) \times 6 + 0 \times (-2) \\ 3 \times 3 + 1 \times 0 & 3 \times 4 + 1 \times (-5) & 3 \times (-1) + 1 \times 2 & 3 \times 6 + 1 \times (-2) \end{bmatrix} = \begin{bmatrix} 3 & -6 & 3 & 2 \\ -3 & -4 & 1 & -6 \\ 9 & 7 & -1 & 16 \end{bmatrix} \]

Unlike multiplication of scalars, matrix multiplication is not commutative, that is, it is not generally the case that XY = YX. In fact, being able to form the matrix product XY does not even guarantee that we can do the multiplication in the opposite order and form the matrix product YX; the dimensions may not be right. (Such is the case for matrices A and B in our example.)

Determinants. For a square matrix X, the determinant $|X|$ is a measure of the matrix's "size". In this book, determinants appear in coverage of the multivariate normal distribution (Section 3.5); the normalizing constant of the multivariate normal density includes the determinant of the covariance matrix. (The univariate normal density, introduced in Section 2.10, is a special case; there, it is simply the variance of the distribution that appears in the normalizing constant.) For small matrices, there are simple techniques for calculating determinants: as an example, the determinant of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $|A| = ad - bc$. For larger matrices, computing determinants requires more general and complex techniques, which can be found in books on linear algebra such as Healy (2000).

Matrix Inversion. The inverse or reciprocal of an $n \times n$ square matrix X, denoted $X^{-1}$, is the $n \times n$ matrix such that $XX^{-1} = I_n$. As with scalars, the inverse of the inverse of a matrix X is simply X. However, not all matrices have inverses (just like the scalar 0 has no inverse). [The source gives a pair of $2 \times 2$ matrices, $A$ and $A^{-1}$, that are inverses of each other; their entries are not recoverable, beyond the fact that the four entries of $A^{-1}$ are fractions with denominator 3.]

A.9.1 Algebraic properties of matrix operations

Associativity, Commutativity, and Distributivity. Consider matrices A, B, and C. Matrix multiplication is associative ($A(BC) = (AB)C$) and distributive over addition ($A(B + C) = AB + AC$), but not commutative: even if the multiplication is possible in both orderings (for instance, if A and B are both square matrices with the same dimensions), in general $AB \neq BA$.

Transposition, inversion and determinants of matrix products. The transpose of a matrix product is the product of each matrix's transpose, in reverse order: $(AB)^T = B^T A^T$. Likewise, the inverse of a matrix product is the product of each matrix's inverse, in reverse order: $(AB)^{-1} = B^{-1} A^{-1}$. The determinant of a matrix product is the product of the determinants:

\[ |AB| = |A|\,|B| \]

Because of this, the determinant of the inverse of a matrix is the reciprocal of the matrix's determinant:

\[ |A^{-1}| = \frac{1}{|A|} \]

A.10 Miscellaneous notation

$\propto$: You'll often see $f(x) \propto g(x)$ for some functions f and g of x. This is to be read as "f(x) is proportional to g(x)", or "f(x) is equal to g(x) to within some constant". Typically it's used when f(x) is intended to be a probability, and g(x) is a function that obeys the first two axioms of probability theory, but is improper. This situation obtains quite often when, for example, conducting Bayesian inference.
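The matrix identities of Sections A.9 and A.9.1 can be checked numerically. The sketch below is plain Python with exact fractions, not a definitive implementation: `A` and `B` are the 3 × 2 and 2 × 4 matrices from the multiplication example, while `P` and `Q` are hypothetical 2 × 2 matrices introduced only to exercise the product rules, and the helper names `matmul`, `det2` and `inv2` are assumptions of this sketch.

```python
from fractions import Fraction


def matmul(X, Y):
    """Matrix product: z_ij is the sum over k of x_ik * y_kj."""
    assert len(X[0]) == len(Y), "columns of X must match rows of Y"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def det2(M):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c


def inv2(M):
    """Inverse of a 2 x 2 matrix with non-zero determinant."""
    (a, b), (c, d) = M
    det = Fraction(det2(M))
    return [[d / det, -b / det], [-c / det, a / det]]


# The 3 x 2 and 2 x 4 matrices from the multiplication example.
A = [[1, 2], [-1, 0], [3, 1]]
B = [[3, 4, -1, 6], [0, -5, 2, -2]]
print(matmul(A, B))  # a 3 x 4 matrix; matmul(B, A) would fail the shape check

# Hypothetical 2 x 2 matrices for the product rules (not from the text).
P = [[2, 1], [1, 2]]
Q = [[0, 1], [3, 4]]
print(matmul(P, Q) != matmul(Q, P))                    # not commutative
print(det2(matmul(P, Q)) == det2(P) * det2(Q))         # |PQ| = |P||Q|
print(matmul(P, inv2(P)))                              # the identity I_2
print(inv2(matmul(P, Q)) == matmul(inv2(Q), inv2(P)))  # (PQ)^-1 = Q^-1 P^-1
```

Exact fractions are used for the inverse so that the equality checks are not disturbed by floating-point rounding.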
