Notes on Mathematics Groups

EPGY Singapore Quantum Mechanics: 2007

Groups

A group, G, is a set of elements together with a binary operation ∗ on G. One of the elements of G has particularly special properties and is known as the identity element, I. For all elements a, b, c of G we have:

(a ∗ b) ∗ c = a ∗ (b ∗ c)   [∗ is associative]

a ∗ I = a = I ∗ a   [the identity maps an element into itself]

Lastly, for any element a of G there is a unique element a⁻¹ of G such that

a ∗ a⁻¹ = I = a⁻¹ ∗ a   [each element has an inverse].

The group is closed, meaning that the operation applied to any two elements returns an element of the group; no operation will return an element not in the group. This is the general definition of a mathematical group. If the binary operation is commutative, i.e. a ∗ b = b ∗ a, then the group is said to be Abelian (or simply commutative). The elements themselves have not been specified, but as long as the above properties hold, the set forms a mathematical group. Also, the number of elements may be finite or infinite; groups of the latter kind are called continuous groups.

[Example 1]: A simple example is the four elements {1, i, −1, −i} under the binary operation of multiplication (∗ = ×). The identity element is clearly I = 1. Verifying the properties above with some examples:

(a ∗ b) ∗ c = a ∗ (b ∗ c):   (1 × i) × i = −1 = 1 × (i × i)

a ∗ I = a:   i × 1 = i.

The inverses are [1⁻¹ = 1, i⁻¹ = −i, (−1)⁻¹ = −1, (−i)⁻¹ = i], which can be easily verified.

[Example 2]: Consider the set of elements consisting of rotations in the two-dimensional plane, R². Limiting to clockwise rotations by 0°, 90°, 180°, and 270°, these are represented as the operators {R₀, R₉₀, R₁₈₀, R₂₇₀}. Note that these operators rotate the plane and change the arrangement of vectors; however, the group consists of the rotation operators and not the vectors themselves. It should be clear that the identity element I is R₀, since not rotating and then rotating by some angle is the same as just rotating by the second angle. The binary operation is multiplication of operators. The first property of groups is one you may need to convince yourself of. Consider the rotations R₉₀, R₂₇₀, R₁₈₀:

(a ∗ b) ∗ c = a ∗ (b ∗ c):   (R₉₀ R₂₇₀) R₁₈₀ = R₉₀ (R₂₇₀ R₁₈₀)

The left side says to rotate by 180° and then by the combined rotation of 90° + 270° = 360°, which clearly equals a rotation by 180°. The right side says to rotate through the combination of 270° + 180° = 450° = 90° + 360° and then by 90°, which also equals a rotation by 180°. The inverses of each operator are:

(R₀)⁻¹ = R₀;  (R₉₀)⁻¹ = R₂₇₀;  (R₁₈₀)⁻¹ = R₁₈₀;  (R₂₇₀)⁻¹ = R₉₀.

The other properties follow directly, so these elements form a group.

[Example 3]: Consider a group which consists of rotations by 90° in three-dimensional space. To simplify the discussion we'll restrict our attention to the rotation of a book. The operators involved will consist of rotations about axes centered on the three planes of the book. We will call these axes A, B, and C; they remain fixed to the book. The elements are A₊, A₋, B₊, B₋, C₊, C₋, and R₀, where + and − indicate rotations clockwise and counterclockwise by 90°. You should verify that this set of elements does indeed form a group. Notice, however, that this group is non-Abelian. This can be checked by comparing the two sequences of rotations A₊B₊ and B₊A₊: take a book, rotate it about one axis and then about a second one, and compare the orientation of the book to when the two rotations are done in the reverse order. You should find that the book does not end up in the same orientation.
This indicates that the two operations are not the same and that this group of rotations is non-Abelian. The more general result is that the group of rotations in three-dimensional space (the technical term for this group is SO(3)) is non-Abelian.
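To make the axioms concrete, here is a minimal sketch in Python (my addition, not part of the original notes) that brute-force checks closure, associativity, identity, and inverses for the group {1, i, −1, −i} of Example 1, and then shows that two 90° rotation matrices in three dimensions fail to commute, echoing Example 3.

```python
import itertools
import numpy as np

# Example 1: the group {1, i, -1, -i} under complex multiplication.
G = [1, 1j, -1, -1j]

# Closure: every product of two elements lands back in G.
assert all(a * b in G for a, b in itertools.product(G, G))
# Associativity: (a*b)*c == a*(b*c) for every triple.
assert all((a * b) * c == a * (b * c) for a, b, c in itertools.product(G, G, G))
# Identity: 1 maps every element to itself.
assert all(a * 1 == a == 1 * a for a in G)
# Inverses: each element has an inverse in G.
assert all(any(a * b == 1 for b in G) for a in G)
# This group is Abelian: complex multiplication commutes.
assert all(a * b == b * a for a, b in itertools.product(G, G))

# Example 3: 90-degree rotations about the x- and y-axes do NOT commute.
Rx = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # 90 deg about x
Ry = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])   # 90 deg about y
print(np.array_equal(Rx @ Ry, Ry @ Rx))  # False: SO(3) is non-Abelian
```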

Isomorphism

Let's return to examples 1 and 2 and note that these two groups are identical in every way if we make the identification 1 = R₀, i = R₉₀, −1 = R₁₈₀, and −i = R₂₇₀. The multiplication of elements is identical, thus these two groups are the same. The mathematical term for this equality is isomorphism. If you can exactly map one group onto another, with no missing elements or operations, then the two groups are isomorphic, even though the elements may seem very different in nature.

Fields

When dealing with mathematical objects, the basic entities can come in different types. We will assume you are familiar with real numbers and complex numbers. These are two examples of fields; there are others. The mathematical structure of a field is different from that of a group and includes the following operations and designated elements: F, +, ×, −1, 0, 1. The elements a of F undergo the operations of addition and multiplication, and each has an additive inverse, −a, as well as a multiplicative inverse a⁻¹ = 1/a. In every field there are two special elements: 0 (the additive identity, or null) and 1 (the multiplicative identity). Again, the two primary fields that we will be concerned with are the real numbers, R, and the complex numbers, C. There are other fields that will not enter our realm but that others have explored. These include the quaternions, Q (explored by Hamilton), and the octonions, O, among others.

Vectors and Vector Spaces

Vectors are mathematical objects that have particular properties. First, we will stick to the physical interpretation of vectors for simplicity. Later we will define vectors and vector spaces from a more formal mathematical point of view. Given a coordinate system in three-dimensional space, the displacement of a location in space from the origin can be represented by a vector. The vector can be viewed as an arrow in space which has a direction and a length (magnitude). To quantitatively represent the vector, 3 numbers representing the displacement in each perpendicular direction can be used. Thus, for an object at position x = 1 m, y = 3 m, z = 4 m, the vector can be expressed as a triplet of numbers, r = (1 m, 3 m, 4 m). As you should be familiar with already, a vector is indicated by placing an arrow over the variable representing it (often boldface letters are used instead, e.g. r). The displacement between two points in three-dimensional space is simply the difference in each of the three directions. One way to represent a vector is by listing the three components in a row or column, as such: (A_x, A_y, A_z). (Whether it is a row or a column won't concern us too much yet.) Another way to represent a vector is by using unit vectors. A unit vector is simply a vector of unit length pointing along one of the defined axes. In three-dimensional space there is a convention as to how they are labeled: the unit vector pointing along the x direction is labeled î, along y as ĵ, and along z as k̂. The hat is used to distinguish them from regular vectors. With these unit vectors it is simple to represent any vector: multiplying the x component of the vector by î simply scales that unit vector to the appropriate length, and likewise for y and z. Thus we can represent a vector as follows,

A = A_x î + A_y ĵ + A_z k̂

This form will make understanding dot and cross products easier. It is important to realize that the components of a vector depend upon the coordinate system in use. If you change coordinate systems, the three components of the vector will change. However, the magnitude and the direction relative to other vectors remain constant.
Hence, one vector can take on many different values for its entries in different coordinate systems. To refresh your memory: to add two vectors together you simply add each of the components separately. Likewise, to subtract two vectors you subtract each separate component. In this appendix and the next, we will show two different ways to multiply vectors together. (Note: there is no defined way of dividing vectors.) The dot product is an operation that maps two vectors into a scalar (a number). The cross product is an operation that maps two vectors into another vector.

Vector Spaces

As was mentioned above, representing a vector as an arrow has limited quantitative benefit. When we come to discuss more sophisticated spaces, it will not be easy to picture vectors as arrows (e.g. consider a vector in a 4-dimensional space). The vector viewed as an arrow is the more general, basis-independent, representation. In order to get quantitative information, a basis must be introduced. We will now go through a more formal definition of vectors and vector spaces.
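As a quick illustration of component arithmetic and coordinate dependence, the following sketch (mine, not from the notes) adds two vectors component-wise and then rotates the coordinate axes, showing that the components change while the magnitude does not.

```python
import numpy as np

A = np.array([1.0, 3.0, 4.0])   # r = (1 m, 3 m, 4 m)
B = np.array([2.0, -1.0, 0.5])

print(A + B)   # addition: add like components
print(A - B)   # subtraction: subtract like components

# Rotate the coordinate axes by 90 degrees about z: the components of A
# in the new system differ, but the magnitude |A| is unchanged.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
A_new = Rz @ A
print(A_new)                                      # different components
print(np.linalg.norm(A), np.linalg.norm(A_new))   # same magnitude, 5.099...
```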

Some definitions and notation:

Fields: real numbers, R; complex numbers, C; (quaternions, Q — we will not consider these).

Scalars, k, are 1×1 arrays (simply numbers, which can be real or complex).

Vectors — a ket vector |v⟩ and a bra vector ⟨v| — are N-component lists of numbers (real or complex); we will explore the difference between kets and bras later (kets as N×1 arrays, bras as 1×N arrays).

Matrices, M, are arrays of real or complex numbers, labeled as N×M (N rows and M columns).

Tensors: the previous three are all types of tensors, which can be considered as many-dimensional arrays, N×M×K×L×... Tensors are important in general relativity. They will not be considered in the quantum course in anything but a superficial manner.

Now we introduce a linear vector space, labeled V^n(F), as a set of vectors which may be added, subtracted, multiplied (to be defined), or multiplied by scalars, in such a way that:

a) these operations yield another element of V^n(F) (formally, the vectors form a group under addition);

b) the vectors are n-dimensional and their components are drawn from the field F;

c) the following properties are satisfied (for arbitrary vectors v, w, u from V^n(F)):

Addition:

Commutativity: v + w = w + v

Associativity: v + (w + u) = (v + w) + u

Identity: there exists a vector 0 such that v + 0 = v.

Inverse: for every v there exists a unique inverse, −v, such that v + (−v) = 0.

Multiplication by scalars (given scalars a and b):

Distributivity: a(v + w) = av + aw and (a + b)v = av + bv.

Associativity: a(bv) = (ab)v.

Also: 0v = 0, a0 = 0, and (−1)v = −v.

Multiplication by vectors:

There are three types of products of vectors, only one of which will be of concern to us. They are:

Inner product, v · w. Returns a scalar. (You may be familiar with this by its three-dimensional name, the dot product; inner product is the general term applicable to any type of space.) It is commutative, v · w = w · v, and a(v · w) = (av) · w = v · (aw).

Cross product, v × w. Returns a vector.

Outer product, v ⊗ w. Returns a tensor.
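The three products are easy to see numerically. This small sketch (added here for illustration, not in the original) uses numpy to form the inner, cross, and outer products of two real 3-vectors and shows the type of object each returns.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

inner = np.dot(v, w)     # scalar: 1*4 + 2*5 + 3*6 = 32
cross = np.cross(v, w)   # vector: (-3, 6, -3)
outer = np.outer(v, w)   # 3x3 array (a rank-2 tensor)

print(inner)         # 32.0
print(cross)         # [-3.  6. -3.]
print(outer.shape)   # (3, 3)
```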

Details of vector operations

Adding and subtracting vectors: the graphical method for V³(R) is to place the tail of one vector at the head of the other; the sum runs from the tail of the first to the head of the last. The better, more general method is to simply add the like components. Consider two vectors from a 4-dimensional real (or complex) linear vector space,

v = a₁x̂₁ + a₂x̂₂ + a₃x̂₃ + a₄x̂₄,   w = b₁x̂₁ + b₂x̂₂ + b₃x̂₃ + b₄x̂₄

v + w = (a₁ + b₁)x̂₁ + (a₂ + b₂)x̂₂ + (a₃ + b₃)x̂₃ + (a₄ + b₄)x̂₄

To subtract two vectors, simply change the vector being subtracted to its additive inverse and add the vectors, v − w = v + (−w):

v − w = (a₁ − b₁)x̂₁ + (a₂ − b₂)x̂₂ + (a₃ − b₃)x̂₃ + (a₄ − b₄)x̂₄

Multiplication of vectors by scalars: simply multiply each component by the scalar,

k v = (ka₁)x̂₁ + (ka₂)x̂₂ + (ka₃)x̂₃ + (ka₄)x̂₄

Multiplication of vectors

Inner product over the reals: using the vectors v and w defined above, where the coefficients aᵢ and bᵢ are real, the inner product is defined as

v · w = (a₁b₁)(x̂₁ · x̂₁) + (a₂b₂)(x̂₂ · x̂₂) + ... = a₁b₁ + a₂b₂ + ...

The inner product is fixed by the inner products of the unit vectors,

x̂ᵢ · x̂ᵢ = 1,   x̂ᵢ · x̂ⱼ = 0 if i ≠ j

One of the important properties of the inner product is its relation to the measurement of distance in a space. In fact, the inner product defines how distances are measured. In ordinary three-dimensional space the inner product (the dot product in this case) of a vector with itself is the square of the length of the vector:

v · v = a₁² + a₂² + a₃² = |v|²

The Pythagorean theorem is then seen as just an example of the inner product (in two dimensions). This property of the inner product holds no matter the dimension or field of the vector space: the inner product of a vector with itself is always a real, positive number (a scalar).

Inner product over the complex field: of prime importance for quantum mechanics is the inner product within complex vector spaces. To make the process easier, P. A. M. Dirac introduced a new notation. But before we introduce that, let's see how the inner product is defined for vectors with complex coefficients (the unit vectors always being real). Once again, consider our two vectors v and w with, now complex, coefficients aᵢ and bᵢ. The inner product is defined as

v · w = a₁*b₁ + a₂*b₂ + a₃*b₃ + a₄*b₄   (1)

We see that it is not symmetric; rather, v · w = (w · v)*. The reason for this structure is that the inner product of a complex vector with itself must be a real positive number (the length squared). Since a complex number times itself is in general complex, it cannot define the length; the modulus squared of a complex number, z*z ≡ |z|², is a real positive number.
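To see why the complex conjugation in (1) is needed, here is a short numerical check (my addition, for illustration): without the conjugate the "length squared" of a complex vector comes out complex, while the conjugated inner product is real and positive, and swapping the arguments conjugates the result.

```python
import numpy as np

v = np.array([1 + 2j, 3 - 1j, 0.5j])
w = np.array([2 - 1j, 1j, 4.0])

# Naive product without conjugation: generally complex, cannot be a length.
print(np.sum(v * v))            # complex result

# Inner product with conjugation of the first vector, as in eq. (1).
print(np.vdot(v, v))            # real and positive: |v|^2 = 15.25
print(np.vdot(v, w))            # some complex number
print(np.conj(np.vdot(w, v)))   # the same number: v.w = (w.v)*
```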

Now it is time to introduce the notation of Dirac: the bra-ket vector formalism. Since the two vectors in the inner product are treated differently, we denote them in different ways. Ordinary vectors (the ones not complex conjugated in the inner product) are called ket vectors and are denoted |v⟩. For every ket vector there is a unique counterpart called a bra vector, denoted ⟨v|. The relation between a ket vector and its bra vector is simple:

Ket vector: |v⟩ = a₁|1⟩ + a₂|2⟩ + a₃|3⟩

Bra vector: ⟨v| = a₁*⟨1| + a₂*⟨2| + a₃*⟨3|

In the expressions above the |1⟩, |2⟩, |3⟩ vectors are unit vectors (real and of length 1); only the coefficients are complex. [Technically, the ket vectors belong to a vector space V^n(C) and the bra vectors belong to a dual space, Ṽ^n(C). The correspondence between the two spaces is one-to-one and onto.] The convenience of this notation is that the inner product of two vectors is now written in a particular order,

⟨v|w⟩ = a₁*b₁⟨1|1⟩ + a₂*b₂⟨2|2⟩ + a₃*b₃⟨3|3⟩ = a₁*b₁ + a₂*b₂ + a₃*b₃

And again we see that ⟨v|w⟩ = (⟨w|v⟩)*, as well as ⟨v|v⟩ = a₁*a₁ + a₂*a₂ + a₃*a₃, the square of the length of the vector.

Cross product

We will not be concerned with cross products, but they can be easily understood from the cross products of the unit vectors in three dimensions. (Again, we will not be concerned with cross products in dimensions other than three, where the cross product is not always defined.)

x̂ᵢ × x̂ᵢ = 0
x̂₁ × x̂₂ = x̂₃
x̂₁ × x̂₃ = −x̂₂
x̂₂ × x̂₃ = x̂₁
x̂ᵢ × x̂ⱼ = −x̂ⱼ × x̂ᵢ

Probability

In this section we will give a brief review of probability theory. This is not meant to supplant a proper treatment of the subject but to provide a background for what will be discussed later. The ideas will be developed primarily via examples that you are, no doubt, familiar with.

Finite Outcome Probability

The first fact about probabilities will hopefully be obvious and will be stated without proof: the probability for any particular event to occur is always less than or equal to 100% (which is equivalent to saying that the probability is less than or equal to 1). It does not make sense to say that something has a 110% chance of occurring; 100% is the highest, saying that something will definitely occur. Also, probabilities are always greater than or equal to zero. Again, does it make sense for the probability of an event to be −10%? Of course not. A probability of zero says that the event will never occur; you can't get any lower than that.

Now let's go on to see how to find probabilities for some simple scenarios. The simplest example using probability is the tossing of a coin. The outcome (heads (H) or tails (T)) is taken as completely random, with either occurrence being equally likely. We say there is a 50% chance of heads arising and a 50% chance of tails arising. Put another way, the probability is 1/2 that the result is H and 1/2 that it is T. The first important point to note is that the sum of the probabilities for all outcomes must add up to 1. Here we have 1/2 + 1/2 = 1, assuming that the only two outcomes are H or T. (Of course, the probabilities for H and T are not really exactly 50/50: due to the printing on each side of a US penny the probabilities differ slightly, since Lincoln's head is more massive than the monument on the back.)

Consider one die: if you toss it, only one of six numbers will come up. Since we assume that each number is equally likely to come up, the probability for a particular number to show is 1/6. The rule for assigning probabilities for a finite number of events occurring with equal probability is simply P(a) = 1/N, where N is the total number of possible outcomes. The total probability for any outcome to occur is 1, since P = 1/N + 1/N + ... + 1/N = N/N = 1.
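A quick simulation (added for illustration, not part of the notes) confirms the 1/N rule empirically: the relative frequency of each face of a fair die approaches 1/6 as the number of tosses grows.

```python
import random
from collections import Counter

random.seed(0)
n_tosses = 60_000
counts = Counter(random.randint(1, 6) for _ in range(n_tosses))

for face in range(1, 7):
    # Each relative frequency should be close to 1/6 = 0.1667.
    print(face, counts[face] / n_tosses)
```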
If the number of ways for a given outcome to occur is greater than one, then we divide the number of possible ways for the outcome to occur by the total number of possible outcomes. For example, consider tossing a pair of dice. The probability for the sum to be 3 is not the same as for it to be 7. By tabulating the possible outcomes we can find the probability for any sum to appear (see Table 1). From this table we see that 7 is the sum which is most likely to occur. The total number of possible outcomes is 36 (found by summing the number of outcomes). Thus the rule to find the probability for an event to occur, when there is a finite number of equally likely outcomes, is

P_i = (number of states with outcome i) / (total number of possible outcomes)   (2)

Sum   Possible outcomes (die 1 - die 2)     # of outcomes   Probability
1     none                                  0               0
2     1-1                                   1               1/36
3     1-2, 2-1                              2               1/18
4     1-3, 3-1, 2-2                         3               1/12
5     1-4, 4-1, 2-3, 3-2                    4               1/9
6     1-5, 5-1, 2-4, 4-2, 3-3               5               5/36
7     1-6, 6-1, 2-5, 5-2, 3-4, 4-3          6               1/6
8     2-6, 6-2, 3-5, 5-3, 4-4               5               5/36
9     3-6, 6-3, 4-5, 5-4                    4               1/9
10    4-6, 6-4, 5-5                         3               1/12
11    5-6, 6-5                              2               1/18
12    6-6                                   1               1/36
13    none                                  0               0

Table 1: Probabilities for rolling two dice.

And we also have the rule that the sum of the probabilities for all outcomes must equal one (something does occur!):

Σᵢ₌₁ᴺ Pᵢ = 1   (3)

Note that this is for the case when each elementary outcome is equally likely (every number on each die is equally likely to arise). A more formal way to include the possibility that outcomes have varying probability is to begin with the probabilities themselves. The probability for one die to show one particular number is 1/6. The probability for two dice to show one particular pair is P₁(i)P₂(j) = (1/6)(1/6) = 1/36. (This can easily be verified by examining the table above.) The probability for one particular outcome, say the probability to roll a 5, is then given by summing the probabilities of the pairs that give 5:

P = 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 1/9, from the pairs [die 1, die 2] = [2,3], [3,2], [4,1], [1,4].

Two rules:

When a statement says "or", the probabilities are to be added. What is the probability to roll a 4 or a 5 with two dice? The answer is P(4) + P(5) = 1/12 + 1/9 = 7/36.

When a statement says "and", the probabilities are to be multiplied. What is the probability to roll a 4 with a single die and at the same time flip a coin and get heads? The answer is the product of the two individual probabilities (you can verify this by creating a table like we did for the two dice): P(4) × P(H) = (1/6) × (1/2) = 1/12.

The addition ("or") rule applies when the two events are mutually exclusive; the multiplication ("and") rule applies when the two events are statistically independent — the outcome of one does not depend upon the outcome of the other. We will deal shortly with scenarios where the events are neither mutually exclusive nor statistically independent.

EXAMPLE: One more example to demonstrate this. Say a shifty gambler loads a pair of dice by shaving parts of them to skew their weight distribution. In so doing, the probability of a 1 or a 6 on each die is increased to 1.2/6. What is the probability that the pair will come up 7 with these dice? First we must find the probability for the other faces to occur (assumed to be equal). Since the total probability for a roll is 1 we have,

1.2/6 + p/6 + p/6 + p/6 + p/6 + 1.2/6 = 1
[1]     [2]   [3]   [4]   [5]   [6]

where p/6 is the probability for a 2 through 5 to come up. Simplifying, we have 2.4 + 4p = 6; solving, p = 0.9. So for the probability of rolling a 7 we have,

1.44/36 + 1.44/36 + 0.81/36 + 0.81/36 + 0.81/36 + 0.81/36 = 6.12/36
[1,6]     [6,1]     [2,5]     [5,2]     [3,4]     [4,3]

which is now 17.0%, as opposed to 16.7% with fair dice.
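This sketch (my own check, not from the notes) enumerates all 36 pairs to reproduce Table 1 for fair dice and then recomputes the loaded-dice probability of rolling a 7, confirming p = 0.9 and P(7) = 6.12/36 ≈ 17.0%.

```python
from fractions import Fraction
from itertools import product

# Fair dice: reproduce Table 1 by enumerating all 36 equally likely pairs.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1
for s in sorted(counts):
    print(s, counts[s], Fraction(counts[s], 36))   # e.g. 7 6 1/6

# Loaded dice: faces 1 and 6 have probability 1.2/6; faces 2-5 share the rest.
p_loaded = 1.2 / 6
p_other = (1 - 2 * p_loaded) / 4          # = 0.9/6 = 0.15
face_prob = {1: p_loaded, 6: p_loaded, 2: p_other, 3: p_other,
             4: p_other, 5: p_other}
P7 = sum(face_prob[d1] * face_prob[d2]
         for d1, d2 in product(range(1, 7), repeat=2) if d1 + d2 == 7)
print(P7, P7 * 36)   # 0.17, 6.12 -- versus 1/6 = 0.1667 for fair dice
```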

Axioms of Probability and some theorems and definitions

A probability system (or probability space) consists of three items, Ω = (S, E, P):

S: a basic space S of elementary outcomes (elements) ξ.

E: a class E of events (these are subsets of S).

P: a probability measure P(·) defined for each event A in the class E and having the following properties. For events A and B in E we have:

(P1) P(S) = 1. (The probability that some outcome of the whole set occurs is 1.)

(P2) P(A) ≥ 0. (The probability of any one event is nonnegative.)

(P3) If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

(P4) If A₁, A₂, ... is a sequence of pairwise incompatible events in E, i.e., Aᵢ ∩ Aⱼ = ∅ for i ≠ j, then

P(∪ᵢ₌₁ⁿ Aᵢ) = Σᵢ₌₁ⁿ P(Aᵢ)

(The probability of the union of a set of mutually exclusive events is the sum of the probabilities of the individual events.)

Some other theorems:

(i) P(∅) = 0.

(ii) For every event A, P(A) + P(¬A) = 1.

(iii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B), if A and B are not mutually exclusive.

A definition: if E is an event with positive probability, the conditional probability of the event A, given E, written P(A|E), is defined by the relation

P(A|E) = P(A ∩ E) / P(E)

Note that if events A and B are statistically independent we have P(B|A) = P(B), and then P(A ∩ B) = P(A)P(B).

EXAMPLE: Let's go back to the case of a fair pair of dice and see whether the two dice are statistically independent. Let's find the probability that die one comes up as a 1 given that die two comes up as a 5; that is, we want P(X₁ = 1 | X₂ = 5), where Xᵢ is the random variable representing the value of die i. Employing our formula we have,

P(X₁ = 1 | X₂ = 5) = P(X₁ = 1 ∩ X₂ = 5) / P(X₂ = 5) = (1/36) / (1/6) = 1/6 = P(X₁ = 1),

and hence P(X₁ = 1 ∩ X₂ = 5) = P(X₁ = 1)P(X₂ = 5).   (4)

The dice are statistically independent, as we expect.

Joint probability

To make the above rules a little more formal, we define a random variable X which can take on values within a set σ_X; this formalizes the finite-outcome examples described above. Define another random variable Y which takes on values in a set σ_Y. We define the joint probability distribution P(x, y) = P(X = x & Y = y) = P(X = x ∩ Y = y) as the probability for X to equal x and Y to equal y. The marginal probability distribution for X, P₁(x), is defined as the probability for X = x regardless of what Y is,

P₁(x) = Σ_y P(x, y)

Likewise, P₂(y) = Σ_x P(x, y).
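The independence check in (4) and the marginal sums are easy to verify by enumeration. The following sketch (added here, not part of the notes) builds the joint distribution of two fair dice and confirms both P(X₁ = 1 | X₂ = 5) = P(X₁ = 1) and the marginals.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of two fair dice: P(x, y) = 1/36 for every pair.
P = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Conditional probability P(X1 = 1 | X2 = 5) = P(1,5) / P2(5).
P2_5 = sum(P[(x, 5)] for x in range(1, 7))         # marginal: 1/6
print(P[(1, 5)] / P2_5)                            # 1/6 = P(X1 = 1)

# Marginal distribution P1(x) = sum over y of P(x, y).
for x in range(1, 7):
    print(x, sum(P[(x, y)] for y in range(1, 7)))  # 1/6 each
```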

Probability Density Distribution

The above discussion was for situations which have a finite number of possible outcomes. What if we deal with a situation in which the total number of possible outcomes is infinite? For example, say you throw a point particle down on a line: what is the probability that the point lands at position x? If you follow the same procedure as above, you will find that the probability for it to land at any one point is zero. How can that be? Summing up all of the possibilities would give zero, not one, and yet we know it ends up at some particular point.

Consider a 12-sided die. The probability for any one side to come up is now 1/12, as opposed to 1/6 for a six-sided die. We know that the probability for any one side to come up is 1/N, where N is the number of sides. What about a 1000-sided die? A 10⁸-sided die? The probability for any one side to come up decreases. A sphere can be considered as an infinite-sided die. Then the probability for the point x on the sphere to be at the top (to "come up") is

P(x) = lim_{N→∞} 1/N = 0

To make sense of this we introduce a new quantity, the probability density distribution. Since the probability for any one point on the sphere to come up is zero, we ask instead what the probability is for a range of results near x to come up. To simplify the discussion, let's return to the case of throwing a point onto a line. The probability for the point to land in a region near x, from x to x + Δx, is now finite; it is simply the length of the region, Δx, divided by the total possible length L,

P(x, x + Δx) = Δx / L.

We define the probability density distribution to be the quantity which, when multiplied by the length of the region, gives the probability for those events to occur. We see that if we take our region to zero (i.e. find the probability to land at a single point) we get zero, just as before. In the case of the spherical die, the probability density is simply spread uniformly over the area of the sphere. In both of these cases the probability distribution is uniform throughout the domain (each point is equally likely to come up): p(x) = constant. (We will label probabilities with a capital P and probability densities with a lower-case p.) In general, however, the probability density distribution can vary from point to point. The rule governing the proper definition of the probability density distribution is that summing over all possible outcomes should give 1:

P = Σ_{x over all domain} p(x) Δx = 1

For the case of a sphere we simply have,

P = Σ_{over sphere} (1/4πR²) ΔA = (1/4πR²) Σ_{over sphere} ΔA = 4πR²/4πR² = 1

[Aside: of course, to do this properly we need to use calculus. Then the above rule becomes

∫_{over entire region} p(x) dx = 1

and the rule to find the probability within some region is

P(x, x + L) = ∫_x^{x+L} p(x′) dx′

Note that, in general, probability density distributions do not need to be less than 1, though they are nonnegative.]

Expectation value

Another useful quantity to define is the expectation value of a random variable X. This is simply the mean value: the average one would obtain for the random variable when repeating the experiment many times. We indicate two notations often used; the first is usually employed by physicists, the second by mathematicians:

⟨X⟩ = E(X) = Σ_{all x} x P(X = x)

For a pair of fair dice we can find the expectation value of the sum:

2(1/36) + 3(1/18) + 4(1/12) + 5(1/9) + 6(5/36) + 7(1/6) + 8(5/36) + 9(1/9) + 10(1/12) + 11(1/18) + 12(1/36) = 252/36 = 7.

Here 7 also happens to be the most likely outcome, which is what we expect.
For continuous variables the expectation value is defined as,

⟨x⟩ = E(x) = ∫_D x p(x) dx,   over the domain D.
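Both forms of the expectation value are easy to check numerically. This sketch (mine, for illustration) recomputes the two-dice mean of 7 from the table above and approximates the continuous case for a uniform density p(x) = 1/L on [0, L], whose mean is L/2.

```python
from fractions import Fraction
from itertools import product
import numpy as np

# Discrete: E(sum of two fair dice) = sum over outcomes of x * P(x).
E = sum(Fraction(d1 + d2, 36) for d1, d2 in product(range(1, 7), repeat=2))
print(E)   # 7

# Continuous: E(x) for the uniform density p(x) = 1/L on [0, L],
# approximated by a Riemann sum over a fine grid.
L = 4.0
x = np.linspace(0.0, L, 100_001)
p = np.full_like(x, 1.0 / L)
dx = x[1] - x[0]
print(np.sum(x * p) * dx)   # approximately L/2 = 2.0
```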

Variance, Covariance, and Correlation

To complete the discussion we introduce three more quantities. The first is the variance,

var(X) = E((X − E(X))²) = E(X²) − E²(X).

(The square root of the variance is the standard deviation.) The variance is a measure of how far about the mean the distribution is spread. If the variance is zero, then the distribution sits entirely at the mean; in terms of statistics, the data all lie at the mean value. The larger the variance, the more spread out the distribution is about the mean value.

The covariance of a pair of random variables is defined as the expectation of the product of the deviations from each mean,

cov(X, Y) = E[(X − E(X))(Y − E(Y))].

The covariance is a measure of how two random variables are related. If the covariance is zero, the two random variables are uncorrelated. The larger the covariance, the more strongly the two random variables are related. The covariance can also be negative, which indicates that the two random variables are inversely related. An easier way to understand the covariance is to examine the normalized covariance, which limits the range to between −1 and +1. This is the correlation between the two random variables. The correlation coefficient ρ(X, Y) is defined as

ρ(X, Y) = cov(X, Y) / √(var(X) var(Y))

The correlation's value is limited to the range −1 to +1. If one random variable is completely determined by another (the two increasing together), then the correlation is +1. If one random variable is determined to be exactly the opposite of another (such that if one variable is large the other is small), then they are completely anticorrelated and ρ = −1. And lastly, if the two random variables are independent, then the correlation is zero.
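As a closing illustration (my addition, not part of the notes), this sketch samples two related random variables and estimates their variance, covariance, and correlation, showing ρ near +1 for a positively related pair, near −1 for its negation, and near 0 for independent variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

X = rng.normal(0.0, 1.0, n)
Y_pos = 2.0 * X + rng.normal(0.0, 0.1, n)   # strongly, positively related to X
Y_ind = rng.normal(0.0, 1.0, n)             # independent of X

def corr(a, b):
    # rho(a, b) = cov(a, b) / sqrt(var(a) var(b))
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / np.sqrt(a.var() * b.var())

print(X.var())           # approximately 1
print(corr(X, Y_pos))    # close to +1
print(corr(X, -Y_pos))   # close to -1 (anticorrelated)
print(corr(X, Y_ind))    # close to 0
```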