MA 751
Probability basics

Probability and Measure Theory

1. Basics of probability

Two aspects of probability:
Probability = mathematical analysis
Probability = common sense
A set-up of the common sense view:

Define an experiment to be any process with an outcome.

Example 1: Toss of a die.
Example 2: Record information on deaths of cancer patients.
Example 3: Measure the daily high body temperature of a person on chemotherapy.
Example 4: Observe the next letter in a chain of DNA: ATCTTCA → ?

Given a well-defined experiment, we must know the set of all possible outcomes (this must be defined by the experimenter).

The collection of possible outcomes is denoted $S$ = sample space (sometimes also called a probability space and written $\Omega$).
This is the collection of all possible outcomes of the experiment.

Example 5: Die toss: $S = \{1, 2, 3, 4, 5, 6\}$.

Example 6: Cancer outcome records: 4 outcomes: $NL, RL, ND, RD$, where
R = received treatment; N = no treatment; L = lived; D = died.
Example 7: High temperature measurement: $S = \{t : t \text{ a real number}\}$.

Many other outcome (sample) spaces $S$ are possible, for example, profiles of functions of DNA regions.

So: we have practical situations, and for each one we define the sample space $S$ of possible outcomes.

Def. 1: For sets $A, B$: define $A \subset B$ if $A$ is a subcollection of $B$. If $E \subset S$, then $E$ is an event.
Example 8: Die tossing experiment: if $E$ = even roll = $\{2, 4, 6\} \subset \{1, 2, 3, 4, 5, 6\}$, then $E$ is an event.

Why an event? Intuitively, an event means something that has occurred, and above the event $E = \{2, 4, 6\}$ represents the occurrence of an even number.
Again, we can translate between the mathematical definition (subset) and the intuitive meaning of the word (event).

The probabilist wants to assign a probability $P(E)$ (a number between 0 and 1) to every event $E$.

Thus, e.g., if $E$ = event of an even roll = $\{2, 4, 6\}$, we want $P(E) = \frac{1}{2}$. [Rationales can vary]

So: ideally, we want to assign numbers (probabilities) to subsets $E \subset S$.
Example 9: Die toss revisited:

$P(1) = \frac{1}{6}, \quad P(2) = \frac{1}{6}, \quad P(3) = \frac{1}{6}, \quad \ldots, \quad P(6) = \frac{1}{6}$

Each component in $S$ has probability $\frac{1}{6}$.

Any subset $A \subset S$ has probability determined by adding the measures of its component subsets.

Want $P(S) = 1$ (why?)
Example 10: If we want to predict the next base in a genomic sequence, we have the sample space $S = \{A, T, C, G\}$.

Define $E$ = event that the next base is a purine, i.e., $E = \{A, G\} \subset S$.

If we expect all bases to have equal probabilities, then $P(E) = 1/2$.
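The equally-likely assignments in Examples 9 and 10 can be sketched in a few lines of Python. This is only an illustration under the equal-probability assumption stated in those examples; the helper name `uniform_measure` is hypothetical and not part of the notes.

```python
from fractions import Fraction


def uniform_measure(sample_space):
    """Return a function P(E) for the equally-likely measure on a finite S.

    Assumption (from Examples 9 and 10): every outcome has probability 1/|S|.
    """
    outcomes = frozenset(sample_space)

    def prob(event):
        event = frozenset(event)
        if not event <= outcomes:
            raise ValueError("an event must be a subset of the sample space")
        # Adding 1/|S| once per outcome in the event gives |E| / |S|.
        return Fraction(len(event), len(outcomes))

    return prob


die = {1, 2, 3, 4, 5, 6}
p_die = uniform_measure(die)
p_even = p_die({2, 4, 6})          # event of an even roll

bases = {"A", "T", "C", "G"}
p_base = uniform_measure(bases)
p_purine = p_base({"A", "G"})      # event that the next base is a purine

print(p_even, p_purine, p_die(die))  # 1/2 1/2 1
```

Exact fractions are used instead of floats so that identities such as $P(S) = 1$ hold without rounding error.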
Analytic probability

2. Sample space and probability measure: what properties do we want?

Def. 2: Recall that for sets $A, B$ we define $A \cup B$ and $A \cap B$ as the union and intersection of the sets.
Desired properties of $P$:

(1) If the $A_i$ are disjoint (i.e., no pair of them intersects), then the probability of the union of the $A_i$ is the sum of their probabilities:

$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \qquad \text{(Linearity)}$$

(2) If $\emptyset$ = empty set: $P(\emptyset) = 0$, $P(S) = 1$.
An assignment of numbers $P(E)$ to subsets (events) $E \subset S$ satisfying properties (1) and (2) is called a measure on the set $S$.

However, we will sometimes need to restrict the collection of sets $E$ to which we can assign a probability (measure) $P(E)$.

$\Rightarrow$ We need a collection $\mathcal{F}$ of subsets of $S$ whose probabilities we are allowed to measure.
This collection of subsets $\mathcal{F}$ must have some specific properties:

Def. 3: The notation $E \in \mathcal{F}$ means that the object $E$ is in the collection $\mathcal{F}$.
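Def. 4: A collection $\mathcal{F}$ of subsets of $S$ is a $\sigma$-field if

(i) $S, \emptyset \in \mathcal{F}$ ($S$ and $\emptyset$ are in the collection $\mathcal{F}$)
(ii) $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$ ($A^c$ = complement of $A$)
(iii) $A_i \in \mathcal{F} \ \forall i \Rightarrow \bigcup_i A_i \in \mathcal{F}$ [$\forall$ = for all]

where $A_1, A_2, A_3, \ldots$ form a sequence of sets.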
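For a finite sample space, properties (i) to (iii) of Def. 4 can be checked by brute force. The following is a minimal sketch; the function names `power_set` and `is_sigma_field` are illustrative choices, not notation from the notes, and the check only makes sense when $S$ is finite.

```python
from itertools import chain, combinations


def power_set(s):
    """All subsets of a finite set s, as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}


def is_sigma_field(collection, sample_space):
    """Check (i)-(iii) of Def. 4 for a collection of subsets of a finite S."""
    S = frozenset(sample_space)
    F = {frozenset(a) for a in collection}
    # (i) S and the empty set belong to F.
    if S not in F or frozenset() not in F:
        return False
    # (ii) F is closed under complements.
    if any(S - a not in F for a in F):
        return False
    # (iii) On a finite S the collection F is finite, so closure under
    # pairwise unions already gives closure under countable unions.
    return all(a | b in F for a in F for b in F)


S = {1, 2, 3, 4, 5, 6}
all_subsets = power_set(S)
even_odd = [set(), S, {2, 4, 6}, {1, 3, 5}]
missing_complement = [set(), S, {2, 4, 6}]

print(is_sigma_field(all_subsets, S))         # True
print(is_sigma_field(even_odd, S))            # True
print(is_sigma_field(missing_complement, S))  # False
```

The second collection shows that a $\sigma$-field can be much smaller than the collection of all subsets; the third fails property (ii) because the complement $\{1, 3, 5\}$ is absent.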
Once we have an $S$, an $\mathcal{F}$, and a $P$ as above, we call the triple $(S, \mathcal{F}, P)$ a probability space.

We call the sets in the class $\mathcal{F}$ the measurable sets in $S$.
Example 11: $S = \{1, 2, 3, 4, 5, 6\}$. In this case the collection $\mathcal{F}$ of measurable sets is usually all subsets of $S$, i.e., $\mathcal{F} = \{\text{all subsets of } S\}$.

Example 12: $S = [0, 1]$, i.e., we are choosing a random number $x$ in the interval $[0, 1]$; $S$ is the set of all possible outcomes.

Here $\mathcal{F}$ = collection of measurable sets is defined to be the smallest $\sigma$-field of sets including all possible intervals $(a, b)$.
This collection $\mathcal{F}$ is sometimes called the Borel sets.

We can define probabilities of sets to be, e.g.,

$$P((a, b)) = b - a.$$

This defines the probability that our random number $x$ is in the interval $(a, b)$.

It can be shown that we can uniquely extend this definition of probability from the collection of intervals $(a, b)$ to the larger collection of sets $\mathcal{F}$.
This full measure on the interval $[0, 1]$ is called Lebesgue measure.
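As a small illustration of how $P((a, b)) = b - a$ combines with property (1), the sketch below computes the probability of a finite union of intervals in $[0, 1]$ by merging overlaps and summing lengths. It handles only finite unions of intervals, not general Borel sets, and the function name `length_of_union` is a hypothetical helper.

```python
from fractions import Fraction


def length_of_union(intervals):
    """Total length of a finite union of intervals (a, b) inside [0, 1].

    Overlapping pieces are counted once, so for disjoint intervals the
    result is just the sum of the lengths b - a.
    """
    total = Fraction(0)
    current_end = None  # right endpoint of the part covered so far
    for a, b in sorted(intervals):
        if not (0 <= a <= b <= 1):
            raise ValueError("each interval must satisfy 0 <= a <= b <= 1")
        if current_end is None or a >= current_end:
            # Starts beyond everything covered so far: add its full length.
            total += b - a
            current_end = b
        elif b > current_end:
            # Overlaps the covered part: add only the new portion.
            total += b - current_end
            current_end = b
    return total


quarter, half, three_quarters = Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)

disjoint = length_of_union([(0, quarter), (half, three_quarters)])
overlapping = length_of_union([(0, half), (quarter, three_quarters)])

print(disjoint, overlapping)  # 1/2 3/4
```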
3. Some basic notions: measurable functions

If we have a sample space $S$, we want to define functions on it.

Definition 5: We define $\mathbb{R}$ to be the collection of all real numbers.

Definition 6: A function $f$ on any set $E$ is a rule which assigns to each element $x \in E$ a real number denoted $f(x)$. In this case we write $f : E \to \mathbb{R}$.
Assume now that we have a collection $\mathcal{F}$ of measurable sets on $S$.

Definition 7: A function $f : S \to \mathbb{R}$ is measurable if for all $x$, the set

$$\{a : f(a) \le x\} = \text{the set of all } a \text{ such that } f(a) \le x$$

is measurable (i.e., is in $\mathcal{F}$).

Example 13: Consider the function $f : \mathbb{R} \to \mathbb{R}$ such that $f(x) = x^3$.
Example 14: Consider $f(x) = I_{\mathbb{Q}}(x)$ = indicator function of the rationals, defined by

$f(x) = 1$ if $x$ is rational,
$f(x) = 0$ if $x$ is irrational.
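Definition 7 can be checked directly for Examples 13 and 14. The worked sets below assume, as an illustration, that the measurable sets on $\mathbb{R}$ are the Borel sets; the notes do not fix $\mathcal{F}$ for these two examples.

```latex
% Example 13: f(a) = a^3 is increasing, so the sublevel set is a half-line.
\{a : a^3 \le x\} = (-\infty,\, x^{1/3}]

% Example 14: f = I_{\mathbb{Q}} takes only the values 0 and 1.
\{a : I_{\mathbb{Q}}(a) \le x\} =
\begin{cases}
  \emptyset                        & \text{if } x < 0, \\
  \mathbb{R} \setminus \mathbb{Q}  & \text{if } 0 \le x < 1, \\
  \mathbb{R}                       & \text{if } x \ge 1.
\end{cases}
```

A closed half-line is the complement of a countable union of open intervals, and $\mathbb{Q}$ is a countable union of single points, so under the Borel assumption every set above is measurable and both functions are measurable.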
4. Random Variables

Definition 8: A random variable (RV) $X$ is any measurable function from the sample space $S$ to the real numbers.

Example 15: Throw 2 dice. Suppose the random variable $X$ maps each outcome to a number: if the outcome is $(3, 5)$ (i.e., the first die is a 3 and the second die is a 5), then $X(3, 5) = 8$, i.e., $X$ maps each outcome to the total.

Example 16: For a given amino acid $a$ in a protein sequence, let $X(a)$ denote the number of other amino acids to which $a$ is bound. Again $X$ maps the amino acid (in $S$) to a number.
Def. 9: If $X$ is a random variable, we define its distribution function $F$ to be

$$F(x) = P(\{\omega : X(\omega) \le x\}) = P(X \le x).$$
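The two-dice total of Example 15 and its distribution function from Def. 9 can be tabulated exactly. This sketch assumes the 36 ordered outcomes are equally likely, which the notes do not state explicitly for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs (first, second).
outcomes = list(product(range(1, 7), repeat=2))


def X(outcome):
    """The random variable of Example 15: the total of the two dice."""
    first, second = outcome
    return first + second


def P(event):
    """Equally-likely probability of a list of outcomes (an assumption)."""
    return Fraction(len(event), len(outcomes))


def F(x):
    """Distribution function F(x) = P(X <= x)."""
    return P([w for w in outcomes if X(w) <= x])


print(X((3, 5)))     # 8
print(F(8))          # 13/18
print(F(1), F(12))   # 0 1
```

The last line shows $F$ rising from 0 to 1, and since $X$ takes only the values 2 through 12, $F$ is a step function with jumps at those points.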
Properties of $F$ (easily derived):

(i) $F(x) \to 1$ as $x \to \infty$; $F(x) \to 0$ as $x \to -\infty$.
(ii) $F$ has at most countably many discontinuities, i.e., the discontinuity points $x_1, x_2, \ldots$ can be listed.

Example 17: Suppose we record the high and low body temperatures for a patient on a given day; form a sample space of all possible pairs of measurements:
$$S = \{(H, L) : H \ge L\}$$

Define a random variable $X : S \to \mathbb{R}$ as follows: for each element of $S$, let

$$X(H, L) = H - L = \text{temperature range}.$$

Suppose we find that

$$F(x) = P(\{(H, L) : (H - L) \le x\}) = P(X \le x) =
\begin{cases}
1 - e^{-x} & \text{if } x \ge 0, \\
0 & \text{if } x < 0.
\end{cases}$$
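One can check that this $F$ is a d.f. (distribution function).

If $F$ has a derivative, then $F'(x) = f(x)$ = density function of $X$.

Example 18: Here the density is

$$f(x) =
\begin{cases}
e^{-x} & \text{if } x \ge 0, \\
0 & \text{if } x < 0.
\end{cases}$$

Check: $F(x) = \int_{-\infty}^{x} f(x')\, dx'$.

Example 19: Normal: the density is

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$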
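The check in Example 18 can also be done numerically. The sketch below uses a plain trapezoid rule (a choice made here for illustration, not a method from the notes) to compare the integral of the density $e^{-x}$ with $1 - e^{-x}$, and to confirm that the normal density of Example 19 integrates to approximately 1.

```python
import math


def density(x):
    """Density of Example 18: e^{-x} for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0


def F_closed(x):
    """Distribution function of Example 17: 1 - e^{-x} for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def normal_density(x):
    """Standard normal density of Example 19."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def integrate(f, lo, hi, steps=100_000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# The density vanishes for x < 0, so integrating from 0 is enough.
approx_F2 = integrate(density, 0.0, 2.0)
# Tails beyond |x| = 10 are negligible for the standard normal.
normal_mass = integrate(normal_density, -10.0, 10.0)

print(abs(approx_F2 - F_closed(2.0)) < 1e-6)  # True
print(abs(normal_mass - 1.0) < 1e-6)          # True
```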
More probability

1. Conditional probability

Example 1: Die roll:

$A$ = event of an even roll = $\{2, 4, 6\}$
$B$ = event of a roll divisible by 3 = $\{3, 6\}$

$$P(A) = \frac{1}{2}; \qquad P(B) = \frac{1}{3}$$

Define the conditional probability

$P(B \mid A)$ = probability of $B$ given that $A$ occurs.

How to compute $P(B \mid A)$? Create a reduced sample space consisting only of $A$ and compute what portion of $A$ is in $B$.

Here the reduced sample space is $A = \{2, 4, 6\}$. Assuming equal probabilities:

$$P(2 \mid A) = P(4 \mid A) = P(6 \mid A) = 1/3.$$

Note $P(B \mid A) = P(6 \mid A) = 1/3$.

Now note $P(A \cap B) = \frac{1}{6}$; thus

$$P(B \mid A) = \frac{1}{3} = \frac{1/6}{1/2} = \frac{P(A \cap B)}{P(A)}.$$

This illustrates the general mathematical definition

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
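The die-roll computation above can be reproduced directly from the definition $P(B \mid A) = P(A \cap B) / P(A)$. This is a minimal sketch assuming a fair die; the function name `conditional` is illustrative.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair die: outcomes 1..6, equally likely


def P(event):
    """Equally-likely probability of an event (a subset of S)."""
    return Fraction(len(frozenset(event) & S), len(S))


def conditional(B, A):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    A, B = frozenset(A), frozenset(B)
    if P(A) == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return P(A & B) / P(A)


A = {2, 4, 6}   # even roll
B = {3, 6}      # roll divisible by 3

print(P(A), P(B), P(A & B))  # 1/2 1/3 1/6
print(conditional(B, A))     # 1/3
```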
2. Independence:

We say events $B$ and $A$ are independent if

$$P(B \mid A) = P(B) \iff \frac{P(B \cap A)}{P(A)} = P(B) \iff P(A \cap B) = P(A)\,P(B).$$

In general, a collection of events $A_1, \ldots, A_n$ is independent if for any subcollection $A_{i_1}, \ldots, A_{i_k}$:

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})\,P(A_{i_2}) \cdots P(A_{i_k}).$$
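The product condition can be tested exhaustively on a small sample space. The sketch below checks every subcollection of two or more events, again assuming a fair die; `are_independent` is a hypothetical helper name. Note that the events $A$ and $B$ from the conditional probability example turn out to be independent, since $\frac{1}{6} = \frac{1}{2} \cdot \frac{1}{3}$.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

S = frozenset(range(1, 7))  # fair die, equally likely outcomes (assumption)


def P(event):
    return Fraction(len(frozenset(event) & S), len(S))


def are_independent(events):
    """Check the product rule for every subcollection of size >= 2."""
    events = [frozenset(e) for e in events]
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            intersection = S.intersection(*sub)
            if P(intersection) != prod(P(e) for e in sub):
                return False
    return True


A = {2, 4, 6}   # even roll
B = {3, 6}      # roll divisible by 3
C = {1, 2, 3}   # roll at most 3

print(are_independent([A, B]))  # True:  1/6 == 1/2 * 1/3
print(are_independent([A, C]))  # False: 1/6 != 1/2 * 1/2
```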
Example 2: Consider a sequence $x_1, x_2, \ldots, x_k$ of bases forming a gene $g$.

Q: Are the successive bases $x_i, x_{i+1}$ independent?

For example, given that we know the base $x_i = A$, does that change the probability $P(x_{i+1} = C)$? More specifically, do we have

$$P(x_{i+1} = C \mid x_i = A) = P(x_{i+1} = C)\,? \qquad (3)$$

If (3) also holds for all possible choices (C, G, T) replacing A and all choices replacing C, then $x_{i+1}$ and $x_i$ are independent.
Expect: in actual DNA, an exon region has more dependence from base to base than an intron does. Why? Evolutionary structural pressures: exons are more important to survival.
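Question (3) can be explored empirically by comparing counted frequencies of adjacent pairs. The sketch below is only illustrative: it uses the short toy string ATCTTCA from the DNA example earlier in these notes, which is far too short for any real conclusion, and it compares the pair (T, C) because that pair actually occurs in the string. The function name `transition_estimates` is hypothetical, and the returned values are frequency estimates, not true probabilities.

```python
from collections import Counter
from fractions import Fraction


def transition_estimates(seq, current, nxt):
    """Estimate P(x_{i+1} = nxt | x_i = current) and P(x_{i+1} = nxt).

    Both are relative frequencies over the adjacent pairs (x_i, x_{i+1}).
    """
    pairs = list(zip(seq, seq[1:]))
    pair_counts = Counter(pairs)
    current_count = sum(1 for a, _ in pairs if a == current)
    next_count = sum(1 for _, b in pairs if b == nxt)
    if current_count == 0:
        raise ValueError("the conditioning base never occurs as x_i")
    conditional = Fraction(pair_counts[(current, nxt)], current_count)
    marginal = Fraction(next_count, len(pairs))
    return conditional, marginal


toy_sequence = "ATCTTCA"  # toy data only
cond, marg = transition_estimates(toy_sequence, "T", "C")

print(cond, marg)  # 2/3 1/3
```

If the two estimates differ by more than sampling noise on a long sequence, that is evidence against independence of successive bases; deciding what counts as "more than noise" requires a statistical test that is beyond this sketch.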
3. Independence of RV's

Henceforth, given a subset $A$ of the sample space $S$, we assume $A$ is measurable unless stated otherwise.

Definition 10: Let $S$ be a sample (probability) space. Let $X : S \to \mathbb{R}$ be a random variable (i.e., a rule assigning a real number to each outcome in $S$). If $A = [a, b] \subset \mathbb{R}$, define

$$P(X \in A) = P(a \le X \le b)$$
$$= P(\text{all outcomes } \omega \text{ whose value } X(\omega) \text{ is between } a \text{ and } b)$$
$$= P(\{\omega \mid a \le X(\omega) \le b\}).$$

Definition 11: Let $S$ be a sample space, and let $X_1, X_2$ be random variables. Then $X_1, X_2$ are independent if for any sets $A_1, A_2$,

$$P(X_1 \in A_1,\ X_2 \in A_2) = P(X_1 \in A_1)\,P(X_2 \in A_2).$$
Generally, $X_1, \ldots, X_n$ are independent if for any sets $A_1, \ldots, A_n$,

$$P(X_1 \in A_1, \ldots, X_n \in A_n) = P(X_1 \in A_1)\,P(X_2 \in A_2) \cdots P(X_n \in A_n).$$
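For random variables on a finite sample space, the product condition can be verified by checking single values, since any set $A_1$ or $A_2$ is a finite union of single values and the probabilities add. The sketch below assumes two fair dice with equally likely ordered outcomes; the names `X1`, `X2`, `total`, and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice (assumption)


def P(predicate):
    """Probability of the set of outcomes w for which predicate(w) holds."""
    return Fraction(sum(1 for w in outcomes if predicate(w)), len(outcomes))


def X1(w):
    return w[0]          # value of the first die


def X2(w):
    return w[1]          # value of the second die


def total(w):
    return w[0] + w[1]   # sum of both dice


def independent(U, V, values_u, values_v):
    """Check P(U = u, V = v) = P(U = u) P(V = v) for all listed values.

    On a finite sample space this implies the product rule for all sets.
    """
    for u in values_u:
        for v in values_v:
            joint = P(lambda w: U(w) == u and V(w) == v)
            if joint != P(lambda w: U(w) == u) * P(lambda w: V(w) == v):
                return False
    return True


print(independent(X1, X2, range(1, 7), range(1, 7)))      # True
print(independent(X1, total, range(1, 7), range(2, 13)))  # False
```

The second check fails because knowing the first die changes the distribution of the total: for instance, a first die of 1 rules out a total of 12.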
More to come on these topics.