MATH2206 Prob Stat/20.Jan.2017 Weekly Review 1-2


This week I explained the idea behind the formula of the well-known statistic standard deviation, so that it is now clear why it is a measure of dispersion, and concluded our discussion of descriptive statistics with the Chebyshev theorem. Then we moved our focus to probability theory, in order to prepare ourselves for the study of statistical inference in the second half of this course: we illustrated some counting techniques for elementary probability calculations, introduced formally the mathematical definition of probability, and demonstrated how to derive further theoretical results from the definition by rigorous mathematics.

I explained that although the range (which in statistics, unlike in everyday language, refers to a single numerical value telling us the difference between the maximum and the minimum, rather than an interval indicating from where to where) is a simple measure of dispersion, the standard deviation, which takes every datum into consideration, is a much more popular and useful measure of dispersion. Considering how concentrated the data are around their centre (the mean), I argued from intuition to establish, step by step, the formula of the variance, which is the mean (to summarise) of the squared (to get rid of the sign) deviations from the mean (to see how close to the centre each datum is). Taking the square root of the variance gives us the standard deviation, which has the same unit of measurement as the original data. In the calculation of the standard deviation, we have to know whether the data themselves are already the population (the largest collection of entities in which we have an interest at a particular time) or only a random sample (a collection of entities drawn from the population in such a way that each entity has the same chance of being chosen).

What we are going to learn in the rest of the course (and the rest of the book) is so-called statistical inference, which is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample that has been drawn from that population. The very first thing we learnt about statistical inference is to estimate parameters of a population by the corresponding statistics calculated from a sample. For example, if we have a very large population, whose mean and standard deviation are µ and σ, respectively, and we want to estimate these two parameters, what we have to do is to take a random sample of size n, say {x_1, x_2, ..., x_n}, from the population and then calculate the corresponding statistics, namely the sample mean x̄ and the sample standard deviation s, given by

    \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}, \qquad
    s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}.
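
If you would like to see these two formulas in action, here is a minimal Python sketch (not part of the course material; the sample values are made up purely for illustration) that computes the sample mean and the sample standard deviation with the n − 1 denominator.

```python
# Sketch only: sample mean and sample standard deviation (n - 1 denominator)
# for a small, made-up sample.
import math

sample = [4.2, 5.1, 3.8, 6.0, 5.5]            # hypothetical data
n = len(sample)

x_bar = sum(sample) / n                        # sample mean
ss = sum((x - x_bar) ** 2 for x in sample)     # sum of squared deviations
s = math.sqrt(ss / (n - 1))                    # sample standard deviation

print(x_bar, s)
```

Replacing n − 1 by n in the last calculation would give the standard deviation of the data treated as a whole population instead.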

The reason for using n − 1 in the denominator of s² is that this gives a better estimate of σ², in the following sense: if there are many researchers, each with his or her own random sample from the same population and hence his or her own s², then the theoretical average (loosely speaking, the average over infinitely many samples) of all these s² is equal to the true σ². Technically speaking, we say s² is an unbiased estimator of σ². (However, s is not an unbiased estimator of σ. The technical details of such terms will be discussed not in this course but in a second-semester course devoted to mathematical statistics.)

Even though we now understand that the standard deviation is a measure of dispersion, what exactly do we know if, say, µ = 50 and σ = 10? Chebyshev's beautiful inequality (see p. 114) states an amazing result:

Theorem 1 (Chebyshev). For any set of data and any constant k > 1, the proportion of the data that lie within k standard deviations on either side of the mean is at least 1 − 1/k².

Therefore, if µ = 50 and σ = 10, then we know, for example, that at least 75% of the data lie between µ ± 2σ = 50 ± 20, i.e. between 30 and 70. Thus, the mean and the standard deviation alone can already tell us a lot about the distribution of the data. Of course, we cannot know every detail of the given data from these two numbers, which serve merely as a summary. However, knowing that, e.g., at least 75% of the data are not more than 2 standard deviations away from the mean (and at least 8/9 ≈ 89% not more than 3 standard deviations away) is quite helpful in forming an overall picture of the distribution of the data. (Note that this course will not present any mathematical proofs. We are not learning proofs in this course; we are learning the applications of statistics, which do not require the rigorous proofs behind the theory. For students majoring in mathematics, it is a very interesting experience to learn how to accept mathematical results without seeing their proofs.)
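
As a small aside (again not from the course notes), the Chebyshev bound can be checked on any data set whatsoever; the numbers below are made up, and the data set deliberately contains an outlier.

```python
# Sketch: empirically checking Chebyshev's bound on an arbitrary, made-up data set.
import statistics

data = [3, 7, 8, 8, 9, 10, 10, 11, 12, 30]    # hypothetical data (note the outlier)
mu = statistics.mean(data)
sigma = statistics.pstdev(data)                # standard deviation of the data themselves

for k in (2, 3):
    within = sum(1 for x in data if abs(x - mu) <= k * sigma)
    print(f"k={k}: proportion within = {within / len(data):.2f}, "
          f"Chebyshev bound = {1 - 1 / k**2:.2f}")
```

Whatever data you put in, the printed proportion can never fall below the bound.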

To study statistical inference, we first have to have a proper understanding of elementary probability theory, which is the theme of Chapters 4–6, and that is why we shifted from statistics to probability after the discussion of descriptive statistics. As you will see very soon, the calculation of the probability of an event is not always straightforward. One major difficulty is how to count the number of all possible outcomes and the number of individual outcomes comprising a particular event.

A very important technique in elementary probability is counting the number of possible ways to choose r balls from n distinct balls (e.g. balls labelled by distinct numbers). If the order of the chosen balls is important, e.g. {1, 2, 3}, {2, 1, 3}, {3, 1, 2}, etc., are different outcomes, the number of possible permutations (outcomes where the order matters) is

    {}_nP_r = \frac{n!}{(n-r)!}.

If the order is irrelevant, e.g. {1, 2, 3} is the same as {2, 1, 3}, {3, 1, 2}, etc., then the total number of possible combinations (the order does not matter) is given by

    {}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}.

Using the first, the second and the third prize of our local lottery Mark Six, I illustrated how to count possible combinations systematically. For example, if we have to pick r balls from n distinct balls in which a of them are red (representing, say, the a = 6 drawn numbers in Mark Six) and the rest are green (representing the remaining 43 balls), the number of ways we can get x red balls, 0 ≤ x ≤ r, when the order does not matter, is

    \binom{a}{x} \binom{n-a}{r-x}.

This idea can easily be extended to more than two colours. For example, if we have a red balls (again representing, say, the a = 6 drawn numbers in Mark Six), b blue balls (representing the b = 1 extra drawn number) and n − a − b green balls (representing the c = 42 untouched balls left in the original big pool), then the number of ways to get x red balls and y blue balls by choosing r (≥ x + y) balls is

    \binom{a}{x} \binom{b}{y} \binom{n-a-b}{r-x-y}.

In some applications x or y may be zero, or x + y = r; in these cases I recommend that you still write all the terms down, including e.g. the factor \binom{a}{0} when x = 0, so that you will not miss any colour by mistake. (And note the convention that 0! = 1.)

In this lottery story we say we sample without replacement. We get a totally different story if we consider sampling with replacement, meaning that the picked ball is replaced by a new but identical ball (or we simply put the picked ball back into the pool), so that the same number (the same ball) could appear more than once. Problems related to sampling with replacement will appear in Chapter 5.
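
For those who want to experiment, here is a hedged Python sketch of the Mark Six counting above: 6 winning numbers are drawn from 49, we pick 6 numbers, and we count in how many ways we can match exactly x of the winning numbers. The actual prize rules are ignored; only the counting is illustrated.

```python
# Sketch: counting combinations for a Mark-Six-style draw (6 winning numbers
# out of 49, order irrelevant).  Illustration only, not the official prize rules.
from math import comb

n, a, r = 49, 6, 6            # 49 balls in total, 6 winning ("red"), we pick 6

total = comb(n, r)            # all possible tickets
for x in range(r + 1):        # x = number of winning numbers matched
    ways = comb(a, x) * comb(n - a, r - x)
    print(f"match {x}: {ways} ways, probability {ways / total:.8f}")
```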

We want to count the number of possible combinations because we want to calculate probabilities. But wait, what actually is probability? The interpretation we may use comes from the law of large numbers (see p. 155), which says:

Theorem 3 (Law of Large Numbers). If an experiment is repeated infinitely many times, the probability of an event is equal to the relative frequency of the occurrence of the event.

An essential point in any probabilistic statement is that the experiment can be repeated. Thus, the statement "the probability that it will rain tomorrow is 0.3" is not a statement that we can understand in this way, because tomorrow will occur only once and cannot be repeated. Such a subjective probability, however, is very commonly used in everyday life. We will give it a rigorous meaning when we move to Chapter 5.

There is no short-cut to learning how to count. In fact, there is no general procedure for counting the number of individual outcomes comprising a particular event. As a result, counting is always challenging. The only way to master the technique is to practise more. Questions in Assignment 1 and the next example class will give you more chances to practise the art of counting. The most important ability you have to develop through these questions is knowing how to count in a systematic way.

However, probability is not only about counting the number of possible combinations, because the outcomes are not necessarily equally likely. For example, if we roll a fair six-faced die, we obtain a probability function that assigns 1/6 to each possible outcome. But as I showed you, dice are not necessarily fair. If a die is not a cube but a general cuboid (or even a U-shaped object), then the outcomes are not equally likely. More information on such dice can be found at

http://www.riemer-koeln.de/joomla/index.php?option=com_wrapper&view=wrapper&itemid=56

Sprechen Sie Deutsch? (= Do you speak German?) If you do not, then before you learn it, you may read the following article, written in English, which reports a scientific study of such dice:

http://www.riemer-koeln.de/mathematik/quader/cuboid.metrika.pdf

Now, consider the question from the reverse direction: instead of calculating the probabilities of the outcomes of an unfair die, we consider the feasibility of making a die that meets pre-specified probabilities for its outcomes. For example, can we make a loaded die such that each even number has a chance of 1/4 and each odd number a chance of 1/12? More generally, for arbitrarily assigned probability values, can we manufacture a die that has the assigned probabilities (as the proportions of the outcomes in the long run)?
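
Assuming for the moment that such a loaded die could be made (with probability 1/4 for each even face and 1/12 for each odd face, which do sum to 1), a short simulation illustrates the relative-frequency interpretation above: the observed proportions settle near the assigned probabilities as the number of rolls grows. This is only a sketch and not part of the course material.

```python
# Sketch: simulating a hypothetical loaded die (even faces 1/4 each, odd faces
# 1/12 each) and watching the relative frequencies approach those probabilities,
# in the spirit of the law of large numbers.
import random
from collections import Counter

faces = [1, 2, 3, 4, 5, 6]
probs = [1/12, 1/4, 1/12, 1/4, 1/12, 1/4]     # sums to 1

rolls = random.choices(faces, weights=probs, k=100_000)
counts = Counter(rolls)
for f in faces:
    print(f"face {f}: relative frequency {counts[f] / len(rolls):.4f}, "
          f"assigned probability {probs[f - 1]:.4f}")
```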

To answer this question, we have to introduce some more mathematical terms. Denote by Ω (or, as used in elementary textbooks, S) the sample space of an experiment, meaning that Ω is the collection of all possible outcomes of the experiment. An event A is a subset of Ω, denoted by A ⊆ Ω, i.e. A is a collection of some of the possible outcomes in Ω. A function Pr(·) defined for all events is called a probability function, or simply a probability, if it satisfies:

1. Pr(A) ≥ 0 for all events A;

2. Pr(Ω) = 1;

3. for mutually exclusive events A_1, A_2, ...,

    Pr(A_1 ∪ A_2 ∪ ···) = Pr(A_1) + Pr(A_2) + ···,

where A_i and A_j are mutually exclusive if they have nothing in common, i.e. A_i ∩ A_j = ∅, in which the symbol ∅ denotes the empty set, i.e. a set containing nothing. (The symbols ∪ and ∩ represent "or" and "and", respectively, and are also known as union and intersection.)

From a mathematical point of view, whenever you give me a probability function that satisfies the above three conditions, it is possible to have a physical die whose outcomes follow your probability function. (How to make this die is an engineering problem and hence none of our business.) If you give me a function that does not satisfy the three conditions, no one can make a die that has such probabilities for its outcomes.

Using simple mathematical arguments, we derived the following four properties of a probability function:

(i) Pr(A) ≤ 1 for all events A;

(ii) Pr(A^c) = 1 − Pr(A), where A^c denotes the complement of A (note that the event A does not happen if and only if A^c happens, and so Pr(A^c) is the probability that A does not happen);

(iii) Pr(∅) = 0;

(iv) the General Addition Rule given on p. 181:

    Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

The General Addition Rule is quite easy to believe if we draw a Venn diagram. However, Venn diagrams cannot replace rigorous mathematical proofs of probability statements. We proved it mathematically by a simple technique, namely splitting an event into two (or more) mutually exclusive (i.e. disjoint) parts and then applying the third condition of a probability. The General Addition Rule can of course be generalised to the union of k events. The statement can easily be written down with the help of a Venn diagram and then proved by mathematical induction (if you know what mathematical induction is, then good; if you do not, it does not matter for our course).

The reason for considering the weird event ∅ in (iii) above is that when we have two mutually exclusive events, say A and B, the probability that both of them happen is of course zero (by the definition of mutually exclusive events), i.e. Pr(A ∩ B) = 0; mathematically, the fact that A and B are mutually exclusive is expressed as A ∩ B = ∅, and so it is intuitively obvious that a sensible definition of probability must result in Pr(∅) = 0.
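
For a finite sample space such as a die, a probability function is simply an assignment of non-negative numbers to the outcomes that sum to 1, and the derived properties can be checked numerically. The sketch below (an illustration only) uses the hypothetical loaded-die probabilities from earlier and verifies the complement rule and the General Addition Rule.

```python
# Sketch: a probability function on a finite sample space, defined by summing
# outcome probabilities.  The values are the hypothetical loaded-die ones.
p = {1: 1/12, 2: 1/4, 3: 1/12, 4: 1/4, 5: 1/12, 6: 1/4}
omega = set(p)

def pr(event):
    return sum(p[o] for o in event)

assert all(v >= 0 for v in p.values())          # axiom 1: Pr(A) >= 0
assert abs(pr(omega) - 1) < 1e-12               # axiom 2: Pr(Omega) = 1

A, B = {2, 4, 6}, {1, 2, 3}
union, intersection = A | B, A & B              # '|' is set union, '&' is intersection
assert abs(pr(union) - (pr(A) + pr(B) - pr(intersection))) < 1e-12   # General Addition Rule
assert abs(pr(omega - A) - (1 - pr(A))) < 1e-12                      # complement rule
print(pr(A), pr(B), pr(union), pr(intersection))
```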

Please be reminded that the purpose of the above discussion is to understand probability by mathematically rigorous arguments instead of by intuitive arguments (nevertheless, the mathematical results should agree with our intuition). Probability theory, as a branch of mathematics, starts with definitions (sample space and events) and axioms (the three conditions). Theorems (such as the four properties presented in this and the last review) are then derived from the definitions and axioms by mathematical arguments. However, this course is not supposed to be too mathematical, so we only illustrate the ideas; we will not really do the theorem-and-proof work.

Then we introduced the notion of conditional probability, which is as follows. Suppose B has happened or definitely will happen, and we are interested in the occurrence of event A. Then the probability of A given B is

    Pr(A | B) = \frac{Pr(A ∩ B)}{Pr(B)}, provided Pr(B) ≠ 0.

However, sometimes it may be more helpful to express the relationship as

    Pr(A ∩ B) = Pr(A | B) Pr(B),

which can be understood as follows: the chance that both A and B will happen can be split into two probabilities, one being the chance of A if B happens and the other the chance that B really will happen. This relationship is still true even if Pr(B) = 0.

For some experiments, event B has nothing to do with event A and vice versa, so that Pr(A | B) = Pr(A) and, equivalently, Pr(B | A) = Pr(B). For such a pair of events, we say they are independent. Note that for events A and B with non-zero probabilities, i.e. Pr(A) > 0 and Pr(B) > 0, if they are independent then they are not mutually exclusive, and if they are mutually exclusive then they are not independent. That is to say, for two events with non-zero probabilities, the event that they are independent and the event that they are mutually exclusive are themselves mutually exclusive! (But be careful: there are pairs of events A and B that are neither independent nor mutually exclusive.)

When will two events be independent? If we are talking about the outcomes of two experiments, e.g. rolling a die twice (or rolling two dice), then independence is often an assumption. If we are talking about two events of one experiment, then we can prove or disprove the independence. For example, suppose we roll a fair die and let A be the event that the outcome is 1, 2, 4 or 6, and B the event that the outcome is 1, 3 or 4. Then, because Pr(A) = 2/3, Pr(B) = 1/2 and Pr(A ∩ B) = 1/3, we have Pr(A ∩ B) = Pr(A) Pr(B); equivalently, Pr(A | B) = (1/3)/(1/2) = 2/3 = Pr(A). So A and B are by definition independent. No intuitive explanation is available: they just happen to be independent!
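
The fair-die example can be verified by direct enumeration; the following sketch (illustrative only) uses exact fractions so that the equality Pr(A ∩ B) = Pr(A) Pr(B) is tested without rounding error.

```python
# Sketch: verifying the fair-die independence example by direct enumeration,
# using exact fractions to avoid rounding error.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    # fair die: every outcome equally likely
    return Fraction(len(event), len(omega))

A = {1, 2, 4, 6}
B = {1, 3, 4}

print(pr(A), pr(B), pr(A & B))        # 2/3, 1/2, 1/3
print(pr(A & B) == pr(A) * pr(B))     # True: A and B are independent
print(pr(A & B) / pr(B) == pr(A))     # equivalently, Pr(A | B) = Pr(A)
```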

Next week we will derive some further properties of a probability function, introduce and discuss some interesting applications of the notion of conditional probability, and then do some gambling (without money...). Afterwards, we will move to Chapter 5 for more advanced topics in probability, which can be viewed as abstractions and generalisations of some concepts and notions from gambling.

Enjoy your Assignment 1.

Cheers,
Heng