Tutorial for Lecture Course on Modelling and System Identification (MSI)
Albert-Ludwigs-Universität Freiburg, Winter Term 2016-2017
Tutorial 3: Emergency Guide to Statistics
Prof. Dr. Moritz Diehl, Robin Verschueren, Tobias Schöls, Rachel Leuthold, and Mara Vaihinger

In this tutorial we will recall basic concepts from statistics, which are very important for modelling and identifying a system. To do so, let's begin with a fundamental understanding of the term probability and how to calculate probabilities. Later we will have a look at the analysis of experiments with random outcomes. It is important that you get familiar with these concepts to be able to follow the lecture. So if anything is unclear to you, don't hesitate to ask. That's why we are here.

1 Probability

Probability is a very useful concept to describe an uncertain situation. It can be interpreted in various ways, for example as an expression of subjective belief, e.g. when we are talking of how sure we are that something is true. Another way is to interpret probability as the frequency of occurrence of an event, i.e. if you roll a fair die many times, each number occurs on average in 1/6 of all rolls.

Before digging into the theory, say hello to Max again! Last week he made his way to the casino successfully. So let's join him and help him analyse his chances of winning in the casino. He likes to start with a simple game called Macao. It is a dice game in which the player tries to get as close as possible to, but not above, a specific number (let's say 9). The game is similar to the card game Black Jack. Before Max places his bet, he would like to know what his chances are to get exactly the value 9 with only two rolls of a fair die (event A), i.e., how probable it is that event A occurs. How can we compute this?
First, think about all possible outcomes that can occur if you roll a fair die twice:

Possible outcomes: Ω = {(1, 1), (1, 2), ..., (6, 6)} = {(i, j) | i, j ∈ {1, 2, 3, 4, 5, 6}}

Let's call the set containing all possible outcomes of the experiment "rolling a die twice" the sample space Ω. Within these outcomes, how many fulfill the condition that the values add up to 9?

Possible outcomes contained in event A: ...

So there are ... elements of all ... possible outcomes that are contained in A. The chance that event A occurs is expressed with a numerical value, the probability P(A). This value can be computed as follows, taking into account that in this special case all outcomes of the experiment are equally likely:

P(A) = (number of elements s_i in A) / (number of elements in Ω) = ...   (1)

This means that Max has a chance of ... to get exactly the value 9 with only two rolls of a fair die.

The function P(·) is called a probability law. It assigns to a set A of possible outcomes (A ⊆ Ω) a nonnegative number P(A) (the probability of A), which describes how probable it is that event A occurs. A probability law satisfies the following axioms:

1. Nonnegativity: P(A) ≥ 0 for every event A.
2. Additivity: If A and B are disjoint (mutually exclusive, i.e. they do not share any element), then P(A ∪ B) = P(A) + P(B).
3. Normalization: P(Ω) = 1.

From these axioms we can easily derive that P(∅) = 0. Note that in the following we will sometimes denote P(A ∩ B) as P(A, B). What do these axioms mean in terms of the example above?
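The worksheet leaves the counts as blanks to fill in; as a cross-check of the counting argument in Eq. (1), the enumeration can be sketched in a few lines (Python is used here for illustration; the course material itself uses MATLAB):

```python
# Brute-force check of P(A): enumerate all equally likely outcomes of
# two rolls of a fair die and count those whose values add up to 9.
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # sample space, 36 outcomes
A = [o for o in omega if sum(o) == 9]          # outcomes contained in event A

p_A = len(A) / len(omega)
print(len(A), len(omega), p_A)                 # 4 36 0.111...
```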
In general it holds that if an experiment has a finite number of possible outcomes s_i (s_i ∈ Ω), we speak of a discrete probability law, which can be computed as follows: the probability of any event A = {s_1, s_2, ..., s_n} is the sum of the probabilities of its elements,

P({s_1, s_2, ..., s_n}) = P(s_1) + P(s_2) + ... + P(s_n),   (2)

provided that {s_i} ∩ {s_j} = ∅ for all i, j with i ≠ j. In the special case we've seen in the example, when all single elements s_i have the same probability P(s_i), the probability of the event A is given by

P(A) = (number of elements s_i in A) / (number of elements in Ω).   (3)

1.1 Conditional Probability

The conditional probability describes the probability that the outcome of an experiment that belongs to the event B also belongs to some other given event A. For example: what is the chance that a person has a specific disease (event A) given that a medical test was negative (event B)? The conditional probability of A given B, with P(B) > 0, is defined as

P(A | B) = P(A, B) / P(B).   (4)

Explained in words: out of the total probability of the elements of B, the probability P(A | B) is the fraction that is assigned to possible outcomes that also belong to A. If the sample space Ω is finite and all outcomes are equally likely, then

P(A | B) = (number of elements of A ∩ B) / (number of elements of B).   (5)

Now back to Max: the game has started, Max rolled his die once and got a 3 (event A). Compute the chance that he does not exceed the value 9 after rolling the fair die again (event B). Determine the conditional probability P(B | A) with A = {first roll is a 3}, B = {sum of two rolls is ≤ 9}. Repeat this task for the cases that he first gets a 1, 2, 4, 5 or 6, i.e. compute the conditional probabilities P(B | A_i), with i = 1, 2, 4, 5, 6. Max is interested to know what the total probability P(B) is that he gets a value of at most 9 with two rolls of the fair die.
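The conditional probabilities asked for above can be verified by counting, as in Eq. (5). A hedged Python sketch (the course itself works in MATLAB; the helper `cond_prob` is introduced here only for illustration):

```python
# P(B | A_i) by counting over the equally likely outcomes of two die rolls.
from itertools import product

omega = list(product(range(1, 7), repeat=2))

def cond_prob(A, B):
    """P(B | A) = |A and B| / |A|, valid because all outcomes are equally likely."""
    A_set = [o for o in omega if A(o)]
    AB = [o for o in A_set if B(o)]
    return len(AB) / len(A_set)

B = lambda o: sum(o) <= 9                      # sum of the two rolls is at most 9
for i in range(1, 7):
    A_i = lambda o, i=i: o[0] == i             # first roll is i
    print(i, cond_prob(A_i, B))
```

For a first roll of 1, 2 or 3 the second roll can never push the sum above 9, so those conditional probabilities are 1.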
He already knows the conditional probabilities P(B | A_i) describing the chance of getting at most 9 when the first roll was a 1, 2, 3, 4, 5 or 6. What else does he need to compute P(B) using the conditional probabilities P(B | A_i)? Have a look at the following theorem:

Total Probability Theorem. Let A_1, ..., A_n be disjoint events that form a partition of the sample space (each possible outcome is included in one and only one of the events A_1, ..., A_n), i.e. A_i ∩ A_j = ∅ for all i, j with i ≠ j and A_1 ∪ ... ∪ A_n = Ω, and assume that P(A_i) > 0 for all i = 1, ..., n. Then, for any event B, we have

P(B) = P(A_1, B) + ... + P(A_n, B)   (6)
     = Σ_{i=1}^{n} P(A_i, B)   (7)
     = P(A_1)P(B | A_1) + ... + P(A_n)P(B | A_n)   (8)
     = Σ_{i=1}^{n} P(A_i)P(B | A_i).   (9)

Compute the missing probabilities P(A_i). Then compute the total probability P(B) that he gets a value of at most 9 with two rolls of the fair die.
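The sum in Eq. (9) is short enough to write out directly. A sketch of the computation (in Python for illustration; the conditional probabilities are those found in the previous exercise):

```python
# Total probability: P(B) = sum_i P(A_i) * P(B | A_i), A_i = "first roll is i".
p_Ai = 1 / 6                                   # fair die: each A_i has probability 1/6
p_B_given = [1, 1, 1, 5/6, 4/6, 3/6]           # P(B | A_i) for i = 1, ..., 6
p_B = sum(p_Ai * p for p in p_B_given)
print(p_B)                                     # 5/6 = 0.8333...
```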
Max is getting more and more involved in the probability stuff and asks himself: given that I get at most 9 within two rolls of the fair die (event B), what is the probability that in my first roll I got a 5? Can you help him? Use the following theorem to answer his question.

Bayes' Theorem. Let A_1, ..., A_n be disjoint events that form a partition of the sample space, and assume that P(A_i) > 0 for all i. Then, for any event B such that P(B) > 0, we have

P(A_i | B) = P(A_i)P(B | A_i) / P(B)   (10)
           = P(A_i)P(B | A_i) / (P(A_1)P(B | A_1) + ... + P(A_n)P(B | A_n)).   (11)

Independent events. In the special case that the occurrence of an event A does not influence the probability that event B occurs, those events are independent and the following relation holds:

P(A, B) = P(A)P(B).   (12)

If two events A and B do not satisfy this relation, they are dependent. Mutually exclusive events (with nonzero probabilities) cannot be independent. Given this definition, decide whether the events A = {first roll is a 3} and B = {sum of the two rolls is ≤ 9} are independent or not.

2 Random Variables

To further analyze the properties of an experiment, it is useful to introduce a random variable that describes the outcome of the experiment. Consider a single experiment which has a set of possible outcomes, e.g. Max playing a single round (rolling the fair die twice). Then a random variable X is, for example, the sum of the two rolls. The random variable assigns a particular number to each possible outcome of this experiment. For the outcome (1, 3), what is the value of the random variable? x = ...

We refer to this number as the numerical value or the experimental value of the random variable and denote it in lower-case letters, while a random variable is denoted in capital letters. Mathematically, a random variable is a real-valued function of the experimental outcome.
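A sketch of the answer to Max's question, applying Eqs. (10)-(11) to the numbers from the previous exercises (Python for illustration only):

```python
# Bayes' rule: P(first roll was 5 | sum of two rolls <= 9).
p_A = [1/6] * 6                                # P(A_i), fair die
p_B_given_A = [1, 1, 1, 5/6, 4/6, 3/6]         # P(B | A_i) from before
p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))   # denominator, Eq. (11)
p_A5_given_B = p_A[4] * p_B_given_A[4] / p_B
print(p_A5_given_B)                            # (1/6)(4/6) / (5/6) = 2/15 = 0.1333...
```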
To get a better understanding of this concept, let's have a look at another example: given an experiment consisting of a sequence of 3 tosses of a coin, with possible outcomes {HHH}, {HHT}, {HTT}, {TTT}, {THH}, {TTH}, {THT} and {HTH}, let the random variable X be the number of tails in the sequence. For the outcome {HHH} the value of the random variable is x = 0. The sequence of heads and tails denoted in letters cannot be a random variable, as it is not a real-valued number. What is the value of the random variable for the outcome {THH}? x = ...

Which of the following examples can be described by a random variable?

1. The sequence of heads and tails denoted in letters, when tossing coins.
2. The number of false positives of a medical test.
3. The names of the people in this room.
4. The percentage of students registered to MSI which actually come to the class.

Think of another example that can be described by a random variable.
2.1 Discrete Random Variables

A random variable is discrete if its range is finite, e.g. the integers between 1 and 6. It can be described by an associated probability density function (PDF), denoted by p_X, which gives the probability of each numerical value x that the random variable X can take. The probability mass of x is the probability of the event {X = x}:

p_X(x) = P({X = x}).   (13)

Follow the steps below to compute a PDF:

1. Determine all possible outcomes.
2. Follow these two steps for each numerical value x:
   (a) Determine all possible outcomes which lead to the event X = x.
   (b) Add the probabilities of all these outcomes to obtain p_X(x) = P({X = x}).

The events {X = x} are disjoint and form a partition of the sample space. From the additivity and normalization axioms it follows that

Σ_x p_X(x) = 1,   (14)
P(X ∈ S) = Σ_{x ∈ S} p_X(x).   (15)

Example 1. Consider tossing a coin 3 times; the possible outcomes (HHH), (HHT), (HTT), (TTT), (THH), (TTH), (THT) and (HTH) are equally likely. Let the random variable X be the number of heads in the sequence. Then one possible outcome leads to x = 0, three possible outcomes lead to x = 1, three possible outcomes lead to x = 2 and one possible outcome leads to x = 3. So the PDF is

p_X(x) = 1/8 if x = 0,
         3/8 if x = 1,
         3/8 if x = 2,
         1/8 if x = 3,
         0 otherwise.

Compute the PDF of the random variable X describing the sum of two rolls of a fair die.

Functions of random variables. Consider a probability model of the height of 11-year-old children in Germany, and define the random variable X as the height in meters. The transformation Y = X/0.3048 gives the height of the children in feet. The transformation is affine and can be written in a general form as Y = g(X) = aX + b with parameters a and b. If g(X) is a function of a random variable X, then Y = g(X) is also a random variable, since it provides a numerical value for each possible outcome. This is because every outcome in the sample space defines a numerical value x for X and hence also the numerical value y = g(x) for Y.
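The two-step recipe above can be carried out mechanically for the sum of two die rolls. A Python sketch (exact fractions are used so the normalization axiom can be checked without rounding):

```python
# PDF of X = sum of two rolls of a fair die: enumerate all outcomes, then
# add the probabilities of the outcomes leading to each value x.
from itertools import product
from fractions import Fraction

p_X = {}
for i, j in product(range(1, 7), repeat=2):    # 36 equally likely outcomes
    x = i + j
    p_X[x] = p_X.get(x, Fraction(0)) + Fraction(1, 36)

print(p_X[7])                                  # Fraction(1, 6): six outcomes sum to 7
print(sum(p_X.values()))                       # Fraction(1, 1): Eq. (14) holds
```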
The function g may also be a nonlinear function. If X is discrete with PDF p_X, then Y is also discrete, and its PDF p_Y can be calculated using the PDF of X:

p_Y(y) = Σ_{x : g(x) = y} p_X(x).   (16)

Expected Value, Variance and Standard Deviation. Max has learned to describe the outcome of rolling a die twice by a random variable. With a PDF, he can describe the characteristics of the random variable. It is convenient to summarize these characteristics with some quantities associated to the random variable, for example the expected value and the variance of the random variable. The expected value of X (also called the expectation of X), often denoted by µ_X, is a weighted average of the possible values x of X. The weights are the associated probabilities p_X(x):

E{X} = Σ_x x p_X(x).   (17)
Let g(x) be a real-valued function and X be a random variable; then the expected value rule is

E{g(X)} = Σ_x g(x) p_X(x).   (18)

Useful properties of the expectation operator are

E{c} = c,
E{X + c} = E{X} + c,
E{X + Y} = E{X} + E{Y},
E{cX} = c E{X},

with c being a constant, and X and Y being random variables. Compute the expected value E{X} of the random variable X describing the result of rolling a fair die.

Another quantity to characterize a random variable is the variance, denoted as var(X) or written as σ²_X. It is a measure of the dispersion of X around the mean E{X}, and it is always nonnegative. Here g(X) = (X − E{X})²:

σ²_X = E{(X − E{X})²} = Σ_x (x − E{X})² p_X(x).   (19)

Its square root is called the standard deviation σ_X, which is easier to interpret because it has the same unit as X. Expected values can also be used to compute the variance: σ²_X = E{X²} − (E{X})².

Compute the variance σ²_X and the standard deviation σ_X of the random variable X describing the result of rolling a fair die. Given the random variables X, Y ∈ ℝ, with Y = aX + b (affine function) and the scalar parameters a and b, compute the expected value and the variance of Y.

2.1.1 Multiple Discrete Random Variables

In many situations a single experiment has more than one random variable of interest. For example, in a medical diagnosis context the results of several tests (random variables) may be significant. Then all random variables are associated with the same sample space (here: the set of possible diagnoses), the same experiment (here: the medical diagnosis process) and the same probability law, namely the joint PDF of X and Y. It is defined as

p_{X,Y}(x, y) = P({X = x}, {Y = y})   (20)

for all pairs of numerical values (x, y) of X and Y. The joint PDF provides the probability of any event that can be expressed in terms of the random variables X and Y. If the event A is the set of all pairs (x, y) with a certain property, then P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} p_{X,Y}(x, y). Consider rolling the fair die twice again.
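Eqs. (17) and (19) can be evaluated directly for a single fair die. A sketch in Python (exact fractions again, so the well-known values 7/2 and 35/12 come out without rounding error):

```python
# E{X} and var(X) for a single fair die, from Eqs. (17) and (19).
from fractions import Fraction

p = Fraction(1, 6)                             # p_X(x) = 1/6 for x = 1, ..., 6
E_X = sum(x * p for x in range(1, 7))          # 7/2
E_X2 = sum(x**2 * p for x in range(1, 7))      # 91/6
var_X = E_X2 - E_X**2                          # 91/6 - 49/4 = 35/12
std_X = float(var_X) ** 0.5
print(E_X, var_X, std_X)                       # 7/2  35/12  ~1.7078
```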
Let the random variables X and Y describe the outcome of the first and second roll, respectively.

1. Compute the joint PDF p_{X,Y}(x, y) of the random variables X and Y.
2. What is the chance that a 6 occurred once?
The marginal PDF can be derived from the joint PDF as follows:

p_X(x) = Σ_y p_{X,Y}(x, y)   (21)

and accordingly for p_Y(y). Compute the marginal PDFs p_X(x) and p_Y(y) from the joint PDF p_{X,Y}(x, y) you just computed.

For functions of multiple random variables the same concept holds as for single random variables. Let g(X, Y) be a function of the random variables X and Y, which defines some other random variable Z = g(X, Y). Then the PDF of Z follows as

p_Z(z) = Σ_{(x,y) : g(x,y) = z} p_{X,Y}(x, y)   (22)

and its expected value is

E{g(X, Y)} = Σ_{x,y} g(x, y) p_{X,Y}(x, y).   (23)

2.1.2 Conditioning and the Conditional PDF

The concept of conditional probability introduced in Section 1.1 also holds for random variables. Consider a single experiment with the random variables X and Y associated to it. The conditional PDF of the random variable X, given the experimental knowledge that the random variable Y takes the value y (with p_Y(y) > 0), is defined as

p_{X|Y}(x | y) = P({X = x} | {Y = y}) = P({X = x}, {Y = y}) / P({Y = y}) = p_{X,Y}(x, y) / p_Y(y).   (24)

It is useful to compute the joint PDF sequentially with p_{X,Y}(x, y) = p_Y(y) p_{X|Y}(x | y).

Compute the conditional PDF p_{X|Y}(x | y) for the random variables X and Y describing the result of the first and the second roll of a fair die, respectively. Assume that the second roll of the fair die was a 4 (y = 4). Use the joint PDF of the random variables X and Y from task (1) in Section 2.1.1 and the related marginal PDFs.

The conditional expectation of X given a value y of Y is defined by

E{X | Y = y} = Σ_x x p_{X|Y}(x | y).   (25)

2.1.3 Independence

Two random variables X and Y are independent if the following holds:

p_{X,Y}(x, y) = p_X(x) p_Y(y) for all x, y.   (26)

Then also E{XY} = E{X}E{Y} and E{g(X)h(Y)} = E{g(X)}E{h(Y)} for any functions g and h. It is interesting to note that the variance of the sum of two independent random variables X and Y is the sum of the variances of the two random variables: σ²_{X+Y} = σ²_X + σ²_Y.
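The joint, marginal and conditional PDFs of the two-roll experiment can all be built from Eq. (20), Eq. (21) and Eq. (24) in a few lines. A Python sketch (illustration only; the tutorial tasks themselves are meant to be done by hand or in MATLAB):

```python
# Joint, marginal and conditional PDFs for X = first roll, Y = second roll.
from itertools import product
from fractions import Fraction

p_XY = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

p_X = {x: sum(p_XY[x, y] for y in range(1, 7)) for x in range(1, 7)}  # Eq. (21)
p_Y = {y: sum(p_XY[x, y] for x in range(1, 7)) for y in range(1, 7)}

# Conditional PDF p_{X|Y}(x | y = 4), Eq. (24). The two rolls are independent,
# so it coincides with the marginal p_X(x) = 1/6.
p_X_given_4 = {x: p_XY[x, 4] / p_Y[4] for x in range(1, 7)}
print(p_X[3], p_X_given_4[3])                  # both Fraction(1, 6)
```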
Check whether the random variables X and Y, each describing the result of rolling a fair die, are independent.
2.2 Continuous Random Variables

After a long day in the casino, Max decides to go home. He hops on his bike and starts riding home. At one of his favourite spots he stops to determine the correct GPS position p = (x, y)ᵀ. He's only interested in two dimensions, namely longitude and latitude. Unfortunately he notices that the signal is a bit noisy. Assume that the noise is additive, i.i.d. and has zero mean:

pos_noisy = ( x + n_x )
            ( y + n_y ).

A random variable X which can take an infinite number of values is a continuous random variable, e.g. the noise of the x-value of the GPS position. The concepts introduced for discrete random variables also exist for continuous random variables: the continuous random variable can be described in terms of a nonnegative probability density function. The PDF in the continuous case is defined via

P(X ∈ B) = ∫_B p_X(x) dx   (27)

for every subset B ⊆ Ω. For example, if B = [a, b] with a, b ∈ ℝ, then P(a ≤ X ≤ b) = ∫_a^b p_X(x) dx. This probability can be interpreted as the area below the graph of the PDF. A PDF of a continuous random variable also has to satisfy the axioms of a probability law: the probability P(X ∈ B) is nonnegative, as p_X(x) ≥ 0 for every x, and the probability of the event containing the whole sample space is normalized to 1:

∫_{−∞}^{∞} p_X(x) dx = P(−∞ < X < ∞) = 1.   (28)

To get more accurate information on his position, Max decides to observe the data from the GPS receiver every 5 seconds and to compute the expected position from the recorded data. Please start MATLAB now and open statistics.m. Load the dataset position.mat which is provided on the website and plot the position data.

Expected Value and Variance. For any function g(X) the expected value rule holds:

E{g(X)} = ∫_{−∞}^{∞} g(x) p_X(x) dx.   (29)

If g(X) = X, then the expected value is

E{X} = ∫_{−∞}^{∞} x p_X(x) dx.   (30)

The variance is defined as σ²_X = E{(X − E{X})²}, in analogy with the discrete case. Compute the expected value of Max's position. Plot it in the same figure as above.
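Since position.mat is only available with the course material, the same computation can be sketched on simulated data (Python for illustration; the true position and noise level below are made-up values, not the ones in the dataset). With finitely many, equally likely recorded samples, the expected position is estimated by the sample mean:

```python
# Simulated noisy GPS samples; the expected position is the sample mean.
import random

random.seed(0)
x_true, y_true = 7.8422, 48.0076               # hypothetical longitude/latitude
xs = [x_true + random.gauss(0, 0.001) for _ in range(200)]   # zero-mean noise
ys = [y_true + random.gauss(0, 0.001) for _ in range(200)]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
print(round(x_mean, 3), round(y_mean, 3))
```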
(Hint: the data provides only a finite number of outcomes, so only finitely many values x and y of the continuous random variables X and Y carry probability mass. The outcomes are equally likely. Therefore you can interpret the expected value as the mean of the data.) Then compute the variance and the standard deviation of the measurement noise of the x- and y-data. Useful MATLAB commands are var and std.

Normal Random Variables. A normal random variable is described by the following PDF, also called the Gauss distribution:

p_X(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²)),   (31)

where µ = E{X} and σ² = var(X). Normality is preserved by an affine transformation Y = aX + b: the expected value and the variance of Y can be calculated to be E{Y} = aµ + b and var(Y) = a²σ². Given a random variable X with normal probability density p_X(x) = (1/√(2πσ²)) exp(−(x−µ)²/(2σ²)), with µ the mean and σ² the variance of X, please write down the PDF p_Z(z) of the random variable Z = 5X + 3, using µ and σ.
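A sketch of the answer this task is after, using only the affine-transformation rules stated above (so this is the expected result, not additional course material): with a = 5 and b = 3, Z = 5X + 3 is again normal, with mean 5µ + 3 and variance 25σ²,

```latex
p_Z(z) = \frac{1}{\sqrt{2\pi\,(5\sigma)^2}}
         \exp\!\left(-\frac{\bigl(z - (5\mu + 3)\bigr)^2}{2\,(5\sigma)^2}\right).
```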
2.2.1 Multiple Continuous Random Variables

The intuitive interpretation as well as the main properties of the joint, marginal and conditional PDFs parallel the discrete case. Two jointly continuous random variables X and Y that are associated to a single experiment can be described by a joint PDF p_{X,Y}. It has to be nonnegative and satisfy

P((X, Y) ∈ B) = ∫∫_{(x,y) ∈ B} p_{X,Y}(x, y) dx dy   (32)

for every subset B of the two-dimensional space. Letting B be the entire two-dimensional space, we obtain the normalization property

∫_{−∞}^{∞} ∫_{−∞}^{∞} p_{X,Y}(x, y) dx dy = 1.   (33)

The marginal PDF can be computed from the joint PDF as follows:

p_X(x) = ∫_{−∞}^{∞} p_{X,Y}(x, y) dy   (34)

and analogously for p_Y(y).

Expected Value, Variance and Standard Deviation. The expected value rule defined for continuous variables also holds for multiple random variables X and Y and some function g(X, Y):

E{g(X, Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) p_{X,Y}(x, y) dx dy.   (35)

Given a linear function g(X, Y) = aX + bY, the expected value results in E{aX + bY} = aE{X} + bE{Y}.

Conditioning. For any fixed y (event Y = y) with p_Y(y) > 0, the conditional PDF of X is defined by p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y). The probability of (X ∈ A) given Y = y can be calculated as

P(X ∈ A | Y = y) = ∫_A p_{X|Y}(x | y) dx.   (36)

The conditional version of the expected value rule, E{g(X) | Y = y} = ∫ g(x) p_{X|Y}(x | y) dx, remains valid.

Independence. In full analogy with discrete random variables, the continuous random variables X and Y are independent if p_{X,Y}(x, y) = p_X(x) p_Y(y) for all x, y. If this holds, the random variables g(X) and h(Y) are also independent, for any functions g and h. For the expected value it holds that E{XY} = E{X}E{Y} and, more generally, E{g(X)h(Y)} = E{g(X)}E{h(Y)}. And for the variance of the sum of X and Y we have σ²_{X+Y} = σ²_X + σ²_Y.

2.3 Covariance and Correlation

Let X and Y be two continuous random variables with joint PDF p_{X,Y}(x, y). In addition to the variances, one is also interested in the so-called covariance σ(X, Y) between X and Y.
It is a measure of how much the two random variables change together. The covariance is defined as the expectation of (X − µ_X)(Y − µ_Y):

σ(X, Y) = E{(X − µ_X)(Y − µ_Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µ_X)(y − µ_Y) p_{X,Y}(x, y) dx dy.   (37)

Compute the covariance of the random variable X ∈ ℝ with mean E{X} = µ_X and the scalar-valued random variable Y = aX + b, with constants a ∈ ℝ and b ∈ ℝ.

The physical unit of the covariance is the product of the units of the two random variables. To obtain a unitless quantity, one often divides the covariance by the two standard deviations, which yields the correlation ρ(X, Y) between X and Y:

ρ(X, Y) = σ(X, Y) / (σ_X σ_Y).   (38)

The correlation is a measure of the linear relation between the random variables. One can show that the correlation always lies in the interval [−1, 1]. If the correlation is zero, one says that the two variables are uncorrelated. If two variables are statistically independent, then they are always uncorrelated, but the converse is not true in general.
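The exercise above has the closed-form answer σ(X, Y) = a σ²_X, which a small numerical experiment can confirm. A Python sketch (the values of a, b and the distribution of X are made up for illustration):

```python
# Numerical check: for Y = a*X + b the sample covariance equals a * var(X)
# exactly, term by term, since (y_i - mean_y) = a * (x_i - mean_x).
import random

random.seed(1)
a, b = 2.0, 5.0
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [a * x + b for x in xs]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
var_x = sum((x - mx) ** 2 for x in xs) / len(xs)
print(cov / var_x)                             # close to a = 2.0
```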
Before Max finally goes home, he would like to know how much the noise terms n_x and n_y change together. Please help him and compute the covariance and the correlation of the noise (random variables X and Y). Is the noise correlated? Give the covariance matrix

cov(X, Y) = ( σ²_X      σ(X, Y) )
            ( σ(X, Y)   σ²_Y    ) = ...

This matrix can be used to deform a unit circle to visualize the spread of the distribution of the random variables and the relation between them. As you have learned in the last tutorial, such an ellipse is called a confidence ellipsoid. Use the covariance matrix to plot a confidence ellipsoid for the random variables X and Y describing the noise n_x and n_y, respectively. Have a look at the provided MATLAB script and fill in the missing parts.

2.4 Cumulative Distribution Function

We learned about discrete and continuous random variables, which can be described by a PDF. The interpretation of the PDF in the discrete and continuous case is slightly different. Therefore it is desirable to have a single mathematical concept which applies to both types of random variables. This concept is the cumulative distribution function F_X, in short CDF, which provides the probability P(X ≤ x) for all x:

F_X(x) = P(X ≤ x) = Σ_{k ≤ x} p_X(k)          if X is discrete,
                    ∫_{−∞}^{x} p_X(t) dt       if X is continuous.   (39)

One can say the CDF accumulates probability up to the value x.

[Figure 1: PDF and CDF of a discrete random variable taking the values 1 to 4.]
[Figure 2: PDF and CDF of a continuous random variable, uniform on [a, b]; the shaded area of the PDF up to c equals F_X(c).]

The following properties hold for a CDF:

F_X is monotonically increasing: if x ≤ y, then F_X(x) ≤ F_X(y).
F_X(x) tends to 0 for x → −∞ and to 1 for x → ∞.
If X is discrete, then for all integers k: F_X(k) = Σ_{i=−∞}^{k} p_X(i) and p_X(k) = P(X ≤ k) − P(X ≤ k − 1) = F_X(k) − F_X(k − 1).
If X is continuous: F_X(x) = ∫_{−∞}^{x} p_X(t) dt and p_X(x) = dF_X/dx (x).
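The "deform a unit circle" construction can be sketched without MATLAB: take a matrix square root of the covariance (here via the closed-form eigendecomposition of a symmetric 2x2 matrix) and map the unit circle through it. The covariance values below are made up for illustration and are not the ones from position.mat:

```python
# Confidence-ellipse points from a 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
import math

sxx, syy, sxy = 2.0, 1.0, 0.5                  # sigma_X^2, sigma_Y^2, sigma(X, Y)

# Eigenvalues of the symmetric 2x2 matrix (both positive for a valid covariance).
tr, det = sxx + syy, sxx * syy - sxy ** 2
l1 = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
l2 = tr / 2 - math.sqrt(tr ** 2 / 4 - det)
angle = 0.5 * math.atan2(2 * sxy, sxx - syy)   # orientation of the major axis

# Scale the unit circle by the sqrt of the eigenvalues, then rotate.
pts = []
for k in range(360):
    t = math.radians(k)
    u, v = math.sqrt(l1) * math.cos(t), math.sqrt(l2) * math.sin(t)
    pts.append((u * math.cos(angle) - v * math.sin(angle),
                u * math.sin(angle) + v * math.cos(angle)))
print(len(pts))                                # 360 points on the ellipse
```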
To practice what you've just learned, please do the following two tasks:

Give the PDF of the discrete random variable shown in Figure 1 and compute P(X ≤ 3). Assume that p_X(1) = 0.25, p_X(2) = 0.5, p_X(3) = 0.1 and p_X(4) = 0.15.

Compute P(X ≤ c) for X being the continuous random variable shown in Figure 2.

Joint CDF. As for single random variables, the mathematical concept of a cumulative distribution function applies to multiple random variables. For the continuous random variables X and Y, the joint CDF is defined as

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} p_{X,Y}(s, t) dt ds.   (40)

2.5 Additional Tasks for Tutorial 3

In the following you can find some tasks to practice what you've learned in the tutorial. Have fun doing them!

1. Regard a volleyball team and their 3 opponent teams. A_i is the event of getting team i as opponent. The probabilities are given as

   P(A_1) = 0.3, P(A_2) = 0.4, P(A_3) = 0.3.   (41)

   The event B is the event of winning the game, and

   P(B | A_1) = 0.1, P(B | A_2) = 0.6, P(B | A_3) = 0.3.   (42)

   Given the knowledge that your team wins, what is the probability P(A_2 | B) that the opponent was team 2?

2. Give the PDF of a random variable X which is uniformly distributed between a and b, and zero otherwise. Compute the expected value of X.

3. A census was conducted at a university. All students were asked how many siblings they had. Regard the number of siblings as a random variable X.

   Number of siblings | 0  | 1  | 2  | 3  | 4 | 5 | 6 | 7
   Number of students | 24 | 33 | 29 | 10 | 2 | 0 | 0 | 1

   (a) Plot the PDF p_X in MATLAB.
   (b) Compute the mean of the random variable X and plot it into the same plot.
   (c) Compute the variance of the random variable X.
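For task 3, the empirical frequencies play the role of the PDF p_X(x). A sketch of parts (b) and (c) in Python (the course intends MATLAB; the numbers are those of the table above):

```python
# Mean and variance of the sibling count, using relative frequencies as p_X(x).
counts = {0: 24, 1: 33, 2: 29, 3: 10, 4: 2, 5: 0, 6: 0, 7: 1}
n = sum(counts.values())                       # 99 students in total
p_X = {x: c / n for x, c in counts.items()}

mean = sum(x * p for x, p in p_X.items())      # Eq. (17)
var = sum((x - mean) ** 2 * p for x, p in p_X.items())   # Eq. (19)
print(n, round(mean, 3), round(var, 3))        # 99 1.374 1.345
```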