Statistical foundations of machine learning


1 Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Machine Learning Group Computer Science Department mlg.ulb.ac.be

2 Random experiment We define a random experiment as any action or process which generates results or observations that cannot be predicted with certainty. Examples: tossing a coin, rolling dice, measuring the commute time to go back home.

3 Probability space A random experiment is characterized by a sample space Ω, the set of all possible outcomes ω of the experiment. This space can be either finite or infinite. For example, in the die experiment Ω = {ω_1, ω_2, ..., ω_6}, and in the commute time example Ω = {ω_LOW, ω_MEDIUM, ω_HIGH}. The elements of the set Ω are called experimental outcomes. The outcome of an experiment need not be a number; for example, the outcome when a coin is tossed can be heads or tails. A subset of experimental outcomes is called an event. Examples of events are the set of even values E = {ω_2, ω_4, ω_6} or the set of non-high times E = {ω_LOW, ω_MEDIUM}. A single execution of a random experiment is called a trial. At each trial we observe one outcome ω_i. We say that an event occurred during this trial if it contains the element ω_i. For example, in the die experiment, if we observe the outcome ω_4, the event "even" took place.

4 Events and set theory Since events E are subsets, we can apply to them the terminology of set theory: E^c = {ω ∈ Ω : ω ∉ E} denotes the complement of E. E_1 ∪ E_2 = {ω ∈ Ω : ω ∈ E_1 OR ω ∈ E_2} refers to the event that occurs when E_1 or E_2 or both occur. E_1 ∩ E_2 = {ω ∈ Ω : ω ∈ E_1 AND ω ∈ E_2} refers to the event that occurs when both E_1 and E_2 occur. Two events E_1 and E_2 are mutually exclusive or disjoint if E_1 ∩ E_2 = ∅, that is, each time that E_1 occurs, E_2 does not occur. A partition of Ω is a set of disjoint sets E_j, j = 1,...,n such that ∪_{j=1}^n E_j = Ω.

5 Class of events The class {E} of events is not an arbitrary collection of subsets of Ω. We require that if E_1 and E_2 are events, then the intersection E_1 ∩ E_2 and the union E_1 ∪ E_2 are events too. We do so because we will want to know not only the probabilities of various events, but also the probabilities of their unions and intersections. In the following we will consider only classes of events that, in mathematical terms, form a Borel field.

6 Combined experiments Note that a sample space is not necessarily univariate. The most interesting uses of probability concern, however, combined random experiments whose sample space Ω = Ω_1 × Ω_2 × ... × Ω_n is the Cartesian product of several spaces Ω_i, i = 1,...,n. For instance, if we want to study the probabilistic dependence between the height and the weight of a child, we have to define a joint sample space Ω = {(w,h) : w ∈ Ω_w, h ∈ Ω_h} made of all pairs (w,h), where Ω_w is the sample space of the random experiment describing the weight and Ω_h is the sample space of the random experiment describing the height.

7 Axiomatic definition of probability Let Ω be the certain event that occurs in every trial, and let E_1 + E_2 denote the event that occurs when E_1 or E_2 or both occur. The axiomatic approach to probability consists in assigning to each event E a number Prob{E}, which is called the probability of the event E. This number is chosen so as to satisfy the following three conditions:
1. Prob{E} ≥ 0 for any E.
2. Prob{Ω} = 1.
3. Prob{E_1 + E_2} = Prob{E_1} + Prob{E_2} if Prob{E_1 ∩ E_2} = 0, that is, if E_1 and E_2 are mutually exclusive (or disjoint).
These conditions are the axioms of the theory of probability (Kolmogorov, 1933). For a discrete sample space it follows that Prob{E} = Σ_{ω ∈ E} Prob{ω}.

8 Axiomatic definition of probability All probabilistic conclusions are based, directly or indirectly, on the axioms and only the axioms. But how do we define these probability numbers?

9 Symmetrical definition of probability Consider a random experiment where the sample space is made of N symmetric outcomes, i.e. we have no reason to expect or prefer one over the others. Let N_E be the number of outcomes which are favorable to the event E (i.e. if they occur then the event E takes place). Then, according to the principle of indifference (a term popularized by J.M. Keynes in 1921), we have Prob{E} = N_E / N. Note that this number is determined without any experimentation and is based on symmetry assumptions. But what happens if symmetry does not hold?

10 Frequentist definition of probability Let us consider a random experiment and an event E. Let us repeat the experiment N times and count the number of times N_E that the event E occurs. The quantity N_E / N is the relative frequency of E. It can be empirically observed that the frequency converges to a fixed value for increasing N when the experiment is run a large number of times under exactly the same conditions and in such a way that the repetitions of the experiment are independent of each other. This observation led von Mises to use the notion of frequency as a foundation for the notion of probability.

11 Frequentist definition of probability Definition (von Mises) The probability Prob{E} of an event E is the limit Prob{E} = lim_{N→∞} N_E / N, where N is the number of observations (trials) and N_E is the number of times that E occurred. This definition appears reasonable and is compatible with the axioms. According to this approach, the probability is not a property of a specific observation, but rather a property of the entire set of observations. In practice, in any physical experiment N is finite and the limit has to be accepted as a hypothesis, not as a number that can be determined experimentally.

12 Weak law of Large Numbers A link between the axiomatic and the frequentist approach is provided by the weak law of Large Numbers. Theorem (Bernoulli) For any ε > 0, Prob{|N_E/N − p| ≤ ε} → 1 as N → ∞. In other words, the ratio N_E/N is close to p in the sense that, for any ε > 0, the probability that |N_E/N − p| ≤ ε tends to 1 as N → ∞. This is a law about long-run behavior, not about a single (or the next) experiment. In other terms, the set of outcomes for which the sequence N_E/N does not converge to p is negligibly small. The weak law essentially states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the frequency will be close to the probability, that is, within the margin.

13 Law of Large Numbers and simulation The law of large numbers is also the mathematical basis for the widespread application of computer simulation to solve practical probability problems. In simulation, the (unknown) probability of a given event in a chance experiment is estimated by the relative frequency of occurrence of the event in a large number of computer simulations of the experiment. The application of simulation is based on these elementary principles of probability. Stochastic simulation (also known as Monte Carlo) is a powerful tool with which extremely complicated probability problems can be solved.
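
As a minimal illustration (my own sketch, not one of the course scripts), the following R code estimates by relative frequency the probability that the sum of two fair dice equals 7; the exact value is 6/36 ≈ 0.167.

    # Monte Carlo estimate of Prob{sum of two fair dice = 7}
    set.seed(0)
    N <- 1e5                                # number of simulated trials
    die1 <- sample(1:6, N, replace = TRUE)  # first die
    die2 <- sample(1:6, N, replace = TRUE)  # second die
    N.E <- sum(die1 + die2 == 7)            # number of trials favorable to E
    N.E / N                                 # relative frequency, close to 6/36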

14 Gambler's fallacy Note that according to the law of large numbers N_E/N → p for N → ∞, but NOT N_E → Np for N → ∞. In fact, Prob{N_E = Np} ≈ 1/√(2πNp(1−p)) → 0 as N → ∞. In a fair coin-tossing game the law of large numbers does not imply that the absolute difference between the number of heads and tails should oscillate close to zero. On the contrary, it can be shown that the absolute difference keeps growing proportionally to √N (and thus more slowly than the number of tosses). The illusion that after a long run of "heads" it is more probable to have a "tail" is known as the gambler's fallacy. If the gambler's fallacy were true, this would mean that coins have memory!

15 Probability and frequency [Figure: coin-tossing simulation over 6 × 10^5 trials. Left panel: relative frequency of heads vs. number of trials; right panel: absolute difference between the number of heads and tails vs. number of trials.]
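
A sketch of the kind of simulation behind this figure (assuming a fair coin and 6 × 10^5 tosses; the exact settings of the original plot are not given):

    # Toss a fair coin N times; track relative frequency and |#heads - #tails|
    set.seed(1)
    N <- 6e5
    tosses <- sample(0:1, N, replace = TRUE)  # 1 = head, 0 = tail
    heads <- cumsum(tosses)                   # running number of heads
    n <- 1:N
    par(mfrow = c(1, 2))
    plot(n, heads / n, type = "l", xlab = "Number of trials",
         ylab = "Relative frequency")
    abline(h = 0.5, lty = 2)                  # frequency converges to p = 0.5
    plot(n, abs(2 * heads - n), type = "l", xlab = "Number of trials",
         ylab = "Absolute difference (no. heads and tails)")
    # the absolute difference keeps growing, roughly like sqrt(N)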

16 Some probabilistic notions Definition (Independent events) Two events E_1 and E_2 are independent if Prob{E_1 ∩ E_2} = Prob{E_1} Prob{E_2}, and we write E_1 ⊥ E_2. Examples: event E_1: your professor is Italian, event E_2: bad weather in Brussels; event E_1: commute time ≤ 10 minutes, event E_2: bad weather in Buenos Aires.

17 Some probabilistic notions Definition (Conditional probability) If Prob{E_1} > 0, then the conditional probability of E_2 given E_1 is Prob{E_2 | E_1} = Prob{E_1 ∩ E_2} / Prob{E_1}. Example: event E_2: bad weather in Brussels; event E_1: commute time smaller than average.

18 Exercises 1. Let E_1 and E_2 be two disjoint events with positive probability. Can they be independent? 2. Suppose that a fair die is rolled and that the number x appears. Let E_1 be the event that the number x is even, E_2 the event that the number x is greater than or equal to 3, and E_3 the event that the number x is a 4, 5 or 6. Are the events E_1 and E_2 independent? Are the events E_1 and E_3 independent?

19 Warnings For any fixed E_1, the quantity Prob{· | E_1} satisfies the axioms of probability. For instance, if E_2, E_3 and E_4 are disjoint events, we have Prob{E_2 ∪ E_3 ∪ E_4 | E_1} = Prob{E_2 | E_1} + Prob{E_3 | E_1} + Prob{E_4 | E_1}. However, this does NOT generally hold for Prob{E_1 | ·}, that is, when we fix the term E_1 on the left of the conditional bar: for two disjoint events E_2 and E_3, in general Prob{E_1 | E_2 ∪ E_3} ≠ Prob{E_1 | E_2} + Prob{E_1 | E_3}. Also, it is generally NOT the case that Prob{E_2 | E_1} = Prob{E_1 | E_2}.

20 Warnings The following properties hold: if E_1 ⊂ E_2, then Prob{E_2 | E_1} = Prob{E_1 ∩ E_2} / Prob{E_1} = Prob{E_1} / Prob{E_1} = 1 (since E_1 ∩ E_2 = E_1), and Prob{E_1 | E_2} = Prob{E_1} / Prob{E_2} ≥ Prob{E_1}. Examples: event E_1: your professor is Italian, event E_2: your professor is European; event E_1: commute time ≤ 10 minutes, event E_2: commute time ≤ 60 minutes.

21 Bayes theorem Let us consider a set of mutually exclusive and exhaustive events E_1, E_2, ..., E_k, i.e. they form a partition of Ω. Theorem (Law of total probability) Let Prob{E_i}, i = 1,...,k denote the probabilities of the events E_i, and Prob{E | E_i}, i = 1,...,k the conditional probabilities of a generic event E given that E_i has occurred. It can be shown that Prob{E} = Σ_{i=1}^k Prob{E | E_i} Prob{E_i}. Example: how much time will it take tomorrow to go back home by car, given the weather forecast? Event E: tomorrow's commute time by car is smaller than average; event E_1: nice weather in Brussels; event E_2: average weather in Brussels; event E_3: bad (as usual) weather in Brussels.

22 Bayes theorem Theorem (Bayes theorem) The conditional ("inverse") probability of any E_i, i = 1,...,k, given that E has occurred, is Prob{E_i | E} = Prob{E | E_i} Prob{E_i} / Σ_{j=1}^k Prob{E | E_j} Prob{E_j} = Prob{E | E_i} Prob{E_i} / Prob{E}, i = 1,...,k. Example: how probable was the bad weather last Wednesday, given that it took a long time to go back home by car? Event E: commute time by car longer than average; event E_1: nice weather in Brussels; event E_2: bad weather in Brussels.
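
To make the example concrete, here is a small R sketch; the prior and likelihood values below are illustrative assumptions, not numbers from the course.

    # Law of total probability and Bayes theorem for the weather example
    prior <- c(nice = 0.2, bad = 0.8)  # assumed Prob{E_i}: weather in Brussels
    lik   <- c(nice = 0.1, bad = 0.6)  # assumed Prob{E|E_i}: long commute given weather
    prob.E <- sum(lik * prior)         # total probability: Prob{E} = 0.5
    lik * prior / prob.E               # posteriors Prob{E_i|E}: nice 0.04, bad 0.96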

23 Transitivity in logic and probability Let us consider three boolean spaces Ω_i = {TRUE, FALSE} and three events E_1, E_2, E_3. From logic we know that if E_1 ⇒ E_2 and E_2 ⇒ E_3, then E_1 ⇒ E_3. Does this hold in probability too? In probabilistic terms we can rewrite the logical implications as Prob{E_2 = T | E_1 = T} = 1 and Prob{E_3 = T | E_2 = T} = 1. Then

Prob{E_3 = T | E_1 = T}
  = Σ_i Prob{E_3 = T | E_2 = i, E_1 = T} Prob{E_2 = i | E_1 = T}
  = Prob{E_3 = T | E_2 = F, E_1 = T} Prob{E_2 = F | E_1 = T}   (second factor is 0)
  + Prob{E_3 = T | E_2 = T, E_1 = T} Prob{E_2 = T | E_1 = T} = 1

so transitivity holds in probability as well.

24 Inverse modus ponens in logic and probability According to logic, if E_1 ⇒ E_2 then ¬E_2 ⇒ ¬E_1. Does this hold in probability too? In probabilistic terms we can rewrite the logical implication as Prob{E_2 = T | E_1 = T} = 1. It follows that

Prob{E_1 = F | E_2 = F} = 1 − Prob{E_1 = T | E_2 = F}
  = 1 − Prob{E_2 = F | E_1 = T} Prob{E_1 = T} / Prob{E_2 = F} = 1

since Prob{E_2 = F | E_1 = T} = 0. In other terms, deductive logic rules can be seen as limiting cases of probabilistic reasoning.

25 Medical study Let us consider a medical study about the relationship between the outcome of a medical test and the presence of a disease. We model this study as the combination of two random experiments: 1. the random experiment which models the state of the patient; its sample space is Ω_s = {H, S}, where H and S stand for healthy and sick patient, respectively; 2. the random experiment which models the outcome of the medical test; its sample space is Ω_o = {+, −}, where + and − stand for a positive and a negative outcome of the test, respectively. Suppose that out of 1000 patients the counts are arranged in the joint table

           E_s = S   E_s = H
E_o = +       ·         ·
E_o = −       ·         ·

What is the probability of having a positive (negative) test outcome when the patient is sick (healthy)? What is the probability of being in front of a sick (healthy) patient when a positive (negative) outcome is obtained?

26 Medical study (II) From the definition of conditional probability we derive Prob{E_o = + | E_s = S} = Prob{E_o = +, E_s = S} / Prob{E_s = S} and Prob{E_o = − | E_s = H} = Prob{E_o = −, E_s = H} / Prob{E_s = H} = 0.9. According to these figures, the test appears to be accurate. Do we have to expect a high probability of being sick when the test is positive? The answer is NO, as shown by Prob{E_s = S | E_o = +} = Prob{E_o = +, E_s = S} / Prob{E_o = +}. This example shows that humans sometimes tend to confound Prob{E_s | E_o} with Prob{E_o | E_s}, and that the most intuitive response is not always the right one.
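
Since the original counts are not reproduced above, the R sketch below uses an invented contingency table with the same character (sensitivity and specificity 0.9, 5% prevalence) to show how the computation goes:

    # Hypothetical counts over 1000 patients (rows: test outcome, columns: state)
    counts <- matrix(c(45, 95,      # E_o = + : sick, healthy
                        5, 855),    # E_o = - : sick, healthy
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("+", "-"), c("S", "H")))
    P <- counts / sum(counts)       # joint probabilities
    P["+", "S"] / sum(P[, "S"])     # Prob{+|S} = 0.9 (sensitivity)
    P["-", "H"] / sum(P[, "H"])     # Prob{-|H} = 0.9 (specificity)
    P["+", "S"] / sum(P["+", ])     # Prob{S|+} ~ 0.32: low despite an accurate test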

27 Array of joint/marginal probabilities Let us consider the combination of two random experiments whose sample spaces are Ω_A = {A_1, ..., A_n} and Ω_B = {B_1, ..., B_m} respectively. Assume that for each pair of events (A_i, B_j), i = 1,...,n, j = 1,...,m we know the joint probability value Prob{A_i, B_j}.

             B_1             B_2             ...   B_m             Marginal
A_1          Prob{A_1,B_1}   Prob{A_1,B_2}   ...   Prob{A_1,B_m}   Prob{A_1}
A_2          Prob{A_2,B_1}   Prob{A_2,B_2}   ...   Prob{A_2,B_m}   Prob{A_2}
...          ...             ...             ...   ...             ...
A_n          Prob{A_n,B_1}   Prob{A_n,B_2}   ...   Prob{A_n,B_m}   Prob{A_n}
Marginal     Prob{B_1}       Prob{B_2}       ...   Prob{B_m}       Sum=1

The joint probability array contains all the information necessary for computing all marginal and conditional probabilities. Try to fill the table for the dependent and the independent case.

28 Dependent/independent: example Let us model the commute time to go back home for an ULB student living in St. Gilles as a random experiment. Suppose that its sample space is Ω = {LOW, MEDIUM, HIGH}. Consider also an (extremely :-) random experiment representing the weather in Brussels, whose sample space is Ω = {G=GOOD, B=BAD}. Suppose that the array of joint probabilities is

          G (in Bxl)      B (in Bxl)      Marginal
LOW       0.15            0.05            Prob{LOW} = 0.2
MEDIUM    0.10            0.40            Prob{MEDIUM} = 0.5
HIGH      0.05            0.25            Prob{HIGH} = 0.3
Marginal  Prob{G} = 0.3   Prob{B} = 0.7   Sum=1

Is the commute time dependent on the weather in Bxl? Now replace the Brussels weather by the weather in Rome:

          G (in Rome)     B (in Rome)     Marginal
LOW       0.18            0.02            Prob{LOW} = 0.2
MEDIUM    0.45            0.05            Prob{MEDIUM} = 0.5
HIGH      0.27            0.03            Prob{HIGH} = 0.3
Marginal  Prob{G} = 0.9   Prob{B} = 0.1   Sum=1

Is the commute time dependent on the weather in Rome?

29 Dependent/independent: example (II) If the Brussels weather is good:

            LOW             MEDIUM          HIGH
Prob{·|G}   0.15/0.3=0.50   0.10/0.3=0.33   0.05/0.3=0.16

Else, if the Brussels weather is bad:

            LOW             MEDIUM          HIGH
Prob{·|B}   0.05/0.7=0.07   0.40/0.7=0.57   0.25/0.7=0.35

The distribution of the commute time changes according to the value of the Brussels weather.

30 Dependent/independent: example (III) If Rome's weather is good:

            LOW             MEDIUM          HIGH
Prob{·|G}   0.18/0.9=0.2    0.45/0.9=0.5    0.27/0.9=0.3

Else, if Rome's weather is bad:

            LOW             MEDIUM          HIGH
Prob{·|B}   0.02/0.1=0.2    0.05/0.1=0.5    0.03/0.1=0.3

The distribution of the commute time does NOT change according to the value of Rome's weather.
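
The check can be scripted in R (a minimal sketch using the joint values of the tables above): each column of the conditional table is compared with the marginal distribution of the commute time.

    # Joint tables: rows = commute time, columns = weather
    bxl  <- matrix(c(0.15, 0.05, 0.10, 0.40, 0.05, 0.25), nrow = 3, byrow = TRUE,
                   dimnames = list(c("LOW", "MEDIUM", "HIGH"), c("G", "B")))
    rome <- matrix(c(0.18, 0.02, 0.45, 0.05, 0.27, 0.03), nrow = 3, byrow = TRUE,
                   dimnames = list(c("LOW", "MEDIUM", "HIGH"), c("G", "B")))
    cond <- function(J) sweep(J, 2, colSums(J), "/")  # Prob{time | weather}
    cond(bxl)   # the two columns differ: time depends on the Brussels weather
    cond(rome)  # both columns equal the marginal (0.2, 0.5, 0.3): independence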

31 Marginal/conditional: example Consider a probabilistic model of the day's weather based on three random descriptors (or features), where 1. the first represents the sky condition and takes values in the finite set {CLEAR, CLOUDY}, 2. the second represents the barometer trend and takes values in the finite set {RISING, FALLING}, 3. the third represents the humidity in the afternoon and takes values in {DRY, WET}.

32 Marginal/conditional: example (II) Let the joint distribution be given by the table

E_1      E_2       E_3   P(E_1,E_2,E_3)
CLEAR    RISING    DRY   0.40
CLEAR    RISING    WET   0.07
CLEAR    FALLING   DRY   0.08
CLEAR    FALLING   WET   0.10
CLOUDY   RISING    DRY   0.09
CLOUDY   RISING    WET   0.11
CLOUDY   FALLING   DRY   0.03
CLOUDY   FALLING   WET   0.12

From the joint distribution we can calculate the marginal probabilities P(CLEAR, RISING) = 0.47 and P(CLOUDY) = 0.35, and the conditional value P(DRY | CLEAR, RISING) = P(DRY, CLEAR, RISING) / P(CLEAR, RISING) = 0.40/0.47 ≈ 0.85.
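
The same computations in a short R sketch (a hand-rolled check, not one of the course scripts):

    # Joint distribution of (sky, barometer, humidity) as a data frame
    w <- expand.grid(E1 = c("CLEAR", "CLOUDY"), E2 = c("RISING", "FALLING"),
                     E3 = c("DRY", "WET"))
    w$P <- c(0.40, 0.09, 0.08, 0.03, 0.07, 0.11, 0.10, 0.12)
    sum(w$P[w$E1 == "CLEAR" & w$E2 == "RISING"])  # P(CLEAR,RISING) = 0.47
    sum(w$P[w$E1 == "CLOUDY"])                    # P(CLOUDY) = 0.35
    0.40 / 0.47                                   # P(DRY|CLEAR,RISING) ~ 0.85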

33 Random variables Machine learning and statistics are concerned with data. What, then, is the link between the notion of a random experiment and data? The answer is provided by the concept of random variable. Consider a random experiment (Ω, {E}, Prob{·}). The outcome of an experiment need not be a number: for example, the outcome when a coin is tossed can be heads or tails. However, we often want to represent outcomes as numbers. A random variable is a function that associates a unique numerical value with every outcome of an experiment. The value of the random variable will vary from trial to trial as the experiment is repeated. Suppose that we have a mapping rule Ω → R such that we can associate with each experimental outcome ω a real value z(ω). We say that z is the value taken by the random variable z when the outcome of the random experiment is ω. Since there is a probability associated with each event E and we have a mapping from events to real values, a probability distribution can be associated with z.

34 Random variables

35 Random variables Definition Given a random experiment (Ω, {E}, Prob{·}), a random variable z is the result of a mapping that assigns a number z to every outcome ω. This mapping must satisfy the following two conditions but is otherwise arbitrary: the set {z ≤ z} is an event for every z; the probabilities Prob{z = ∞} = 0 and Prob{z = −∞} = 0. Given a random variable z ∈ Z and a subset I ⊂ Z, we define the inverse mapping z⁻¹(I) = {ω ∈ Ω : z(ω) ∈ I}, where z⁻¹(I) ∈ {E} is an event, and we let Prob{z ∈ I} = Prob{z⁻¹(I)} = Prob{ω ∈ Ω : z(ω) ∈ I}.

36 Probabilistic interpretation of uncertainty This course will assume that the variability of measurements can be represented by the probability formalism. A random variable is a numerical quantity, linked to some experiment involving some degree of randomness, that takes its value from some set of possible real values. Example: the experiment might be the rolling of two six-sided dice and the r.v. z might be the sum of the two numbers showing on the dice. In this case the set of possible values is {2,...,12}. In the example of the commute time, the random experiment is a compact (and approximate) way of modeling the disparate set of causes which lead to variability in the value of z.

37 Probability function of a discrete r.v. The probability (mass) function of a discrete r.v. z is the combination of 1. the finite set Z of values that this r.v. can take (also called range or sample space), and 2. the set of probabilities associated with each value of Z. This means that we can attach to the random variable a specific mathematical function P_z(z) that gives, for each z ∈ Z, the probability that z assumes the value z: P_z(z) = Prob{z = z}. This function must satisfy the two following conditions: P_z(z) ≥ 0 for every z, and Σ_{z ∈ Z} P_z(z) = 1.

38 Probability function of a discrete r.v.(ii) For a reduced number of possible values of z, the probability function can be presented in the form of a table. For example, if we plan to toss a fair coin twice, and the random variable z is the number of heads that eventually turn up, the probability function can be presented as follow Values of the random variable z Associated probabilities

39 Parametric probability function Suppose that 1. z is a discrete r.v. that takes its values in Z = {1, 2, 3}, and 2. the probability function of z is P_z(z) = θ^{2z} / (θ² + θ⁴ + θ⁶), where θ is some fixed nonzero real number. Whatever the value of θ, P_z(z) ≥ 0 for z = 1, 2, 3 and P_z(1) + P_z(2) + P_z(3) = 1. Therefore z is a well-defined random variable, even if the value of θ is unknown. We call θ a parameter, that is, some constant, usually unknown, involved in a probability function. The collection of all probability distributions for different values of the parameter is called a family of probability distributions.

40 Expected value of a discrete r.v. The expected value of a discrete random variable z is defined as E[z] = μ = Σ_{z ∈ Z} z P_z(z). The expected value (introduced first by Huygens in the seventeenth century) is a weighted average of the possible values that z could assume, where each value is weighted with the probability that z would assume the value in question. The expected value must not be confused with the "most probable value": the expected value is not necessarily a value that belongs to Z (e.g. the expected value of a die roll is 3.5). In English, mean is used as a synonym of "expected value", but the word average is NOT a synonym of "expected value".

41 Substitution rule For any function g of the random variable z, E[g(z)] = Σ_{z ∈ Z} g(z) P_z(z), provided that Σ_{z ∈ Z} |g(z)| P_z(z) < ∞. Note that in general E[g(z)] ≠ g(E[z]). An exception is the linear function g(z) = az + b, for which E[az + b] = aE[z] + b.

42 Variance of a discrete r.v. The variance of a discrete random variable z is defined as Var[z] = σ² = E[(z − E[z])²] = Σ_{z ∈ Z} (z − E[z])² P_z(z). The variance is a measure of the dispersion of the probability function of the random variable around its mean. Note that since (z − μ)² = z² − 2μz + μ², the following identity holds: E[(z − E[z])²] = E[z²] − (E[z])² = E[z²] − μ².
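
As a quick illustration (my own minimal check, using the die example mentioned above), the mean and variance of a discrete r.v. can be computed directly from its probability function:

    # Fair die: values and probability function
    z  <- 1:6
    Pz <- rep(1/6, 6)
    mu <- sum(z * Pz)               # E[z] = 3.5 (not a value in Z!)
    v  <- sum((z - mu)^2 * Pz)      # Var[z] ~ 2.92
    c(v, sum(z^2 * Pz) - mu^2)      # identity: Var[z] = E[z^2] - mu^2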

43 Examples of probability functions [Figure: two discrete r.v. probability functions having the same mean but different variance.]

44 Std. deviation and moments of a discrete r.v. The standard deviation of a discrete random variable z is defined as the positive square root of the variance: Std[z] = √Var[z] = σ. Moment: for any positive integer r, the r-th moment of the probability function is μ_r = E[z^r] = Σ_z z^r P_z(z). The skewness of a discrete random variable z is defined as γ = E[(z − μ)³] / σ³. Distributions with positive skewness have long tails to the right, and distributions with negative skewness have long tails to the left.

45 Joint probability Consider a probabilistic model described by n discrete random variables. A fully specified probabilistic model gives the joint probability function for every combination of the values of the n r.v.s. The model is specified by the values of the probabilities Prob{z 1 = z 1,z 2 = z 2,...,z n = z n } = P(z 1,z 2,...,z n ) for every possible assignment of values z 1,...,z n to the variables.

46 Independent variables Let x and y be two random variables. The two variables x and y are defined to be statistically independent if the joint probability Prob{x = x, y = y} = Prob{x = x} Prob{y = y}. If two variables x and y are independent, then the transformed r.v.s g(x) and h(y), where g and h are two given functions, are also independent. In qualitative terms, this means that we do not expect the outcome of one variable to affect the other. Examples: think of two outcomes of a roulette wheel, or of two coins tossed simultaneously.

47 Continuous random variable Continuous random variables take their values in some continuous range of values. Consider a real random variable z whose range is the set of real numbers. The following quantities can be defined: Definition The (cumulative) distribution function of z is the function F_z(z) = Prob{z ≤ z}. Definition The density function of a real random variable z is the derivative of the distribution function: p_z(z) = dF_z(z)/dz.

48 Continuous random variable Any individual value has probability zero for a continuous random variable. Probabilities of continuous r.v.s are not allocated to specific values but rather to intervals of values. Specifically, Prob{a < z < b} = ∫_a^b p_z(z) dz and ∫_Z p_z(z) dz = 1.

49 Mean, variance, ... of a continuous r.v. Consider a continuous scalar r.v. having range (l,h) and density function p(z). We can define: Expectation (mean): μ = ∫_l^h z p(z) dz. Variance: σ² = ∫_l^h (z − μ)² p(z) dz. Other quantities of interest are the moments: μ_r = E[z^r] = ∫_l^h z^r p(z) dz. The moment of order r = 1 is the mean of z.

50 Uniform distribution A random variable z is said to be uniformly distributed on the interval (a,b) (written z ∼ U(a,b)) if its probability density function is given by p(z) = 1/(b − a) if a < z < b, and p(z) = 0 otherwise. [Figure: the density is constant at height 1/(b − a) between a and b.] TP: compute the variance of U(a,b).

51 Normal distribution: the scalar case A continuous scalar random variable x is said to be normally distributed with parameters μ and σ² (written x ∼ N(μ,σ²)) if its probability density function is given by p_x(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}. The mean of x is μ; the variance of x is σ². The coefficient in front of the exponential ensures that ∫ p(x) dx = 1. The probability that an observation x from a normal r.v. is within 2 standard deviations from the mean is approximately 0.95. If μ = 0 and σ² = 1, the distribution is called standard normal. We will denote its distribution function by F_z(z) = Φ(z). Given a normal r.v. x ∼ N(μ,σ²), the r.v. z = (x − μ)/σ has a standard normal distribution.

52 Standard distribution

53 Important relations For x ∼ N(μ,σ²):

Prob{μ − σ ≤ x ≤ μ + σ} ≈ 0.683
Prob{μ − 1.282σ ≤ x ≤ μ + 1.282σ} ≈ 0.8
Prob{μ − 1.645σ ≤ x ≤ μ + 1.645σ} ≈ 0.9
Prob{μ − 1.96σ ≤ x ≤ μ + 1.96σ} ≈ 0.95
Prob{μ − 2σ ≤ x ≤ μ + 2σ} ≈ 0.954
Prob{μ − 2.57σ ≤ x ≤ μ + 2.57σ} ≈ 0.99
Prob{μ − 3σ ≤ x ≤ μ + 3σ} ≈ 0.997

Test these relations yourself by random sampling and simulation using R!
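
Following the slide's suggestion, a minimal R check by exact computation and by simulation:

    # Exact interval probabilities via the normal CDF, plus a sampling check
    k <- c(1, 1.282, 1.645, 1.96, 2, 2.57, 3)
    pnorm(k) - pnorm(-k)    # 0.683 0.800 0.900 0.950 0.954 0.990 0.997
    set.seed(2)
    x <- rnorm(1e6)         # standard normal sample (mu = 0, sigma = 1)
    mean(abs(x) <= 1.96)    # close to 0.95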

54 Linear combinations The expected value of a linear combination of r.v.s is simply the linear combination of their respective expected values: E[ax + by] = aE[x] + bE[y], i.e., expectation is a linear statistic. Since the variance is not a linear statistic, we have Var[ax + by] = a² Var[x] + b² Var[y] + 2ab (E[xy] − E[x]E[y]) = a² Var[x] + b² Var[y] + 2ab Cov[x,y], where Cov[x,y] = E[(x − E[x])(y − E[y])] = E[xy] − E[x]E[y] is called the covariance. The covariance measures whether the two variables vary simultaneously in the same way around their averages.

55 Covariance example Consider two discrete r.v.s x and y with a joint probability table whose marginals are

          y = 3            y = 10            y = 20            Marginal
x = 10    ·                ·                 ·                 P(x = 10) = 0.25
x = 20    ·                ·                 ·                 P(x = 20) = 0.4
x = 30    ·                ·                 ·                 P(x = 30) = 0.35
Marginal  P(y = 3) = 0.3   P(y = 10) = 0.4   P(y = 20) = 0.3   Sum=1

From the joint table (and the probability function of the product xy) one obtains E[x] = 21, Var[x] = 59, E[y] = 10.9, Var[y] = 43.89 and E[xy] = 211, hence Cov[x,y] = E[xy] − E[x]E[y] = 211 − 228.9 = −17.9 and ρ(x,y) = −0.35.
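
These numbers can be reproduced in R from the marginal distributions and the value E[xy] = 211 reported on the slide (a quick arithmetic check):

    # Moments from the marginals and the given E[xy]
    x <- c(10, 20, 30); px <- c(0.25, 0.4, 0.35)
    y <- c(3, 10, 20);  py <- c(0.3, 0.4, 0.3)
    Ex <- sum(x * px); Vx <- sum(x^2 * px) - Ex^2  # 21, 59
    Ey <- sum(y * py); Vy <- sum(y^2 * py) - Ey^2  # 10.9, 43.89
    Exy <- 211                                     # value given on the slide
    cov.xy <- Exy - Ex * Ey                        # -17.9
    cov.xy / sqrt(Vx * Vy)                         # rho = -0.35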

56 Correlation The correlation coefficient is ρ(x,y) = Cov[x,y] / √(Var[x] Var[y]). It is easily shown that −1 ≤ ρ(x,y) ≤ 1. Two r.v.s are called uncorrelated if E[xy] = E[x]E[y]. If x and y are two independent random variables, then Cov[x,y] = 0, or equivalently E[xy] = E[x]E[y]. If x and y are two independent random variables, then also Cov[g(x),h(y)] = 0, or equivalently E[g(x)h(y)] = E[g(x)]E[h(y)]. Independence ⇒ uncorrelatedness, but not vice versa, for a generic distribution. Independence ⇔ uncorrelatedness if x and y are jointly Gaussian.

57 Correlation and causation The existence of a correlation different from zero does not necessarily mean that there is a causal relationship. Think of these examples: the amount of Coke drunk per day and sport performance; sleeping with shoes on and headache; the number of firemen and the gravity of the disaster; taking an expensive drug and cancer risk. According to Tufte, "empirically observed covariation is a necessary but not sufficient condition for causality".

58 Linear combination of independent vars If the random variables x and y are independent, then Var[x + y] = Var[x] + Var[y] and Var[ax + by] = a² Var[x] + b² Var[y]. In general, if the random variables x_1, x_2, ..., x_k are independent, then Var[Σ_{i=1}^k c_i x_i] = Σ_{i=1}^k c_i² σ_i².

59 TP 1. Let x and y be two discrete independent r.v.s such that P_x(−1) = 0.1, P_x(0) = 0.8, P_x(1) = 0.1 and P_y(1) = 0.1, P_y(2) = 0.8, P_y(3) = 0.1. If z = x + y, show that E[z] = E[x] + E[y]. 2. Let x be a discrete r.v. which assumes values in {−1, 0, 1}, each with probability 1/3, and let y = x². (a) Let z = x + y; show that E[z] = E[x] + E[y]. (b) Demonstrate that x and y are uncorrelated but dependent random variables.

60 The sum of i.i.d. random variables Suppose that z_1, z_2, ..., z_N are i.i.d. (independently and identically distributed) random variables, discrete or continuous, each having a probability distribution with mean μ and variance σ². Let us consider two derived r.v.s, the sum and the average: S_N = z_1 + z_2 + ... + z_N and z̄ = (z_1 + z_2 + ... + z_N)/N. The following relations hold: E[S_N] = Nμ, Var[S_N] = Nσ², E[z̄] = μ, Var[z̄] = σ²/N. See the R script sum_rv.r.
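
A minimal sketch of the kind of experiment in sum_rv.r (assuming, for illustration, uniform summands; the actual script may differ):

    # Empirical check of E[S_N] = N*mu, Var[S_N] = N*sigma^2, Var[zbar] = sigma^2/N
    set.seed(3)
    N <- 10; R <- 1e5                    # N terms per sum, R repetitions
    z <- matrix(runif(N * R), nrow = R)  # i.i.d. U(0,1): mu = 0.5, sigma^2 = 1/12
    S <- rowSums(z)
    c(mean(S), N * 0.5)                  # E[S_N] vs N*mu
    c(var(S), N / 12)                    # Var[S_N] vs N*sigma^2
    c(var(S / N), (1 / 12) / N)          # Var[zbar] vs sigma^2/N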

61 Normal distribution: the multivariate case Let z be a random vector (n × 1). The vector is said to be normally distributed with parameters μ (n × 1) and Σ (n × n) (written z ∼ N(μ,Σ)) if its probability density function is given by

p_z(z) = 1 / ((√(2π))^n √(det Σ)) exp{ −(1/2) (z − μ)^T Σ⁻¹ (z − μ) }

It follows that: the mean E[z] = μ = [μ_1, ..., μ_n]^T is an [n,1]-dimensional vector, where μ_i = E[z_i], i = 1,...,n; the [n,n] matrix

Σ = E[(z − μ)(z − μ)^T] = | σ_1²   σ_12   ...   σ_1n |
                          | σ_12   σ_2²   ...   σ_2n |
                          | ...    ...    ...   ...  |
                          | σ_1n   σ_2n   ...   σ_n² |

is the covariance matrix, where σ_i² = Var[z_i] and σ_ij = Cov[z_i, z_j]. This matrix is square and symmetric and has n(n + 1)/2 parameters.

62 Normal multivariate distribution (II) The quantity Δ² = (z − μ)^T Σ⁻¹ (z − μ), which appears in the exponent of p_z, is the squared Mahalanobis distance from z to μ. It can be shown that the surfaces of constant probability density are hyperellipsoids on which Δ² is constant; their principal axes are given by the eigenvectors u_i, i = 1,...,n of Σ, which satisfy Σ u_i = λ_i u_i, i = 1,...,n, where the λ_i are the corresponding eigenvalues. The eigenvalues λ_i give the variances along the principal directions.

63 Normal multivariate distribution (III) If the covariance matrix Σ is diagonal, then: the contours of constant density are hyperellipsoids with the principal directions aligned with the coordinate axes; the components of z are then statistically independent, since the distribution of z can be written as the product of the distributions of each of the components separately, in the form p_z(z) = Π_{i=1}^n p(z_i); the total number of independent parameters in the distribution is 2n; if σ_i = σ for all i, the contours of constant density are hyperspheres.

64 Bivariate normal distribution Consider a bivariate normal density whose mean is μ = [μ_1, μ_2]^T and whose covariance matrix is

Σ = | σ_1²   σ_12 |
    | σ_21   σ_2² |

The correlation coefficient is ρ = σ_12 / (σ_1 σ_2). It can be shown that the general bivariate normal density has the form

p(z_1, z_2) = 1 / (2π σ_1 σ_2 √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ ((z_1 − μ_1)/σ_1)² − 2ρ ((z_1 − μ_1)/σ_1)((z_2 − μ_2)/σ_2) + ((z_2 − μ_2)/σ_2)² ] }

65 Bivariate normal distribution Let Σ = [1.2919, …; …, …]. [Figure: surface plot of the bivariate normal density p(z_1, z_2) over the (z_1, z_2) plane.]

66 Bivariate normal distribution (prj) [Figure: projection of the bivariate normal density onto the (z_1, z_2) plane, showing the principal axes u_1, u_2 with eigenvalues λ_1, λ_2.] See the R script s_gaussxyz.r.
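
A sketch in the spirit of s_gaussxyz.r (the covariance matrix below is made up, since the slide's Σ is incomplete; MASS::mvrnorm is one standard way to sample):

    # Sample a bivariate normal and recover the principal axes of Sigma
    library(MASS)                                  # for mvrnorm
    Sigma <- matrix(c(1.3, 0.7, 0.7, 1.0), 2, 2)   # assumed covariance matrix
    z <- mvrnorm(5000, mu = c(0, 0), Sigma = Sigma)
    plot(z[, 1], z[, 2], pch = ".", xlab = "z1", ylab = "z2")
    e <- eigen(Sigma)                              # Sigma u_i = lambda_i u_i
    arrows(0, 0, e$vectors[1, ] * sqrt(e$values),
           e$vectors[2, ] * sqrt(e$values), col = "red", lwd = 2)
    e$values                 # variances along the principal directions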

67 Marginal and conditional distributions One of the important properties of the multivariate normal density is that all conditional and marginal probabilities are also normal. Using the relation p(z_2 | z_1) = p(z_1, z_2) / p(z_1), we find that p(z_2 | z_1) is a normal distribution N(μ_{2|1}, σ_{2|1}²), where μ_{2|1} = μ_2 + ρ (σ_2/σ_1)(z_1 − μ_1) and σ_{2|1}² = σ_2² (1 − ρ²). Note that μ_{2|1} is a linear function of z_1: if the correlation coefficient ρ is positive, the larger z_1, the larger μ_{2|1}. If there is no correlation between z_1 and z_2, we can ignore the value of z_1 when estimating μ_2.

68 [Figure: scatter of a bivariate normal sample with the marginal densities p(z_1), p(z_2) and the conditional density p(z_2 | z_1 = 0); in the plotted example Var[z_2] = 1.05 while Var[z_2 | z_1] = 0.19.]

69 The central limit theorem Theorem Assume that z_1, z_2, ..., z_N are i.i.d. random variables, discrete or continuous, each having a probability distribution with finite mean μ and finite variance σ². As N → ∞, the standardized random variable √N (z̄ − μ)/σ, which is identical to (S_N − Nμ)/(√N σ), converges in distribution to a r.v. having the standard normal distribution N(0,1). This result holds regardless of the common distribution of the z_i. This theorem justifies the importance of the normal distribution, since many r.v.s of interest are either sums or averages. See the R script central.r.
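
A minimal sketch of the kind of demonstration in central.r (assuming, for illustration, skewed exponential summands; the actual script may differ):

    # CLT demo: standardized means of exponential samples look standard normal
    set.seed(4)
    N <- 50; R <- 1e4                         # sample size and repetitions
    zbar <- replicate(R, mean(rexp(N)))       # exp(1): mu = 1, sigma = 1
    stat <- sqrt(N) * (zbar - 1) / 1          # standardized sample mean
    hist(stat, freq = FALSE, breaks = 50)
    curve(dnorm(x), add = TRUE, col = "red")  # N(0,1) density for comparison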

70 The chi-squared distribution For N a positive integer, a r.v. z has a χ²_N distribution if z = x_1² + x_2² + ... + x_N², where x_1, x_2, ..., x_N are i.i.d. N(0,1) random variables. This probability distribution is a gamma distribution with parameters (N/2, 1/2); E[z] = N and Var[z] = 2N. The distribution is called a chi-squared distribution with N degrees of freedom.

71 The chi-squared distribution (II) [Figure: χ²_N density and cumulative distribution function for N = 10.] R script chisq.r.

72 Student's t-distribution If x ∼ N(0,1) and y ∼ χ²_N are independent, then the Student's t-distribution with N degrees of freedom is the distribution of the r.v. z = x / √(y/N). We denote this by z ∼ T_N.

73 Student's t-distribution [Figure: Student density and cumulative distribution function for N = 10.] R script s_stu.r.

74 Notation In order to clarify the distinction between random variables and their values, we will use boldface notation for a random variable (e.g. z) and normal face notation for the eventually observed value (e.g. z = 11). The notation P_z(z) denotes the probability that the random variable z takes the value z. The suffix indicates that the probability relates to the random variable z. This is necessary since we often discuss probabilities associated with several random variables simultaneously. Example: z could be the age of a student before asking, and z = 22 could be the value after the observation.

75 Notation (II) In general terms, we will denote as the probability distribution of a random variable z any complete description of the probabilistic behavior of z. For example, if z is continuous, the density function p(z) or the distribution function are examples of probability distributions. Given a probability distribution F_z(z), the notation {z_1, z_2, ..., z_N} ∼ F_z means that the dataset D_N = {z_1, z_2, ..., z_N} is an i.i.d. random sample observed from the probability distribution F_z(·).


More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

Preliminary statistics

Preliminary statistics 1 Preliminary statistics The solution of a geophysical inverse problem can be obtained by a combination of information from observed data, the theoretical relation between data and earth parameters (models),

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

ECE531: Principles of Detection and Estimation Course Introduction

ECE531: Principles of Detection and Estimation Course Introduction ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 15-January-2013 WPI D. Richard Brown III 15-January-2013 1 / 39 First Lecture: Major Topics 1. Administrative

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Origins of Probability Theory

Origins of Probability Theory 1 16.584: INTRODUCTION Theory and Tools of Probability required to analyze and design systems subject to uncertain outcomes/unpredictability/randomness. Such systems more generally referred to as Experiments.

More information

Lecture 4: Probability and Discrete Random Variables

Lecture 4: Probability and Discrete Random Variables Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1

More information

Week 12-13: Discrete Probability

Week 12-13: Discrete Probability Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible

More information

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section

More information

Discrete Mathematics and Probability Theory Fall 2014 Anant Sahai Note 15. Random Variables: Distributions, Independence, and Expectations

Discrete Mathematics and Probability Theory Fall 2014 Anant Sahai Note 15. Random Variables: Distributions, Independence, and Expectations EECS 70 Discrete Mathematics and Probability Theory Fall 204 Anant Sahai Note 5 Random Variables: Distributions, Independence, and Expectations In the last note, we saw how useful it is to have a way of

More information

Lecture 1: Review on Probability and Statistics

Lecture 1: Review on Probability and Statistics STAT 516: Stochastic Modeling of Scientific Data Autumn 2018 Instructor: Yen-Chi Chen Lecture 1: Review on Probability and Statistics These notes are partially based on those of Mathias Drton. 1.1 Motivating

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition A Brief Mathematical Review Hamid R. Rabiee Jafar Muhammadi, Ali Jalali, Alireza Ghasemi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Probability theory

More information

Probability. Lecture Notes. Adolfo J. Rumbos

Probability. Lecture Notes. Adolfo J. Rumbos Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 3 October 29, 2012 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline Reminder: Probability density function Cumulative

More information

Human-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015

Human-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015 Probability Refresher Kai Arras, University of Freiburg Winter term 2014/2015 Probability Refresher Introduction to Probability Random variables Joint distribution Marginalization Conditional probability

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3 Probability Paul Schrimpf January 23, 2018 Contents 1 Definitions 2 2 Properties 3 3 Random variables 4 3.1 Discrete........................................... 4 3.2 Continuous.........................................

More information