Basic Statistics and Probability Theory


0. Basic Statistics and Probability Theory

Based on Foundations of Statistical NLP, C. Manning & H. Schütze, ch. 2, MIT Press, 2002

"Probability theory is nothing but common sense reduced to calculation."
Pierre Simon, Marquis de Laplace (1749-1827)

PLAN

1. Elementary Probability Notions: sample space, event space, and probability function; conditional probability; Bayes' theorem; independence of probabilistic events

2. Random Variables: discrete variables and continuous variables; mean, variance and standard deviation; standard distributions; joint, marginal and conditional distributions; independence of random variables

PLAN (cont'd)

3. Limit Theorems: laws of large numbers; central limit theorems

4. Estimating the parameters of probabilistic models from data: Maximum Likelihood Estimation (MLE); Maximum A Posteriori (MAP) estimation

5. Elementary Information Theory: entropy, conditional entropy, joint entropy; information gain / mutual information; cross-entropy; relative entropy / Kullback-Leibler (KL) divergence; properties: bounds, chain rules, (non-)symmetries, properties pertaining to independence

1. Elementary Probability Notions

sample space: Ω (either discrete or continuous)
event: $A \subseteq \Omega$
the certain event: Ω; the impossible event: $\emptyset$
event space: $\mathcal{F} = 2^\Omega$ (or a subspace of $2^\Omega$ that contains $\emptyset$ and is closed under complement and countable union)
probability function/distribution: $P : \mathcal{F} \to [0,1]$ such that $P(\Omega) = 1$ and the countable additivity property holds: for pairwise disjoint events $A_1, \ldots, A_k, \ldots$, $P(\bigcup_i A_i) = \sum_i P(A_i)$

Consequence: for a uniform distribution over a finite sample space, $P(A) = \dfrac{\#\text{favorable outcomes}}{\#\text{all outcomes}}$

Conditional Probability

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

Note: $P(A \mid B)$ is called the a posteriori probability of A, given B.

The multiplication rule: $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$

The chain rule:
$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, A_2, \ldots, A_{n-1})$
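As a concrete illustration, here is a minimal Python sketch (the joint distribution is randomly made up): it verifies the chain-rule factorisation on a joint distribution over three binary events.

```python
import itertools
import random

# A made-up joint distribution over three binary events (A1, A2, A3).
random.seed(0)
weights = {bits: random.random() for bits in itertools.product([0, 1], repeat=3)}
total = sum(weights.values())
joint = {bits: w / total for bits, w in weights.items()}

def marginal(fixed):
    """P(A_i = v for every (i, v) in `fixed`), by summing the joint."""
    return sum(p for bits, p in joint.items()
               if all(bits[i] == v for i, v in fixed.items()))

for bits in joint:
    # Chain rule: P(a1, a2, a3) = P(a1) * P(a2 | a1) * P(a3 | a1, a2),
    # where each conditional is a ratio of marginals.
    p1 = marginal({0: bits[0]})
    p2 = marginal({0: bits[0], 1: bits[1]}) / p1
    p3 = joint[bits] / marginal({0: bits[0], 1: bits[1]})
    assert abs(p1 * p2 * p3 - joint[bits]) < 1e-12
print("chain rule verified on all 8 outcomes")
```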

The total probability formula: $P(A) = P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})$

More generally: if $A \subseteq \bigcup_i B_i$ and $B_i \cap B_j = \emptyset$ for $i \neq j$, then $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$

Bayes' theorem:
$P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A)}$
or
$P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})}$
or ...
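A small Python sketch of Bayes' theorem with the total-probability formula in the denominator; all the numbers below are made up for illustration (a stylised diagnostic-test scenario):

```python
# Bayes' theorem with the total-probability formula in the denominator.
p_disease = 0.01                 # prior P(B)
p_pos_given_disease = 0.95       # P(A | B)
p_pos_given_healthy = 0.10       # P(A | not B)

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))   # total probability P(A)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos  # P(B | A)
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.088
```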

Independence of Probabilistic Events

Independent events: $P(A \cap B) = P(A)\,P(B)$
Note: when $P(B) \neq 0$, the above definition is equivalent to $P(A \mid B) = P(A)$.

Conditionally independent events: $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$, assuming, of course, that $P(C) \neq 0$.
Note: when $P(B \mid C) \neq 0$, the above definition is equivalent to $P(A \mid B, C) = P(A \mid C)$.

2. Random Variables

2.1 Basic Definitions

Let Ω be a sample space, and $P : 2^\Omega \to [0,1]$ a probability function.

A random variable of distribution P is a function $X : \Omega \to \mathbb{R}^n$. For now, let us consider $n = 1$.

The cumulative distribution function of X is $F : \mathbb{R} \to [0,1]$, defined by
$F(x) = P(X \leq x) = P(\{\omega \in \Omega \mid X(\omega) \leq x\})$

2.2 Discrete Random Variables

Definition: Let $P : 2^\Omega \to [0,1]$ be a probability function, and X be a random variable of distribution P. If Val(X) is either finite or countably infinite, then X is called a discrete random variable. For such a variable we define the probability mass function (pmf) $p : \mathbb{R} \to [0,1]$ as
$p(x) \overset{\text{not.}}{=} p(X = x) \overset{\text{def.}}{=} P(\{\omega \in \Omega \mid X(\omega) = x\})$.
(Obviously, it follows that $\sum_{x_i \in Val(X)} p(x_i) = 1$.)

Mean, Variance, and Standard Deviation:
Expectation / mean of X: $E(X) \overset{\text{not.}}{=} E[X] = \sum_x x\,p(x)$ if X is a discrete random variable.
Variance of X: $Var(X) \overset{\text{not.}}{=} Var[X] = E((X - E(X))^2)$.
Standard deviation: $\sigma = \sqrt{Var(X)}$.
Covariance of X and Y, two random variables of distribution P: $Cov(X,Y) = E[(X - E[X])(Y - E[Y])]$
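These definitions translate directly into code. A minimal Python sketch, using a made-up pmf for a loaded die:

```python
import math

# Made-up pmf of a loaded die: values 1..6 with unequal probabilities.
pmf = {1: 0.10, 2: 0.10, 3: 0.15, 4: 0.15, 5: 0.20, 6: 0.30}
assert abs(sum(pmf.values()) - 1.0) < 1e-12

mean = sum(x * p for x, p in pmf.items())                 # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())    # E[(X - E[X])^2]
sigma = math.sqrt(var)                                    # standard deviation
print(f"E[X] = {mean:.3f}, Var(X) = {var:.3f}, sigma = {sigma:.3f}")
```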

Exemplification:

the Binomial distribution: $b(r; n, p) = \binom{n}{r} p^r (1-p)^{n-r}$ for $r = 0, 1, \ldots, n$
mean: $np$, variance: $np(1-p)$

the Bernoulli distribution: $b(r; 1, p)$
mean: $p$, variance: $p(1-p)$, entropy: $-p \log_2 p - (1-p) \log_2 (1-p)$

[Figure: binomial probability mass function and cumulative distribution function for (p = 0.5, n = 20), (p = 0.7, n = 20), (p = 0.5, n = 40).]
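A short Python sketch of the binomial pmf (using math.comb for the binomial coefficient) that numerically confirms the stated mean and variance:

```python
from math import comb

def binom_pmf(r, n, p):
    """b(r; n, p) = C(n, r) * p^r * (1-p)^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 20, 0.5   # example parameters from the figure
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * q for r, q in enumerate(pmf))
var = sum((r - mean) ** 2 * q for r, q in enumerate(pmf))
print(f"sum of pmf = {sum(pmf):.6f}")                     # 1.000000
print(f"mean = {mean:.3f} (np = {n * p})")                # 10.000
print(f"variance = {var:.3f} (np(1-p) = {n * p * (1 - p)})")  # 5.000
```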

2.3 Continuous Random Variables

Definitions: Let $P : 2^\Omega \to [0,1]$ be a probability function, and $X : \Omega \to \mathbb{R}$ be a random variable of distribution P.

If Val(X) is an uncountably infinite set, and F, the cumulative distribution function of X, is continuous, then X is called a continuous random variable. (It follows, naturally, that $P(X = x) = 0$ for all $x \in \mathbb{R}$.)

If there exists $p : \mathbb{R} \to [0, \infty)$ such that $F(x) = \int_{-\infty}^{x} p(t)\,dt$, then X is called absolutely continuous. In such a case, p is called the probability density function (pdf) of X.

For $B \subseteq \mathbb{R}$ for which $\int_B p(x)\,dx$ exists,
$P(X^{-1}(B)) = \int_B p(x)\,dx$,
where $X^{-1}(B) \overset{\text{not.}}{=} \{\omega \in \Omega \mid X(\omega) \in B\}$. In particular, $\int_{-\infty}^{+\infty} p(x)\,dx = 1$.

Expectation / mean of X: $E(X) \overset{\text{not.}}{=} E[X] = \int x\,p(x)\,dx$.

Exemplification:

Normal (Gaussian) distribution: $N(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
mean: $\mu$, variance: $\sigma^2$

Standard Normal distribution: $N(x; 0, 1)$

Remark: For $n, p$ such that $np(1-p) > 5$, the Binomial distribution can be approximated by a Normal distribution.
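A quick numeric illustration of the remark (the particular points checked are an arbitrary choice): the Gaussian density with mean np and variance np(1-p) approximates the binomial pmf.

```python
from math import comb, exp, pi, sqrt

n, p = 40, 0.5                      # np(1-p) = 10 > 5, so the approximation applies
mu, sigma = n * p, sqrt(n * p * (1 - p))

def binom_pmf(r):
    return comb(n, r) * p**r * (1 - p)**(n - r)

def normal_pdf(x):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# The Gaussian density at r approximates the binomial mass at r.
for r in (15, 20, 25):
    print(f"r={r}: binomial={binom_pmf(r):.5f}, normal={normal_pdf(r):.5f}")
```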

[Figure: Gaussian probability density function and cumulative distribution function for (µ = 0, σ = 0.2), (µ = 0, σ = 1.0), (µ = 0, σ = 5.0), (µ = 2, σ = 0.5).]

2.4 Basic Properties of Random Variables

Let $P : 2^\Omega \to [0,1]$ be a probability function, and $X : \Omega \to \mathbb{R}^n$ a discrete/continuous random variable of distribution P. If $g : \mathbb{R}^n \to \mathbb{R}^m$ is a function, then $g(X)$ is a random variable.

If X is discrete, then $E(g(X)) = \sum_x g(x)\,p(x)$.
If X is continuous, then $E(g(X)) = \int g(x)\,p(x)\,dx$.
If g is non-linear, in general $E(g(X)) \neq g(E(X))$.
$E(aX + b) = aE(X) + b$.
$E(X + Y) = E(X) + E(Y)$, therefore $E[\sum_{i=1}^{n} a_i X_i] = \sum_{i=1}^{n} a_i E[X_i]$.
$Var(aX) = a^2\,Var(X)$. $Var(X + a) = Var(X)$.
$Var(X) = E(X^2) - E^2(X)$.
$Cov(X,Y) = E[XY] - E[X]E[Y]$.

2.5 Joint, Marginal and Conditional Distributions

Exemplification for the bi-variate case:

Let Ω be a sample space, $P : 2^\Omega \to [0,1]$ a probability function, and $V : \Omega \to \mathbb{R}^2$ a random variable of distribution P. One could naturally see V as a pair of two random variables $X : \Omega \to \mathbb{R}$ and $Y : \Omega \to \mathbb{R}$. (More precisely, $V(\omega) = (x, y) = (X(\omega), Y(\omega))$.)

the joint pmf/pdf of X and Y is defined by
$p(x,y) \overset{\text{not.}}{=} p_{X,Y}(x,y) = P(X = x, Y = y) = P(\{\omega \in \Omega \mid X(\omega) = x, Y(\omega) = y\})$

the marginal pmf/pdf functions of X and Y are:
for the discrete case: $p_X(x) = \sum_y p(x,y)$, $p_Y(y) = \sum_x p(x,y)$
for the continuous case: $p_X(x) = \int_y p(x,y)\,dy$, $p_Y(y) = \int_x p(x,y)\,dx$

the conditional pmf/pdf of X given Y is: $p_{X \mid Y}(x \mid y) = \dfrac{p_{X,Y}(x,y)}{p_Y(y)}$
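For the discrete case, these operations are just sums over a joint table. A minimal Python sketch with a made-up joint pmf:

```python
# A made-up joint pmf over X in {0, 1} and Y in {0, 1, 2}, stored as a dict.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
         (1, 0): 0.15, (1, 1): 0.15, (1, 2): 0.30}
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Marginals: sum the joint over the other variable.
p_X = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1, 2)}

# Conditional pmf of X given Y = 1: p(x | y) = p(x, y) / p_Y(y).
p_X_given_Y1 = {x: joint[(x, 1)] / p_Y[1] for x in (0, 1)}
print(p_X, p_Y, p_X_given_Y1)
```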

2.6 Independence of Random Variables

Definitions: Let X, Y be random variables of the same type (i.e. either discrete or continuous), and $p_{X,Y}$ their joint pmf/pdf. X and Y are said to be independent if $p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$ for all possible values x and y of X and Y respectively.

Similarly, let X, Y and Z be random variables of the same type, and p their joint pmf/pdf. X and Y are conditionally independent given Z if $p_{X,Y \mid Z}(x, y \mid z) = p_{X \mid Z}(x \mid z) \cdot p_{Y \mid Z}(y \mid z)$ for all possible values x, y and z of X, Y and Z respectively.

Properties of random variables pertaining to independence

If X, Y are independent, then $Var(X + Y) = Var(X) + Var(Y)$.
If X, Y are independent, then $E(XY) = E(X)E(Y)$, i.e. $Cov(X,Y) = 0$.
$Cov(X,Y) = 0$ does not imply that X, Y are independent.
The covariance matrix corresponding to a vector of random variables is symmetric and positive semi-definite.
If the covariance matrix of a multi-variate Gaussian distribution is diagonal, then the marginal distributions are independent.
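The third point deserves a concrete counterexample. The classic one, sketched in Python below: X uniform on {-1, 0, 1} and Y = X².

```python
# Classic counterexample: zero covariance without independence.
# X is uniform on {-1, 0, 1} and Y = X^2.
pmf_X = {-1: 1/3, 0: 1/3, 1: 1/3}

E_X = sum(x * p for x, p in pmf_X.items())            # 0
E_Y = sum(x**2 * p for x, p in pmf_X.items())         # 2/3
E_XY = sum(x * x**2 * p for x, p in pmf_X.items())    # 0
print("Cov(X,Y) =", E_XY - E_X * E_Y)                 # 0.0

# Yet X and Y are clearly dependent: P(X=1, Y=0) = 0,
# while P(X=1) * P(Y=0) = (1/3) * (1/3) != 0.
```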

3. Limit Theorems
[Sheldon Ross, A First Course in Probability, 5th ed., 1998]

The most important results in probability theory are limit theorems. Of these, the most important are:

laws of large numbers, concerned with stating conditions under which the average of a sequence of random variables converges (in some sense) to the expected average;

central limit theorems, concerned with determining the conditions under which the sum of a large number of random variables has a probability distribution that is approximately normal.

Two basic inequalities and the weak law of large numbers

Markov's inequality: If X is a random variable that takes only non-negative values, then for any value $a > 0$,
$P(X \geq a) \leq \dfrac{E[X]}{a}$

Chebyshev's inequality: If X is a random variable with finite mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$,
$P(|X - \mu| \geq k) \leq \dfrac{\sigma^2}{k^2}$

The weak law of large numbers (Bernoulli; Khintchine): Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having a finite mean $E[X_i] = \mu$. Then, for any value $\epsilon > 0$,
$P\left(\left|\dfrac{X_1 + \cdots + X_n}{n} - \mu\right| \geq \epsilon\right) \to 0$ as $n \to \infty$
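A small simulation of the weak law for a fair die (the seed, sample sizes and ε are arbitrary demo choices):

```python
import random

# Simulate the weak law of large numbers for a fair die (mu = 3.5).
random.seed(42)
mu, eps, trials = 3.5, 0.1, 500

for n in (10, 100, 1000, 5000):
    deviations = 0
    for _ in range(trials):
        avg = sum(random.randint(1, 6) for _ in range(n)) / n
        deviations += abs(avg - mu) >= eps
    print(f"n={n:5d}: P(|avg - mu| >= {eps}) ~= {deviations / trials:.3f}")
```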

The central limit theorem for i.i.d. random variables
[Pierre Simon, Marquis de Laplace; Liapunoff]

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having mean $\mu$ and variance $\sigma^2$. Then the distribution of
$\dfrac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$
tends to the standard normal (Gaussian) as $n \to \infty$. That is, for $-\infty < a < \infty$,
$P\left(\dfrac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \leq a\right) \to \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-x^2/2}\,dx$ as $n \to \infty$
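A quick empirical check in Python (the seed and sizes are arbitrary): standardised sums of uniform(0, 1) draws should fall below a with probability close to Φ(a).

```python
import random
from math import sqrt

# Empirical check of the CLT: Phi(1.0) ~= 0.8413.
random.seed(0)
mu, sigma = 0.5, sqrt(1 / 12)     # mean and std of uniform(0, 1)
n, trials, a = 1000, 5000, 1.0

below = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * sqrt(n))   # standardised sum
    below += z <= a
print(f"P(Z <= {a}) ~= {below / trials:.4f}  (Phi({a}) ~= 0.8413)")
```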

The central limit theorem for independent random variables

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables having respective means $\mu_i$ and variances $\sigma_i^2$. If
(a) the variables $X_i$ are uniformly bounded, i.e. for some $M \in \mathbb{R}^+$, $P(|X_i| < M) = 1$ for all i, and
(b) $\sum_{i=1}^{\infty} \sigma_i^2 = \infty$,
then
$P\left(\dfrac{\sum_{i=1}^{n}(X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \leq a\right) \to \Phi(a)$ as $n \to \infty$,
where $\Phi$ is the cumulative distribution function of the standard normal (Gaussian) distribution.

The strong law of large numbers

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent and identically distributed random variables, each having a finite mean $E[X_i] = \mu$. Then, with probability 1,
$\dfrac{X_1 + \cdots + X_n}{n} \to \mu$ as $n \to \infty$
That is,
$P\left(\lim_{n \to \infty} (X_1 + \cdots + X_n)/n = \mu\right) = 1$

Other inequalities

One-sided Chebyshev inequality: If X is a random variable with mean 0 and finite variance $\sigma^2$, then for any $a > 0$,
$P(X \geq a) \leq \dfrac{\sigma^2}{\sigma^2 + a^2}$

Corollary: If $E[X] = \mu$ and $Var(X) = \sigma^2$, then for $a > 0$,
$P(X \geq \mu + a) \leq \dfrac{\sigma^2}{\sigma^2 + a^2}$ and $P(X \leq \mu - a) \leq \dfrac{\sigma^2}{\sigma^2 + a^2}$

Chernoff bounds: Let $M(t) \overset{\text{not.}}{=} E[e^{tX}]$. Then
$P(X \geq a) \leq e^{-ta} M(t)$ for all $t > 0$
$P(X \leq a) \leq e^{-ta} M(t)$ for all $t < 0$

4. Estimation/inference of the parameters of probabilistic models from data
(based on [Durbin et al., Biological Sequence Analysis, 1998])

A probabilistic model can be anything from a simple distribution to a complex stochastic grammar with many implicit probability distributions. Once the type of the model is chosen, the parameters have to be inferred from data.

We will first consider the case of the categorical distribution, and then we will present the different strategies that can be used in general.

A case study: estimation of the parameters of a categorical distribution from data

Assume that the observations (for example, when rolling a die about which we don't know whether it is fair or not, or when counting the number of times the amino acid i occurs in a column of a multiple sequence alignment) can be expressed as counts $n_i$ for each outcome i (i = 1, ..., K), and we want to estimate the probabilities $\theta_i$ of the underlying distribution.

Case 1: When we have plenty of data, it is natural to use the maximum likelihood (ML) solution, i.e. the observed frequency
$\theta_i^{ML} = \dfrac{n_i}{\sum_j n_j} \overset{\text{not.}}{=} \dfrac{n_i}{N}$

Note: it is easy to show that indeed $P(n \mid \theta^{ML}) > P(n \mid \theta)$ for any $\theta \neq \theta^{ML}$:
$\log \dfrac{P(n \mid \theta^{ML})}{P(n \mid \theta)} = \log \dfrac{\prod_i (\theta_i^{ML})^{n_i}}{\prod_i \theta_i^{n_i}} = \sum_i n_i \log \dfrac{\theta_i^{ML}}{\theta_i} = N \sum_i \theta_i^{ML} \log \dfrac{\theta_i^{ML}}{\theta_i} > 0$
The inequality follows from the fact that the relative entropy is always positive except when the two distributions are identical.
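In code, the ML estimate is just normalised counting. A minimal sketch with made-up die rolls:

```python
from collections import Counter

# ML estimate of a categorical distribution from made-up die rolls.
rolls = [1, 3, 3, 6, 2, 3, 5, 6, 6, 6, 4, 1]
counts = Counter(rolls)
N = sum(counts.values())

theta_ml = {i: counts.get(i, 0) / N for i in range(1, 7)}  # theta_i = n_i / N
print(theta_ml)
```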

Case 2: When the data is scarce, it is not clear what the best estimate is. In general, we should use prior knowledge, via Bayesian statistics. For instance, one can use the Dirichlet distribution with parameters $\alpha$:
$P(\theta \mid n) = \dfrac{P(n \mid \theta)\,D(\theta \mid \alpha)}{P(n)}$

It can be shown (see the calculation in R. Durbin et al.'s BSA book, p. 320) that the posterior mean estimate (PME) of the parameters is
$\theta_i^{PME} \overset{\text{def.}}{=} \int \theta_i\,P(\theta \mid n)\,d\theta = \dfrac{n_i + \alpha_i}{N + \sum_j \alpha_j}$

The $\alpha$'s are like pseudocounts added to the real counts. (If we think of the $\alpha$'s as extra observations added to the real ones, this is precisely the ML estimate!) This makes the Dirichlet regulariser very intuitive.

How to use the pseudocounts: If it is fairly obvious that a certain residue, let's say i, is very common, then we should give it a very high pseudocount $\alpha_i$; if the residue j is generally rare, we should give it a low pseudocount.
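A minimal sketch of the PME with uniform pseudocounts ($\alpha_i = 1$, i.e. add-one / Laplace smoothing) on deliberately scarce made-up data:

```python
from collections import Counter

# Posterior mean estimate with Dirichlet pseudocounts, on scarce made-up data.
rolls = [6, 6, 3]                       # only three observations
counts = Counter(rolls)
N = len(rolls)
alpha = {i: 1.0 for i in range(1, 7)}   # uniform "add-one" prior

theta_pme = {i: (counts.get(i, 0) + alpha[i]) / (N + sum(alpha.values()))
             for i in range(1, 7)}
print(theta_pme)   # unseen outcomes get nonzero probability, unlike ML
```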

Strategies to be used in the general case

A. The Maximum Likelihood (ML) Estimate

When we wish to infer the parameters $\theta = (\theta_i)$ for a model M from a set of data D, the most obvious strategy is to maximise $P(D \mid \theta, M)$ over all possible values of $\theta$. Formally:
$\theta^{ML} = \arg\max_{\theta} P(D \mid \theta, M)$

Note: Generally speaking, when we treat $P(x \mid y)$ as a function of x (and y is fixed), we refer to it as a probability. When we treat $P(x \mid y)$ as a function of y (and x is fixed), we call it a likelihood. Note that a likelihood is not a probability distribution or density; it is simply a function of the variable y.

A serious drawback of maximum likelihood is that it gives poor results when data is scarce. The solution then is to introduce more prior knowledge, using Bayes' theorem. (In the Bayesian framework, the parameters are themselves seen as random variables!)

B. The Maximum A Posteriori Probability (MAP) Estimate

$\theta^{MAP} \overset{\text{def.}}{=} \arg\max_{\theta} P(\theta \mid D, M) = \arg\max_{\theta} \dfrac{P(D \mid \theta, M)\,P(\theta \mid M)}{P(D \mid M)} = \arg\max_{\theta} P(D \mid \theta, M)\,P(\theta \mid M)$

The prior probability $P(\theta \mid M)$ has to be chosen in some reasonable manner, and this is the art of Bayesian estimation (although this freedom to choose a prior has made Bayesian statistics controversial at times...).

C. The Posterior Mean Estimate (PME)

$\theta^{PME} = \int \theta\,P(\theta \mid D, M)\,d\theta$
where the integral is over all probability vectors, i.e. all those that sum to one.

D. Yet another solution is to use the posterior probability $P(\theta \mid D, M)$ to sample from it (see [Durbin et al., 1998], section 11.4) and thereby locate regions of high probability for the model parameters.
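For intuition, here is a sketch of MAP estimation in the simplest conjugate setting, a coin with a Beta(a, b) prior; the counts and prior values are made up, and the closed-form Beta posterior mode is used (this example is an illustration added here, not taken from the slides).

```python
# MAP estimate for a coin's heads probability with a Beta(a, b) prior.
# The posterior is Beta(a + h, b + t); its mode gives the MAP estimate:
#   theta_MAP = (h + a - 1) / (n + a + b - 2)
h, t = 3, 1                   # observed heads and tails (scarce data)
a, b = 2.0, 2.0               # prior pseudo-observations
n = h + t

theta_ml = h / n                              # 0.750
theta_map = (h + a - 1) / (n + a + b - 2)     # 0.667, pulled toward 0.5
print(f"ML = {theta_ml:.3f}, MAP = {theta_map:.3f}")
```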

5. Elementary Information Theory

Definitions: Let X and Y be discrete random variables.

Entropy: $H(X) \overset{\text{def.}}{=} \sum_x p(x) \log \dfrac{1}{p(x)} = -\sum_x p(x) \log p(x) = E_p[-\log p(X)]$.
Convention: if $p(x) = 0$ then we take $p(x) \log p(x) = 0$.

Specific conditional entropy: $H(Y \mid X = x) \overset{\text{def.}}{=} -\sum_{y \in Y} p(y \mid x) \log p(y \mid x)$.

Average conditional entropy: $H(Y \mid X) \overset{\text{def.}}{=} \sum_{x \in X} p(x)\,H(Y \mid X = x) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y \mid x)$ (the second equality is immediate).

Joint entropy: $H(X,Y) \overset{\text{def.}}{=} -\sum_{x,y} p(x,y) \log p(x,y)$; it can be shown that $H(X,Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$.

Information gain (or: mutual information):
$IG(X;Y) \overset{\text{def.}}{=} H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X,Y) - H(X \mid Y) - H(Y \mid X) = IG(Y;X)$.
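All of these quantities are finite sums for discrete variables. A minimal Python sketch on a made-up joint pmf, also verifying the chain rule H(X,Y) = H(X) + H(Y | X):

```python
from math import log2

# Entropy, conditional entropy, and mutual information from a made-up
# joint pmf over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(dist):
    """Entropy in bits; terms with p = 0 contribute 0 by convention."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

p_X = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

H_XY = H(joint)
H_X, H_Y = H(p_X), H(p_Y)
H_Y_given_X = H_XY - H_X              # chain rule: H(X,Y) = H(X) + H(Y|X)
IG = H_X + H_Y - H_XY                 # mutual information
print(f"H(X)={H_X:.3f} H(Y)={H_Y:.3f} H(X,Y)={H_XY:.3f} "
      f"H(Y|X)={H_Y_given_X:.3f} IG={IG:.3f}")
```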

Exemplification: entropy of a Bernoulli distribution

$H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$

[Figure: H(p) as a function of p, maximal (1 bit) at p = 0.5.]

Basic properties of entropy, conditional entropy, joint entropy and information gain / mutual information

$0 \leq H(p_1, \ldots, p_n) \leq H\left(\frac{1}{n}, \ldots, \frac{1}{n}\right) = \log n$;
$H(X) = 0$ iff X is a constant random variable.
$IG(X;Y) \geq 0$; $IG(X;Y) = 0$ iff X and Y are independent.
$H(X \mid Y) \leq H(X)$; $H(X \mid Y) = H(X)$ iff X and Y are independent.
$H(X,Y) \leq H(X) + H(Y)$; $H(X,Y) = H(X) + H(Y)$ iff X and Y are independent.
A chain rule: $H(X_1, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + \ldots + H(X_n \mid X_1, \ldots, X_{n-1})$.

The relationship between entropy, conditional entropy, joint entropy and mutual information

[Figure: Venn-style diagram showing H(X,Y) as the union of H(X) and H(Y), with H(X|Y), IG(X;Y) and H(Y|X) as its disjoint regions.]

Other definitions

Let X be a discrete random variable, p its pmf, and q another pmf (usually a model of p).

Cross-entropy: $CH(X, q) = -\sum_{x \in X} p(x) \log q(x) = E_p\left[\log \dfrac{1}{q(X)}\right]$

Let X and Y be discrete random variables, and p and q their respective pmf's.

Relative entropy (or, Kullback-Leibler divergence):
$KL(p \| q) = -\sum_{x \in X} p(x) \log \dfrac{q(x)}{p(x)} = E_p\left[\log \dfrac{p(X)}{q(X)}\right] = CH(X, q) - H(X)$
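A minimal Python sketch of both definitions on made-up pmfs, checking the identity KL(p || q) = CH(X, q) - H(X):

```python
from math import log2

# Cross-entropy and KL divergence between two made-up pmfs over {a, b, c}.
p = {"a": 0.5, "b": 0.25, "c": 0.25}   # true distribution
q = {"a": 0.25, "b": 0.25, "c": 0.5}   # model distribution

H_p = -sum(px * log2(px) for px in p.values())
CH = -sum(p[x] * log2(q[x]) for x in p)          # cross-entropy CH(X, q)
KL = sum(p[x] * log2(p[x] / q[x]) for x in p)    # KL(p || q)
print(f"H(p)={H_p:.3f}  CH={CH:.3f}  KL={KL:.3f}")
assert abs(CH - H_p - KL) < 1e-12                # KL = CH - H
```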

The relationship between entropy, conditional entropy, joint entropy, information gain, cross-entropy and relative entropy (or KL divergence)

[Figure: diagram relating $H(p_X)$, $H(p_Y)$, $H(X \mid Y)$, $H(Y \mid X)$, $H(X,Y)$, $CH(X, p_Y)$, $CH(Y, p_X)$, $KL(p_X \| p_Y)$, $KL(p_Y \| p_X)$, and $IG(X;Y) = KL(p_{XY} \| p_X p_Y)$.]

Basic properties of cross-entropy and relative entropy

$CH(X, q) \geq 0$.
$KL(p \| q) \geq 0$ for all p and q; $KL(p \| q) = 0$ iff p and q are identical.
[Consequence:] If X is a discrete random variable, p its pmf, and q another pmf, then $CH(X, q) \geq H(X) \geq 0$. The first of these two inequalities is also known as Gibbs' inequality: $-\sum_{i=1}^{n} p_i \log_2 p_i \leq -\sum_i p_i \log_2 q_i$.
Unlike H of a discrete n-ary variable, which is bounded by $\log_2 n$, there is no (general) upper bound for CH. (Likewise, there is none for KL.)
Unlike H(X,Y), which is symmetric in its arguments, CH and KL are not! Therefore KL is NOT a distance metric! (See the next slide.)
$IG(X;Y) = KL(p_{XY} \| p_X p_Y) = \sum_x \sum_y p(x,y) \log \dfrac{p(x,y)}{p(x)\,p(y)}$.

Remark

The quantity
$VI(X,Y) \overset{\text{def.}}{=} H(X,Y) - IG(X;Y) = H(X) + H(Y) - 2\,IG(X;Y) = H(X \mid Y) + H(Y \mid X)$,
known as variation of information, is a distance metric, i.e. it is nonnegative, symmetric, implies indiscernibility, and satisfies the triangle inequality.

Consider $M(p,q) = \frac{1}{2}(p + q)$. The function $JSD(p \| q) = \frac{1}{2} KL(p \| M) + \frac{1}{2} KL(q \| M)$ is called the Jensen-Shannon divergence.

One can prove that $\sqrt{JSD(p \| q)}$ defines a distance metric (the Jensen-Shannon distance).
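A small sketch of the Jensen-Shannon divergence and distance on made-up pmfs:

```python
from math import log2, sqrt

# Jensen-Shannon divergence between two made-up pmfs; its square root
# is the Jensen-Shannon distance.
def kl(p, q):
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

def jsd(p, q):
    m = {x: 0.5 * (p.get(x, 0) + q.get(x, 0)) for x in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = {"a": 0.9, "b": 0.1}
q = {"a": 0.4, "b": 0.6}
print(f"JSD = {jsd(p, q):.4f}, JS distance = {sqrt(jsd(p, q)):.4f}")
```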

Recommended Exercises

From [Manning & Schütze, 2002, ch. 2]: Examples 1, 2, 4, 5, 7, 8, 9; Exercises 2.1, 2.3, 2.4, 2.5
From [Sheldon Ross, 1998, ch. 8]: Examples 2a, 2b, 3a, 3b, 3c, 5a, 5b

Addenda: Other Examples of Probabilistic Distributions

Multinomial distribution: generalises the binomial distribution to the case where there are K independent outcomes with probabilities $\theta_i$, $i = 1, \ldots, K$, such that $\sum_{i=1}^{K} \theta_i = 1$. The probability of getting $n_i$ occurrences of outcome i is given by
$P(n \mid \theta) = \dfrac{n!}{\prod_{i=1}^{K} n_i!} \prod_{i=1}^{K} \theta_i^{n_i}$,
where $n = n_1 + \ldots + n_K$ and $\theta = (\theta_1, \ldots, \theta_K)$.

Note: The particular case $n = 1$ represents the categorical distribution. This is a generalisation of the Bernoulli distribution.

Example: The outcome of a single die roll is described by a categorical distribution; the counts from n rolls follow a multinomial. The probabilities of each of the 6 outcomes are $\theta_1, \ldots, \theta_6$. For a fair die, $\theta_1 = \ldots = \theta_6 = \frac{1}{6}$, and the probability of rolling it 12 times and getting each outcome twice is:
$\dfrac{12!}{(2!)^6} \left(\dfrac{1}{6}\right)^{12} \approx 0.0034$
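The worked die example can be checked directly. A minimal Python sketch of the multinomial pmf:

```python
from math import factorial, prod

def multinomial_pmf(counts, thetas):
    """P(n | theta) = n! / prod(n_i!) * prod(theta_i ** n_i)."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    return coef * prod(t ** c for c, t in zip(counts, thetas))

# Fair die rolled 12 times, each face appearing exactly twice.
p = multinomial_pmf([2] * 6, [1 / 6] * 6)
print(f"{p:.6f}")   # ~0.003438
```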

Poisson distribution (or, Poisson law of small numbers):
$p(k; \lambda) = \dfrac{\lambda^k}{k!} e^{-\lambda}$, with $k \in \mathbb{N}$ and parameter $\lambda > 0$.
Mean = variance = $\lambda$.

[Figure: Poisson probability mass function and cumulative distribution function for λ = 1, 4, 10.]

Exponential distribution (a.k.a. negative exponential distribution):
$p(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$ and parameter $\lambda > 0$.
Mean = $\lambda^{-1}$, variance = $\lambda^{-2}$.

[Figure: exponential probability density function and cumulative distribution function for λ = 0.5, 1, 1.5.]

Gamma distribution:
$p(x; k, \theta) = \dfrac{x^{k-1}\,e^{-x/\theta}}{\Gamma(k)\,\theta^k}$ for $x \geq 0$ and parameters $k > 0$ (shape) and $\theta > 0$ (scale).
Mean = $k\theta$, variance = $k\theta^2$.

The gamma function is a generalisation of the factorial function to real values. For any positive real number x, $\Gamma(x+1) = x\,\Gamma(x)$. (Thus, for integers, $\Gamma(n) = (n-1)!$.)

[Figure: gamma probability density function and cumulative distribution function for (k = 1.0, θ = 2.0), (k = 2.0, θ = 2.0), (k = 3.0, θ = 2.0), (k = 5.0, θ = 1.0), (k = 9.0, θ = 0.5), (k = 7.5, θ = 1.0), (k = 0.5, θ = 1.0).]

χ² distribution:
$p(x; \nu) = \dfrac{1}{\Gamma(\nu/2)} \left(\dfrac{1}{2}\right)^{\nu/2} x^{\nu/2 - 1}\, e^{-\frac{1}{2}x}$ for $x \geq 0$ and $\nu$ a positive integer.
It is obtained from the Gamma distribution by taking $k = \nu/2$ and $\theta = 2$.
Mean = $\nu$, variance = $2\nu$.

[Figure: chi-squared probability density function and cumulative distribution function for ν = 1, 2, 3, 4, 6, 9.]

Laplace distribution:
$p(x; \mu, \theta) = \dfrac{1}{2\theta}\, e^{-\frac{|\mu - x|}{\theta}}$.
Mean = $\mu$, variance = $2\theta^2$.

[Figure: Laplace probability density function and cumulative distribution function for (µ = 0, θ = 1), (µ = 0, θ = 2), (µ = 0, θ = 4), (µ = 5, θ = 4).]

Student's t distribution:
$p(x; \nu) = \dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$ for $x \in \mathbb{R}$ and $\nu > 0$ (the degrees-of-freedom parameter).
Mean = 0 for $\nu > 1$, otherwise undefined. Variance = $\dfrac{\nu}{\nu - 2}$ for $\nu > 2$, $\infty$ for $1 < \nu \leq 2$, otherwise undefined.

[Figure: the probability density function and the cumulative distribution function.]

Note [from Wikipedia]: The t-distribution is symmetric and bell-shaped, like the normal distribution, but it has heavier tails, meaning that it is more prone to producing values that fall far from its mean.

Dirichlet distribution:
$D(\theta \mid \alpha) = \dfrac{1}{Z(\alpha)} \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\,\delta\left(\sum_{i=1}^{K} \theta_i - 1\right)$
where $\alpha = (\alpha_1, \ldots, \alpha_K)$ with $\alpha_i > 0$ are the parameters, the $\theta_i$ satisfy $0 \leq \theta_i \leq 1$ and sum to 1 (this being indicated by the delta-function term $\delta(\sum_i \theta_i - 1)$), and the normalising factor can be expressed in terms of the gamma function:
$Z(\alpha) = \int \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\,\delta\left(\sum_i \theta_i - 1\right) d\theta = \dfrac{\prod_i \Gamma(\alpha_i)}{\Gamma\left(\sum_i \alpha_i\right)}$

Mean of $\theta_i$: $\dfrac{\alpha_i}{\sum_j \alpha_j}$.

For $K = 2$, the Dirichlet distribution reduces to the more widely known beta distribution, and the normalising constant is the beta function.

Remark: Concerning the multinomial and Dirichlet distributions:

The algebraic expression for the parameters $\theta_i$ is similar in the two distributions. However, the multinomial is a distribution over its exponents $n_i$, whereas the Dirichlet is a distribution over the numbers $\theta_i$ that are exponentiated. The two distributions are said to be conjugate distributions, and their close formal relationship leads to a harmonious interplay in many estimation problems.

Similarly, the beta distribution is the conjugate of the Bernoulli distribution, and the gamma distribution is the conjugate of the Poisson distribution.
