Basic Statistics and Probability Theory
- Lillian Martin
- 6 years ago
0. Basic Statistics and Probability Theory

Based on "Foundations of Statistical NLP", C. Manning & H. Schütze, ch. 2, MIT Press, 2002

"Probability theory is nothing but common sense reduced to calculation."
Pierre Simon, Marquis de Laplace ( )
1. PLAN

1. Elementary Probability Notions:
   Sample Space, Event Space, and Probability Function
   Conditional Probability
   Bayes' Theorem
   Independence of Probabilistic Events

2. Random Variables:
   Discrete Variables and Continuous Variables
   Mean, Variance and Standard Deviation
   Standard Distributions
   Joint, Marginal and Conditional Distributions
   Independence of Random Variables
PLAN (cont'd)

3. Limit Theorems:
   Laws of Large Numbers
   Central Limit Theorems

4. Estimating the parameters of probabilistic models from data:
   Maximum Likelihood Estimation (MLE)
   Maximum A Posteriori (MAP) Estimation

5. Elementary Information Theory:
   Entropy; Conditional Entropy; Joint Entropy
   Information Gain / Mutual Information
   Cross-Entropy
   Relative Entropy / Kullback-Leibler (KL) Divergence
   Properties: bounds, chain rules, (non-)symmetries, properties pertaining to independence
1. Elementary Probability Notions

sample space: Ω (either discrete or continuous)
event: A ⊆ Ω
the certain event: Ω; the impossible event: ∅
event space: F = 2^Ω (or a subspace of 2^Ω that contains ∅ and is closed under complement and countable union)
probability function/distribution: P : F → [0, 1] such that:
   P(Ω) = 1
   the countable additivity property: for A₁, ..., Aₖ disjoint events, P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ)

Consequence: for a uniform distribution over a finite sample space:
   P(A) = #favorable outcomes / #all possible outcomes
Conditional Probability

P(A|B) = P(A ∩ B) / P(B)

Note: P(A|B) is called the a posteriori probability of A, given B.

The multiplication rule: P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

The chain rule:
P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) P(A₂|A₁) P(A₃|A₁,A₂) ... P(Aₙ|A₁,A₂,...,Aₙ₋₁)
The total probability formula:
P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)

More generally: if A ⊆ ∪ᵢ Bᵢ and Bᵢ ∩ Bⱼ = ∅ for i ≠ j, then P(A) = Σᵢ P(A|Bᵢ) P(Bᵢ)

Bayes' Theorem:
P(B|A) = P(A|B) P(B) / P(A)
or
P(B|A) = P(A|B) P(B) / (P(A|B) P(B) + P(A|¬B) P(¬B))
or ...
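The total probability formula and Bayes' theorem above can be checked numerically. A minimal Python sketch for a diagnostic-test scenario; the probabilities below are invented for illustration, not taken from the slides:

```python
# Bayes' theorem via the total probability formula, with made-up numbers:
# a test detects a condition with P(pos|cond) = 0.9, false-positive rate
# P(pos|not cond) = 0.05, and prior P(cond) = 0.01.
p_cond = 0.01
p_pos_given_cond = 0.9
p_pos_given_not = 0.05

# Total probability: P(pos) = P(pos|cond)P(cond) + P(pos|not cond)P(not cond)
p_pos = p_pos_given_cond * p_cond + p_pos_given_not * (1 - p_cond)

# Bayes: P(cond|pos) = P(pos|cond)P(cond) / P(pos)
p_cond_given_pos = p_pos_given_cond * p_cond / p_pos
print(round(p_cond_given_pos, 4))  # prints 0.1538
```

Despite the accurate test, the posterior stays small because the prior P(cond) is small: exactly the kind of reasoning Bayes' theorem makes mechanical.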
Independence of Probabilistic Events

Independent events: P(A ∩ B) = P(A) P(B)
Note: When P(B) ≠ 0, the above definition is equivalent to P(A|B) = P(A).

Conditionally independent events: P(A ∩ B|C) = P(A|C) P(B|C), assuming, of course, that P(C) ≠ 0.
Note: When P(B ∩ C) ≠ 0, the above definition is equivalent to P(A|B,C) = P(A|C).
2. Random Variables

2.1 Basic Definitions

Let Ω be a sample space, and P : 2^Ω → [0,1] a probability function.

A random variable of distribution P is a function X : Ω → Rⁿ. For now, let us consider n = 1.

The cumulative distribution function of X is F : R → [0,1] defined by
F(x) = P(X ≤ x) = P({ω ∈ Ω | X(ω) ≤ x})
2.2 Discrete Random Variables

Definition: Let P : 2^Ω → [0,1] be a probability function, and X be a random variable of distribution P. If Val(X) is either finite or countably infinite, then X is called a discrete random variable. For such a variable we define the probability mass function (pmf) p : R → [0,1] as
p(x) = p(X = x) = P({ω ∈ Ω | X(ω) = x}).
(Obviously, it follows that Σ_{xᵢ ∈ Val(X)} p(xᵢ) = 1.)

Mean, Variance, and Standard Deviation:
Expectation / mean of X: E(X) = E[X] = Σₓ x p(x) if X is a discrete random variable.
Variance of X: Var(X) = Var[X] = E((X - E(X))²).
Standard deviation: σ = √Var(X).
Covariance of X and Y, two random variables of distribution P: Cov(X,Y) = E[(X - E[X])(Y - E[Y])]
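The mean and variance definitions above are easy to compute for a concrete pmf. A small Python sketch for a fair six-sided die (the die example is ours, chosen for illustration):

```python
# Mean, variance, and standard deviation of a discrete random variable,
# computed directly from its pmf (here: a fair six-sided die).
import math

pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E((X - E(X))^2)
sigma = math.sqrt(var)

# The shortcut Var(X) = E(X^2) - E(X)^2 gives the same value.
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mean ** 2
print(mean, round(var, 4))  # 3.5 and 35/12 = 2.9167
```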
Exemplification:

the Binomial distribution: b(r; n, p) = C(n,r) pʳ (1-p)ⁿ⁻ʳ for r = 0, ..., n
   mean: np, variance: np(1-p)

the Bernoulli distribution: b(r; 1, p)
   mean: p, variance: p(1-p), entropy: -p log₂ p - (1-p) log₂ (1-p)

[Figure: Binomial probability mass function b(r; n, p) and cumulative distribution function F(r), plotted for (p = 0.5, n = 20), (p = 0.7, n = 20), (p = 0.5, n = 40).]
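As a sanity check on the stated mean np and variance np(1-p), the binomial pmf can be built directly from its formula; a sketch with arbitrarily chosen parameter values:

```python
# Binomial pmf b(r; n, p) = C(n,r) p^r (1-p)^(n-r), checked against
# its stated mean np and variance np(1-p).
from math import comb

def binom_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 20, 0.7
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)                                          # should be 1
mean = sum(r * q for r, q in enumerate(pmf))              # np = 14
var = sum((r - mean) ** 2 * q for r, q in enumerate(pmf))  # np(1-p) = 4.2
```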
2.3 Continuous Random Variables

Definitions: Let P : 2^Ω → [0,1] be a probability function, and X : Ω → R be a random variable of distribution P. If Val(X) is an uncountably infinite set, and F, the cumulative distribution function of X, is continuous, then X is called a continuous random variable. (It follows, naturally, that P(X = x) = 0 for all x ∈ R.)

If there exists p : R → [0, ∞) such that F(x) = ∫_{-∞}^x p(t) dt, then X is called absolutely continuous. In such a case, p is called the probability density function (pdf) of X.

For B ⊆ R for which ∫_B p(x) dx exists,
P(X⁻¹(B)) = ∫_B p(x) dx, where X⁻¹(B) = {ω ∈ Ω | X(ω) ∈ B}.
In particular, ∫_{-∞}^{+∞} p(x) dx = 1.

Expectation / mean of X: E(X) = E[X] = ∫ x p(x) dx.
Exemplification:

Normal (Gaussian) distribution: N(x; µ, σ) = (1 / (√(2π) σ)) e^(-(x-µ)² / (2σ²))
   mean: µ, variance: σ²

Standard Normal distribution: N(x; 0, 1)

Remark: For n, p such that np(1-p) > 5, the Binomial distributions can be approximated by Normal distributions.
[Figure: Gaussian probability density function and cumulative distribution function, plotted for (µ = 0, σ = 0.2), (µ = 0, σ = 1.0), (µ = 0, σ = 5.0), (µ = 2, σ = 0.5).]
2.4 Basic Properties of Random Variables

Let P : 2^Ω → [0,1] be a probability function, and X : Ω → Rⁿ a discrete/continuous random variable of distribution P.

If g : Rⁿ → Rᵐ is a function, then g(X) is a random variable.
If X is discrete, then E(g(X)) = Σₓ g(x) p(x). If X is continuous, then E(g(X)) = ∫ g(x) p(x) dx.
If g is non-linear, then in general E(g(X)) ≠ g(E(X)).
E(aX + b) = a E(X) + b.
E(X + Y) = E(X) + E(Y), therefore E[Σᵢ aᵢXᵢ] = Σᵢ aᵢ E[Xᵢ].
Var(aX) = a² Var(X). Var(X + a) = Var(X). Var(X) = E(X²) - E²(X).
Cov(X,Y) = E[XY] - E[X] E[Y].
2.5 Joint, Marginal and Conditional Distributions

Exemplification for the bi-variate case:

Let Ω be a sample space, P : 2^Ω → [0,1] a probability function, and V : Ω → R² a random variable of distribution P. One could naturally see V as a pair of two random variables X : Ω → R and Y : Ω → R. (More precisely, V(ω) = (x, y) = (X(ω), Y(ω)).)

the joint pmf/pdf of X and Y is defined by p(x,y) = p_{X,Y}(x,y) = P(X = x, Y = y) = P({ω ∈ Ω | X(ω) = x, Y(ω) = y}).

the marginal pmf/pdf functions of X and Y are:
   for the discrete case: p_X(x) = Σ_y p(x,y), p_Y(y) = Σₓ p(x,y)
   for the continuous case: p_X(x) = ∫ p(x,y) dy, p_Y(y) = ∫ p(x,y) dx

the conditional pmf/pdf of X given Y is: p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y)
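These definitions can be exercised on a tiny joint pmf. A Python sketch with an invented 2x2 table:

```python
# Marginal and conditional pmfs computed from a joint pmf.
# The joint table is invented for illustration.
joint = {('a', 0): 0.3, ('a', 1): 0.1,
         ('b', 0): 0.2, ('b', 1): 0.4}

# marginals: p_X(x) = sum_y p(x,y), p_Y(y) = sum_x p(x,y)
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# conditional: p_{X|Y}(x|y) = p(x,y) / p_Y(y)
def cond_x_given_y(x, y):
    return joint[(x, y)] / p_y[y]
```

For instance p_X('a') = 0.3 + 0.1 = 0.4, p_Y(1) = 0.1 + 0.4 = 0.5, and p_{X|Y}('a'|1) = 0.1 / 0.5 = 0.2.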
2.6 Independence of Random Variables

Definitions:

Let X, Y be random variables of the same type (i.e. either discrete or continuous), and p_{X,Y} their joint pmf/pdf. X and Y are said to be independent if p_{X,Y}(x,y) = p_X(x) · p_Y(y) for all possible values x and y of X and Y respectively.

Similarly, let X, Y and Z be random variables of the same type, and p their joint pmf/pdf. X and Y are conditionally independent given Z if p_{X,Y|Z}(x,y|z) = p_{X|Z}(x|z) · p_{Y|Z}(y|z) for all possible values x, y and z of X, Y and Z respectively.
Properties of random variables pertaining to independence

If X, Y are independent, then Var(X + Y) = Var(X) + Var(Y).
If X, Y are independent, then E(XY) = E(X) E(Y), i.e. Cov(X,Y) = 0.
However, Cov(X,Y) = 0 does NOT imply that X, Y are independent.
The covariance matrix corresponding to a vector of random variables is symmetric and positive semi-definite.
If the covariance matrix of a multi-variate Gaussian distribution is diagonal, then the marginal distributions are independent.
3. Limit Theorems

[ Sheldon Ross, "A first course in probability", 5th ed., 1998 ]

The most important results in probability theory are limit theorems. Of these, the most important are:

laws of large numbers, concerned with stating conditions under which the average of a sequence of random variables converges (in some sense) to the expected average;

central limit theorems, concerned with determining the conditions under which the sum of a large number of random variables has a probability distribution that is approximately normal.
Two basic inequalities and the weak law of large numbers

Markov's inequality: If X is a random variable that takes only non-negative values, then for any value a > 0,
P(X ≥ a) ≤ E[X] / a

Chebyshev's inequality: If X is a random variable with finite mean µ and variance σ², then for any value k > 0,
P(|X - µ| ≥ k) ≤ σ² / k²

The weak law of large numbers (Bernoulli; Khintchine): Let X₁, X₂, ..., Xₙ be a sequence of independent and identically distributed random variables, each having a finite mean E[Xᵢ] = µ. Then, for any value ε > 0,
P(|(X₁ + ... + Xₙ)/n - µ| ≥ ε) → 0 as n → ∞
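For a bounded discrete variable, both inequalities can be verified exactly. A sketch using a fair die (our example; the thresholds a and k are chosen arbitrarily):

```python
# Checking Markov's and Chebyshev's inequalities exactly on a fair die.
pmf = {x: 1 / 6 for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())                  # 3.5
var = sum((x - mean) ** 2 * p for x, p in pmf.items())     # 35/12

# Markov: P(X >= a) <= E[X]/a  (X is non-negative here)
a = 5
p_ge_a = sum(p for x, p in pmf.items() if x >= a)          # 2/6
markov_bound = mean / a                                    # 0.7

# Chebyshev: P(|X - mu| >= k) <= var/k^2
k = 2
p_dev = sum(p for x, p in pmf.items() if abs(x - mean) >= k)  # outcomes 1 and 6
chebyshev_bound = var / k ** 2

assert p_ge_a <= markov_bound and p_dev <= chebyshev_bound
```

Both bounds hold but are loose here (1/3 versus 0.7 and 0.729 respectively), which is typical: these inequalities trade tightness for complete generality.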
The central limit theorem for i.i.d. random variables
[ Pierre Simon, Marquis de Laplace; Liapunoff in ]

Let X₁, X₂, ..., Xₙ be a sequence of independent and identically distributed random variables, each having mean µ and variance σ². Then the distribution of
(X₁ + ... + Xₙ - nµ) / (σ√n)
tends to the standard normal (Gaussian) as n → ∞. That is, for -∞ < a < ∞,
P((X₁ + ... + Xₙ - nµ) / (σ√n) ≤ a) → (1/√(2π)) ∫_{-∞}^a e^(-x²/2) dx as n → ∞
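The theorem can be illustrated by simulation; a sketch using uniform(0,1) summands, for which µ = 1/2 and σ² = 1/12 (the sample sizes and the seed are arbitrary choices):

```python
# A quick CLT simulation: standardised sums of i.i.d. uniform(0,1)
# variables should be approximately standard normal.
import math
import random

random.seed(0)
n, trials = 1000, 5000
mu, sigma = 0.5, math.sqrt(1 / 12)

z = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
     for _ in range(trials)]

# Empirical P(Z <= 1) should be close to Phi(1) ~ 0.8413.
frac = sum(1 for v in z if v <= 1.0) / trials
phi1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
print(frac, round(phi1, 4))
```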
The central limit theorem for independent random variables

Let X₁, X₂, ..., Xₙ be a sequence of independent random variables having respective means µᵢ and variances σᵢ². If
(a) the variables Xᵢ are uniformly bounded, i.e. for some M ∈ R⁺, P(|Xᵢ| < M) = 1 for all i, and
(b) Σ_{i=1}^∞ σᵢ² = ∞,
then
P( Σᵢ₌₁ⁿ (Xᵢ - µᵢ) / √(Σᵢ₌₁ⁿ σᵢ²) ≤ a ) → Φ(a) as n → ∞,
where Φ is the cumulative distribution function of the standard normal (Gaussian) distribution.
The strong law of large numbers

Let X₁, X₂, ..., Xₙ be a sequence of independent and identically distributed random variables, each having a finite mean E[Xᵢ] = µ. Then, with probability 1,
(X₁ + ... + Xₙ)/n → µ as n → ∞
That is,
P( lim_{n→∞} (X₁ + ... + Xₙ)/n = µ ) = 1
Other inequalities

One-sided Chebyshev inequality: If X is a random variable with mean 0 and finite variance σ², then for any a > 0,
P(X ≥ a) ≤ σ² / (σ² + a²)

Corollary: If E[X] = µ and Var(X) = σ², then for a > 0,
P(X ≥ µ + a) ≤ σ² / (σ² + a²)
P(X ≤ µ - a) ≤ σ² / (σ² + a²)

Chernoff bounds: Let M(t) = E[e^(tX)]. Then
P(X ≥ a) ≤ e^(-ta) M(t) for all t > 0
P(X ≤ a) ≤ e^(-ta) M(t) for all t < 0
4. Estimation/inference of the parameters of probabilistic models from data

(based on [Durbin et al, "Biological Sequence Analysis", 1998], p , )

A probabilistic model can be anything from a simple distribution to a complex stochastic grammar with many implicit probability distributions. Once the type of the model is chosen, its parameters have to be inferred from data. We will first consider the case of the categorical distribution, and then we will present the different strategies that can be used in general.
A case study: Estimation of the parameters of a categorical distribution from data

Assume that the observations (for example, when rolling a die about which we don't know whether it is fair or not, or when counting the number of times the amino acid i occurs in a column of a multiple sequence alignment) can be expressed as counts nᵢ for each outcome i (i = 1, ..., K), and we want to estimate the probabilities θᵢ of the underlying distribution.

Case 1: When we have plenty of data, it is natural to use the maximum likelihood (ML) solution, i.e. the observed frequency θᵢ^ML = nᵢ / N, where N = Σⱼ nⱼ.

Note: it is easy to show that indeed P(n|θ^ML) > P(n|θ) for any θ ≠ θ^ML:
log [P(n|θ^ML) / P(n|θ)] = log [Πᵢ (θᵢ^ML)^nᵢ / Πᵢ θᵢ^nᵢ] = Σᵢ nᵢ log(θᵢ^ML / θᵢ) = N Σᵢ θᵢ^ML log(θᵢ^ML / θᵢ) > 0
The inequality follows from the fact that the relative entropy is always positive, except when the two distributions are identical.
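Case 1 can be sketched in a few lines of Python; the counts below are invented for illustration:

```python
# ML estimation for a categorical distribution: theta_i = n_i / N,
# plus a numeric check that the ML estimate maximises the likelihood.
import math

counts = [5, 2, 3]                  # toy counts for K = 3 outcomes
N = sum(counts)
theta_ml = [n / N for n in counts]  # observed frequencies: [0.5, 0.2, 0.3]

def log_lik(theta):
    # log P(n | theta) up to the multinomial coefficient (theta-independent)
    return sum(n * math.log(t) for n, t in zip(counts, theta))

# Any other distribution has strictly smaller likelihood:
other = [1 / 3, 1 / 3, 1 / 3]
assert log_lik(theta_ml) > log_lik(other)
```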
Case 2: When the data is scarce, it is not clear what the best estimate is. In general, we should use prior knowledge, via Bayesian statistics. For instance, one can use the Dirichlet distribution with parameters α:
P(θ|n) = P(n|θ) D(θ|α) / P(n)

It can be shown (see the calculation in the R. Durbin et al. BSA book, pag. 320) that the posterior mean estimation (PME) of the parameters is
θᵢ^PME = ∫ θᵢ P(θ|n) dθ = (nᵢ + αᵢ) / (N + Σⱼ αⱼ)

The α's are like pseudocounts added to the real counts. (If we think of the α's as extra observations added to the real ones, this is precisely the ML estimate!) This makes the Dirichlet regulariser very intuitive.

How to use the pseudocounts: If it is fairly obvious that a certain residue, let's say i, is very common, then we should give it a very high pseudocount αᵢ; if the residue j is generally rare, we should give it a low pseudocount.
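The pseudocount formula is equally short in code; the counts and the uniform prior below are invented for illustration:

```python
# Posterior mean estimation with Dirichlet pseudocounts:
# theta_i = (n_i + alpha_i) / (N + sum_j alpha_j).
counts = [3, 0, 1]        # scarce data: outcome 2 was never observed
alphas = [1.0, 1.0, 1.0]  # a uniform ("add-one") prior, for illustration

N = sum(counts)
A = sum(alphas)
theta_pme = [(n + a) / (N + A) for n, a in zip(counts, alphas)]
# theta_pme is [4/7, 1/7, 2/7]: unlike the ML estimate, no outcome
# gets probability zero just because it was never observed.
```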
Strategies to be used in the general case

A. The Maximum Likelihood (ML) Estimate

When we wish to infer the parameters θ = (θᵢ) for a model M from a set of data D, the most obvious strategy is to maximise P(D|θ, M) over all possible values of θ. Formally:
θ^ML = argmax_θ P(D|θ, M)

Note: Generally speaking, when we treat P(x|y) as a function of x (and y is fixed), we refer to it as a probability. When we treat P(x|y) as a function of y (and x is fixed), we call it a likelihood. Note that a likelihood is not a probability distribution or density; it is simply a function of the variable y.

A serious drawback of maximum likelihood is that it gives poor results when data is scarce. The solution then is to introduce more prior knowledge, using Bayes' theorem. (In the Bayesian framework, the parameters are themselves seen as random variables!)
B. The Maximum A Posteriori Probability (MAP) Estimate

θ^MAP = argmax_θ P(θ|D, M) = argmax_θ P(D|θ, M) P(θ|M) / P(D|M) = argmax_θ P(D|θ, M) P(θ|M)

The prior probability P(θ|M) has to be chosen in some reasonable manner, and this is the art of Bayesian estimation (although this freedom to choose a prior has made Bayesian statistics controversial at times...).

C. The Posterior Mean Estimator (PME)

θ^PME = ∫ θ P(θ|D, M) dθ
where the integral is over all probability vectors, i.e. all those that sum to one.

D. Yet another solution is to use the posterior probability P(θ|D, M) to sample from it (see [Durbin et al, 1998], section 11.4) and thereby locate regions of high probability for the model parameters.
5. Elementary Information Theory

Definitions: Let X and Y be discrete random variables.

Entropy: H(X) = Σₓ p(x) log (1/p(x)) = -Σₓ p(x) log p(x) = E_p[-log p(X)].
Convention: if p(x) = 0, then we shall consider p(x) log p(x) = 0.

Specific conditional entropy: H(Y|X = x) = -Σ_y p(y|x) log p(y|x).

Average conditional entropy: H(Y|X) = Σₓ p(x) H(Y|X = x) = -Σₓ Σ_y p(x,y) log p(y|x).

Joint entropy: H(X,Y) = -Σ_{x,y} p(x,y) log p(x,y) = H(X) + H(Y|X) = H(Y) + H(X|Y).

Information gain (or: mutual information):
IG(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X,Y) - H(X|Y) - H(Y|X) = IG(Y;X).
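These definitions can be computed on a small joint pmf (the numbers below are invented), with a numeric check of the chain rule H(X,Y) = H(X) + H(Y|X):

```python
# Entropy, conditional entropy and mutual information from a joint pmf.
from math import log2

joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.4, (1, 1): 0.1}

# marginals
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

def H(dist):
    # entropy of a pmf given as a dict of probabilities
    return -sum(p * log2(p) for p in dist.values() if p > 0)

h_x, h_y, h_xy = H(p_x), H(p_y), H(joint)

# average conditional entropy H(Y|X) = -sum_{x,y} p(x,y) log p(y|x)
h_y_given_x = -sum(p * log2(p / p_x[x]) for (x, y), p in joint.items() if p > 0)

ig = h_x - (h_xy - h_y)   # IG(X;Y) = H(X) - H(X|Y)
assert abs(h_x + h_y_given_x - h_xy) < 1e-12   # chain rule
```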
Exemplification: Entropy of a Bernoulli Distribution

H(p) = -p log₂ p - (1-p) log₂ (1-p)

[Figure: H(p) plotted as a function of p ∈ [0,1]; the curve peaks at 1 bit for p = 0.5.]
Basic properties of Entropy, Conditional Entropy, Joint Entropy and Information Gain / Mutual Information

0 ≤ H(p₁, ..., pₙ) ≤ H(1/n, ..., 1/n) = log n;
H(X) = 0 iff X is a constant random variable.
IG(X;Y) ≥ 0; IG(X;Y) = 0 iff X and Y are independent.
H(X|Y) ≤ H(X); H(X|Y) = H(X) iff X and Y are independent.
H(X,Y) ≤ H(X) + H(Y); H(X,Y) = H(X) + H(Y) iff X and Y are independent.
a chain rule: H(X₁, ..., Xₙ) = H(X₁) + H(X₂|X₁) + ... + H(Xₙ|X₁, ..., Xₙ₋₁).
The Relationship between Entropy, Conditional Entropy, Joint Entropy and Mutual Information

[Diagram: H(X,Y) split into H(X|Y), IG(X;Y) and H(Y|X); H(X) covers H(X|Y) + IG(X;Y), and H(Y) covers H(Y|X) + IG(X;Y).]
Other definitions

Let X be a discrete random variable, p its pmf, and q another pmf (usually a model of p).

Cross-entropy: CH(X,q) = -Σₓ p(x) log q(x) = E_p[log (1/q(x))]

Let X and Y be discrete random variables, and p and q their respective pmf's.

Relative entropy (or, Kullback-Leibler divergence):
KL(p||q) = Σₓ p(x) log (p(x)/q(x)) = E_p[log (p(x)/q(x))] = CH(X,q) - H(X).
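The identity KL(p||q) = CH(X,q) - H(X) can be confirmed directly; the two pmfs below are invented for illustration:

```python
# Cross-entropy and KL divergence over the same outcome set,
# verifying KL(p||q) = CH(X,q) - H(X) and KL >= 0.
from math import log2

p = [0.5, 0.25, 0.25]   # "true" pmf
q = [0.4, 0.4, 0.2]     # model pmf

H = -sum(pi * log2(pi) for pi in p)
CH = -sum(pi * log2(qi) for pi, qi in zip(p, q))
KL = sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

assert abs(KL - (CH - H)) < 1e-12 and KL >= 0
```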
The Relationship between Entropy, Conditional Entropy, Joint Entropy, Information Gain, Cross-Entropy and Relative Entropy (or KL divergence)

[Diagram: relates H(X,Y), H(X|Y), H(Y|X), H(p_X), H(p_Y), the cross-entropies CH(X, ·) and CH(Y, ·), the KL divergences between the corresponding pmfs, and IG(X,Y) = KL(p_XY || p_X p_Y).]
Basic properties of cross-entropy and relative entropy

CH(X,q) ≥ 0.
KL(p||q) ≥ 0 for all p and q; KL(p||q) = 0 iff p and q are identical.
[Consequence:] If X is a discrete random variable, p its pmf, and q another pmf, then CH(X,q) ≥ H(X) ≥ 0. The first of these two inequalities is also known as Gibbs' inequality: -Σᵢ₌₁ⁿ pᵢ log₂ pᵢ ≤ -Σᵢ pᵢ log₂ qᵢ.
Unlike H of a discrete n-ary variable, which is bounded by log₂ n, there is no (general) upper bound for CH. (Nor is KL upper-bounded, in general.)
Unlike H(X,Y), which is symmetric in its arguments, CH and KL are not! Therefore KL is NOT a distance metric! (See the next slide.)
IG(X;Y) = KL(p_XY || p_X p_Y) = Σₓ Σ_y p(x,y) log (p(x,y) / (p(x)p(y))).
Remark

The quantity
VI(X,Y) = H(X,Y) - IG(X;Y) = H(X) + H(Y) - 2 IG(X;Y) = H(X|Y) + H(Y|X),
known as the variation of information, is a distance metric, i.e. it is non-negative, symmetric, satisfies the identity of indiscernibles, and satisfies the triangle inequality.

Consider M(p,q) = (1/2)(p + q). The function JSD(p||q) = (1/2) KL(p||M) + (1/2) KL(q||M) is called the Jensen-Shannon divergence.
One can prove that √JSD(p||q) defines a distance metric (the Jensen-Shannon distance).
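A sketch of the Jensen-Shannon divergence, checking the symmetry that KL lacks (the two pmfs are invented for illustration):

```python
# Jensen-Shannon divergence: JSD(p||q) = KL(p||M)/2 + KL(q||M)/2,
# with M = (p+q)/2. Unlike KL it is symmetric and bounded.
from math import log2

def kl(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.9, 0.1]
q = [0.5, 0.5]
assert abs(jsd(p, q) - jsd(q, p)) < 1e-12   # JSD is symmetric...
assert abs(kl(p, q) - kl(q, p)) > 1e-6      # ...while KL is not
assert 0 <= jsd(p, q) <= 1                  # bounded by log2(2) = 1 here
```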
Recommended Exercises

From [Manning & Schütze, 2002, ch. 2]: Examples 1, 2, 4, 5, 7, 8, 9; Exercises 2.1, 2.3, 2.4, 2.5
From [Sheldon Ross, 1998, ch. 8]: Examples 2a, 2b, 3a, 3b, 3c, 5a, 5b
Addenda: Other Examples of Probabilistic Distributions
Multinomial distribution: generalises the binomial distribution to the case where there are K independent outcomes with probabilities θᵢ, i = 1, ..., K, such that Σᵢ θᵢ = 1. The probability of getting nᵢ occurrences of outcome i is given by
P(n|θ) = (n! / Πᵢ₌₁ᴷ nᵢ!) Πᵢ₌₁ᴷ θᵢ^nᵢ,
where n = n₁ + ... + n_K, and θ = (θ₁, ..., θ_K).

Note: The particular case n = 1 represents the categorical distribution. This is a generalisation of the Bernoulli distribution.

Example: The outcome of rolling a die n times is described by a categorical distribution. The probabilities of each of the 6 outcomes are θ₁, ..., θ₆. For a fair die, θ₁ = ... = θ₆, and the probability of rolling it 12 times and getting each outcome twice is:
12! / (2!)⁶ · (1/6)¹² ≈ 0.0034
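The fair-die example can be reproduced directly from the multinomial formula:

```python
# Probability that 12 rolls of a fair die give each of the 6 outcomes
# exactly twice: P = 12!/(2!)^6 * (1/6)^12.
from math import factorial

n_i = [2] * 6            # two occurrences of each outcome
n = sum(n_i)             # 12 rolls in total

coef = factorial(n)      # multinomial coefficient n! / prod(n_i!)
for c in n_i:
    coef //= factorial(c)

prob = coef * (1 / 6) ** n
print(coef, round(prob, 4))  # prints 7484400 0.0034
```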
Poisson distribution (or, Poisson law of small numbers):
p(k; λ) = (λᵏ / k!) e^(-λ), with k ∈ N and parameter λ > 0.
Mean = variance = λ.

[Figure: Poisson probability mass function and cumulative distribution function, plotted for λ = 1, λ = 4, λ = 10.]
Exponential distribution (a.k.a. negative exponential distribution):
p(x; λ) = λ e^(-λx) for x ≥ 0 and parameter λ > 0.
Mean = λ⁻¹, variance = λ⁻².

[Figure: Exponential probability density function and cumulative distribution function, plotted for λ = 0.5, λ = 1, λ = 1.5.]
Gamma distribution:
p(x; k, θ) = x^(k-1) e^(-x/θ) / (Γ(k) θᵏ) for x ≥ 0 and parameters k > 0 (shape) and θ > 0 (scale).
Mean = kθ, variance = kθ².

The gamma function is a generalisation of the factorial function to real values. For any positive real number x, Γ(x+1) = x Γ(x). (Thus, for integers, Γ(n) = (n-1)!.)

[Figure: Gamma probability density function and cumulative distribution function, plotted for (k = 1.0, θ = 2.0), (k = 2.0, θ = 2.0), (k = 3.0, θ = 2.0), (k = 5.0, θ = 1.0), (k = 9.0, θ = 0.5), (k = 7.5, θ = 1.0), (k = 0.5, θ = 1.0).]
χ² distribution:
p(x; ν) = (1/Γ(ν/2)) (1/2)^(ν/2) x^(ν/2 - 1) e^(-x/2) for x ≥ 0 and ν a positive integer.
It is obtained from the Gamma distribution by taking k = ν/2 and θ = 2.
Mean = ν, variance = 2ν.

[Figure: Chi-squared probability density function and cumulative distribution function, plotted for ν = 1, 2, 3, 4, 6, 9.]
Laplace distribution:
p(x; µ, θ) = (1/(2θ)) e^(-|x-µ|/θ).
Mean = µ, variance = 2θ².

[Figure: Laplace probability density function and cumulative distribution function, plotted for (µ = 0, θ = 1), (µ = 0, θ = 2), (µ = 0, θ = 4), (µ = 5, θ = 4).]
Student's t distribution:
p(x; ν) = [Γ((ν+1)/2) / (√(νπ) Γ(ν/2))] (1 + x²/ν)^(-(ν+1)/2) for x ∈ R and ν > 0 (the degrees-of-freedom parameter).
Mean = 0 for ν > 1, otherwise undefined.
Variance = ν/(ν-2) for ν > 2; ∞ for 1 < ν ≤ 2; otherwise undefined.

[Figure: the probability density function and the cumulative distribution function.]

Note [from Wiki]: The t-distribution is symmetric and bell-shaped, like the normal distribution, but it has heavier tails, meaning that it is more prone to producing values that fall far from its mean.
Dirichlet distribution:
D(θ|α) = (1/Z(α)) Πᵢ₌₁ᴷ θᵢ^(αᵢ-1) δ(Σᵢ₌₁ᴷ θᵢ - 1)
where α = (α₁, ..., α_K) with αᵢ > 0 are the parameters, the θᵢ satisfy 0 ≤ θᵢ ≤ 1 and sum to 1 (this being indicated by the delta function term δ(Σᵢ θᵢ - 1)), and the normalising factor can be expressed in terms of the gamma function:
Z(α) = ∫ Πᵢ₌₁ᴷ θᵢ^(αᵢ-1) δ(Σᵢ θᵢ - 1) dθ = Πᵢ Γ(αᵢ) / Γ(Σᵢ αᵢ)
Mean of θᵢ: αᵢ / Σⱼ αⱼ.

For K = 2, the Dirichlet distribution reduces to the more widely known beta distribution, and the normalising constant is the beta function.
Remark: Concerning the multinomial and Dirichlet distributions:

The algebraic expression for the parameters θᵢ is similar in the two distributions. However, the multinomial is a distribution over its exponents nᵢ, whereas the Dirichlet is a distribution over the numbers θᵢ that are exponentiated. The two distributions are said to be conjugate distributions, and their close formal relationship leads to a harmonious interplay in many estimation problems.

Similarly, the beta distribution is the conjugate of the Bernoulli distribution, and the gamma distribution is the conjugate of the Poisson distribution.
More informationExamples of random experiment (a) Random experiment BASIC EXPERIMENT
Random experiment A random experiment is a process leading to an uncertain outcome, before the experiment is run We usually assume that the experiment can be repeated indefinitely under essentially the
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More information3. Probability and Statistics
FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important
More informationExercises with solutions (Set D)
Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where
More informationInformation Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18
Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable
More informationDistributions of Functions of Random Variables. 5.1 Functions of One Random Variable
Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More informationSample Spaces, Random Variables
Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational
More informationProbability Background
Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationRandom Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R
In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample
More informationContinuous Random Variables
1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables
More informationProbabilistic and Bayesian Machine Learning
Probabilistic and Bayesian Machine Learning Lecture 1: Introduction to Probabilistic Modelling Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Why a
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationRandom Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.
Random Variables Amappingthattransformstheeventstotherealline. Example 1. Toss a fair coin. Define a random variable X where X is 1 if head appears and X is if tail appears. P (X =)=1/2 P (X =1)=1/2 Example
More informationWeek 12-13: Discrete Probability
Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible
More informationWhy study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables
ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section
More informationECE 4400:693 - Information Theory
ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationSTAT2201. Analysis of Engineering & Scientific Data. Unit 3
STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationStat410 Probability and Statistics II (F16)
Stat4 Probability and Statistics II (F6 Exponential, Poisson and Gamma Suppose on average every /λ hours, a Stochastic train arrives at the Random station. Further we assume the waiting time between two
More informationIntro to Probability. Andrei Barbu
Intro to Probability Andrei Barbu Some problems Some problems A means to capture uncertainty Some problems A means to capture uncertainty You have data from two sources, are they different? Some problems
More informationOverview of Course. Nevin L. Zhang (HKUST) Bayesian Networks Fall / 58
Overview of Course So far, we have studied The concept of Bayesian network Independence and Separation in Bayesian networks Inference in Bayesian networks The rest of the course: Data analysis using Bayesian
More informationCSE 312 Final Review: Section AA
CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material
More informationBayesian statistics, simulation and software
Module 1: Course intro and probability brush-up Department of Mathematical Sciences Aalborg University 1/22 Bayesian Statistics, Simulations and Software Course outline Course consists of 12 half-days
More informationLecture 4: Probability and Discrete Random Variables
Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1
More informationChapter 3: Random Variables 1
Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.
More informationCS 630 Basic Probability and Information Theory. Tim Campbell
CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 34 To start out the course, we need to know something about statistics and This is only an introduction; for a fuller understanding, you would
More informationLecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016
Lecture 3 Probability - Part 2 Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza October 19, 2016 Luigi Freda ( La Sapienza University) Lecture 3 October 19, 2016 1 / 46 Outline 1 Common Continuous
More informationStatistics for scientists and engineers
Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3
More informationToday. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence
More informationNorthwestern University Department of Electrical Engineering and Computer Science
Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability
More informationSUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)
SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems
More informationProbability. Computer Science Tripos, Part IA. R.J. Gibbens. Computer Laboratory University of Cambridge. Easter Term 2008/9
Probability Computer Science Tripos, Part IA R.J. Gibbens Computer Laboratory University of Cambridge Easter Term 2008/9 Last revision: 2009-05-06/r-36 1 Outline Elementary probability theory (2 lectures)
More informationBivariate distributions
Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient
More informationPMR Introduction. Probabilistic Modelling and Reasoning. Amos Storkey. School of Informatics, University of Edinburgh
PMR Introduction Probabilistic Modelling and Reasoning Amos Storkey School of Informatics, University of Edinburgh Amos Storkey PMR Introduction School of Informatics, University of Edinburgh 1/50 Outline
More informationEE4601 Communication Systems
EE4601 Communication Systems Week 2 Review of Probability, Important Distributions 0 c 2011, Georgia Institute of Technology (lect2 1) Conditional Probability Consider a sample space that consists of two
More informationBASICS OF PROBABILITY
October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,
More informationTopic 3: The Expectation of a Random Variable
Topic 3: The Expectation of a Random Variable Course 003, 2017 Page 0 Expectation of a discrete random variable Definition (Expectation of a discrete r.v.): The expected value (also called the expectation
More information2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).
Name M362K Final Exam Instructions: Show all of your work. You do not have to simplify your answers. No calculators allowed. There is a table of formulae on the last page. 1. Suppose X 1,..., X 1 are independent
More informationChapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye
Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality
More informationAdvanced Probabilistic Modeling in R Day 1
Advanced Probabilistic Modeling in R Day 1 Roger Levy University of California, San Diego July 20, 2015 1/24 Today s content Quick review of probability: axioms, joint & conditional probabilities, Bayes
More informationLectures for APM 541: Stochastic Modeling in Biology. Jay Taylor
Lectures for APM 541: Stochastic Modeling in Biology Jay Taylor November 3, 2011 Contents 1 Distributions, Expectations, and Random Variables 4 1.1 Probability Spaces...................................
More informationLecture 11. Probability Theory: an Overveiw
Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the
More informationProbability Theory. Introduction to Probability Theory. Principles of Counting Examples. Principles of Counting. Probability spaces.
Probability Theory To start out the course, we need to know something about statistics and probability Introduction to Probability Theory L645 Advanced NLP Autumn 2009 This is only an introduction; for
More informationQuantitative Biology II Lecture 4: Variational Methods
10 th March 2015 Quantitative Biology II Lecture 4: Variational Methods Gurinder Singh Mickey Atwal Center for Quantitative Biology Cold Spring Harbor Laboratory Image credit: Mike West Summary Approximate
More information