Extract: Data Analysis Tools

Harjoat S. Bhamra

July 8, 2017
Contents

1 Introduction

Part I: Probability

2 Inequalities
   2.1 Jensen's inequality
       Arithmetic Mean-Geometric Mean Inequality; Proof of Corollary
   2.2 Cauchy-Schwarz Inequality
       Multiple random variables
   2.3 Information entropy
   2.4 Exercises
3 Weak Law of Large Numbers
   Markov inequality; Chebyshev inequality; Weak law of large numbers; Exercises
4 Normal Distribution
   Calculations with the normal distribution; Checking the normal density integrates to one; Mode, median, sample mean and variance; Bounds on the tail probability of a normal distribution; Multivariate normal; Bivariate normal; Brownian motion; Exercises
5 Generating and Characteristic Functions
   Moment Generating Functions; Characteristic Functions; A Brownian interlude with mirrors, but no smoke; Exercises
6 Central limit theorem
   Exercises

Part II: Statistics

7 Mathematical Statistics
   Introduction; Mean squared error
8 Bayesian inference
9 Estimating stochastic volatility models

Part III: Applying Linear Algebra and Statistics to Data Analysis

10 Principal Components Analysis (PCA)
   Overview
   A simple example from high school physics
   Linear Algebra for Principal Components Analysis: vector spaces; linear independence, subspaces, spanning and bases; change of basis
   How do we choose the basis? Noisy data; redundancy
   Covariance matrix; covariance matrix under a new basis
   PCA via Projection: mathematics of orthogonal projections; orthogonal projection operators; projecting the data onto a 1-d subspace; Spectral Theorem for Real Symmetric Matrices; projecting the data onto an m-dimensional subspace
   Scree Plots
   What about changing the basis? Orthogonal matrices
   Statistical Inference
   Exercises

Part IV: Exam Preparation

Topics covered in Final Exam
   Linear Maps
Chapter 1

Introduction

There are, of course, many reasons for learning mathematics. Some take the view of Siméon Poisson, to whom the following saying is attributed: "Life is good for only two things: doing mathematics and teaching it." However, the type of person who takes Poisson's view of life probably does not need to read these notes. So what is their purpose?
Siméon Poisson (1781–1840) was a French mathematician. Poisson's name is attached to a wide variety of ideas in mathematics and physics, for example: Poisson's integral, Poisson's equation in potential theory, Poisson brackets in differential equations, Poisson's ratio in elasticity, and Poisson's constant in electricity.
Benjamin Disraeli (1804–1881) was a British politician. Disraeli trained as a lawyer but wanted to be a novelist. He became a politician in the 1830s and is generally acknowledged to be one of the key figures in the history of the Conservative Party. He was Prime Minister in 1868 and from 1874 to 1880. He famously acquired for Britain a large stake in the Suez Canal, and made Queen Victoria Empress of India. In 1876 he was raised to the peerage as the Earl of Beaconsfield.

In these lecture notes, I hope to include sufficient mathematics for you to be able to analyze data, without delving too deeply into advanced statistics or econometrics, but while covering enough material to ensure that you and your work are neither a danger to yourself nor to others. As Benjamin Disraeli famously said: "There are three types of lies: lies, damn lies, and statistics." Hopefully, after this course, no one will say that about your work.
We shall cover some basic probability theory and linear algebra, before delving into some elementary statistics, culminating with a study of principal components analysis. Principal components analysis (PCA) is one of a family of techniques for taking a large amount of data and summarizing it via a smaller, more manageable set of variables. You can think of it as the process of replacing a long book with a summary. More formally, PCA is the process of taking high-dimensional data and using the dependencies between the variables to represent it in a more tractable, lower-dimensional form, without losing too much information.

A nice example of PCA is in politics. Stephen A. Weis, now a software engineer at Facebook, analyzed the voting records of senators in the first session of the 108th US Congress. He looked at 458 separate votes taken by the 100 senators. Each vote was described by a 1 (yes), -1 (no) or 0 (absent). In total, this gave rise to a 100 × 458 matrix (senators by votes). Using PCA (computed via the singular value decomposition, SVD), Weis was able to reduce the dimensionality of the data to the extent that he could summarize it on the 2-dimensional plot depicted in Figure 1. If you were kind, you might describe these lecture notes as the result of PCA applied to the mathematics of data analysis.
Figure 1: Democratic and Republican senators have been colored blue and red, respectively. The values of the axes, and the axes themselves, are artifacts of the singular value decomposition. In other words, the axes don't mean anything: they are simply the two most significant dimensions of the data's optimal representation. Regardless, one can see that this map clearly clusters senators according to party. Note that this map was generated only from voting records, without any data on party affiliation. From just the voting data, there is clearly a partisan divide between the parties.

The above example illustrates the strengths and limitations of PCA. It allows you to summarize large data sets via a small set of variables in a way which makes it easy to visualize the data. But it does not tell you what the summary variables mean. Indeed, as was said by Henry Clay: "Statistics are no substitute for judgment."
Henry Clay (1777–1852) was an American lawyer, planter, politician, and skilled orator who represented Kentucky in both the United States Senate and House of Representatives. He served three non-consecutive terms as Speaker of the House of Representatives and served as Secretary of State under President John Quincy Adams from 1825 to 1829. Clay ran for the presidency in 1824, 1832 and 1844, while also seeking the Whig Party nomination in 1840 and 1848.
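Before moving on, here is a minimal Python sketch of a Weis-style analysis. The vote matrix below is synthetic (Weis's actual 108th-Congress data is not reproduced here), and the party structure, defection rate and absence rate are invented purely for illustration; the point is only the mechanics of centring the data matrix and reading the two leading directions off the SVD.

```python
# Sketch of a Weis-style PCA on a synthetic Senate-vote matrix.
# Rows are "senators", columns are "votes"; entries are 1 (yes), -1 (no), 0 (absent).
import numpy as np

rng = np.random.default_rng(0)
n_senators, n_votes = 100, 458

# Two synthetic "parties" that mostly follow their party line.
party = np.repeat([1, -1], n_senators // 2)      # +1 = party A, -1 = party B
party_line = rng.choice([1, -1], size=n_votes)   # how party A votes on each bill
votes = np.outer(party, party_line)              # pure party-line voting
noise = rng.random((n_senators, n_votes))
votes[noise < 0.15] *= -1                        # 15% defections
votes[noise > 0.97] = 0                          # occasional absences

# PCA via the SVD of the centred data matrix.
X = votes - votes.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
coords = U[:, :2] * S[:2]                        # each senator's (PC1, PC2) position

# The two parties separate along the first principal component.
print("mean PC1, party A:", coords[party == 1, 0].mean())
print("mean PC1, party B:", coords[party == -1, 0].mean())
```

On data with strong block structure like this, the first principal component typically separates the two groups, which is exactly the partisan split visible in Figure 1.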
Part I

Probability
Chapter 2

Inequalities

Contents
2.1 Jensen's inequality (Arithmetic Mean-Geometric Mean Inequality; Proof of Corollary)
2.2 Cauchy-Schwarz Inequality (Multiple random variables)
2.3 Information entropy
2.4 Exercises

We often model data via random variables. For example, stock returns are often assumed to be normal. Inequalities give us well-defined facts about random variables.

Definition 1 A function $f : (a, b) \to \mathbb{R}$ is concave if for all $x, y \in (a, b)$ and $\lambda \in [0, 1]$,
$$\lambda f(x) + (1 - \lambda) f(y) \leq f(\lambda x + (1 - \lambda) y).$$
It is strictly concave if strict inequality holds when $x \neq y$ and $0 < \lambda < 1$.

Definition 2 A function $f$ is convex (strictly convex) if $-f$ is concave (strictly concave).

Fact If $f$ is a twice differentiable function and $f''(x) \leq 0$ for all $x \in (a, b)$, then $f$ is concave [a basic exercise in Analysis]. It is strictly concave if $f''(x) < 0$ for all $x \in (a, b)$.
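The definition and the second-derivative fact are easy to spot-check numerically. The following sketch (not part of the original notes) samples random chords of $f(x) = \ln x$, which is strictly concave since $f''(x) = -1/x^2 < 0$, and verifies the defining inequality on each sample.

```python
# Numerically spot-check the concavity definition for f(x) = ln(x) on (0, inf):
# lambda*f(x) + (1 - lambda)*f(y) <= f(lambda*x + (1 - lambda)*y).
import numpy as np

rng = np.random.default_rng(1)
f = np.log                             # f''(x) = -1/x**2 < 0, so ln is strictly concave
x, y = rng.uniform(0.1, 10.0, size=(2, 100_000))
lam = rng.uniform(0.0, 1.0, size=100_000)

lhs = lam * f(x) + (1 - lam) * f(y)    # value on the chord
rhs = f(lam * x + (1 - lam) * y)       # value of the function
assert np.all(lhs <= rhs + 1e-12)      # the chord lies below the function
print("concavity inequality holds on all", len(x), "samples")
```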
[Figure: a concave function. The chord lies below the function.]

Johan Jensen (1859–1925) was a Danish mathematician and engineer. Although he studied mathematics among various subjects at college, and even published a research paper in mathematics, he learned advanced mathematical topics later by himself and never held any academic position. Instead, he was a successful engineer for the Copenhagen Telephone Company between 1881 and 1924, and became head of its technical department. All his mathematics research was carried out in his spare time. Jensen is mostly renowned for his famous inequality, Jensen's inequality. In 1915, Jensen also proved Jensen's formula in complex analysis.
2.1 Jensen's inequality

Theorem 1 (Jensen's Inequality) Let $f : (a, b) \to \mathbb{R}$ be a concave function. Then
$$f\left( \sum_{n=1}^{N} p_n x_n \right) \geq \sum_{n=1}^{N} p_n f(x_n) \qquad (2.1)$$
for all $x_1, \ldots, x_N \in (a, b)$ and $p_1, \ldots, p_N \in (0, 1)$ such that $\sum_{n=1}^{N} p_n = 1$. Furthermore, if $f$ is strictly concave, then equality holds iff all the $x_n$ are equal.

If $X$ is a random variable that takes finitely many values, Jensen's Inequality can be written as
$$f(E[X]) \geq E[f(X)]. \qquad (2.2)$$

Proof of Theorem 1 We use proof by induction. Jensen's Inequality for $N = 2$ is just the definition of concavity. Suppose it holds for $N - 1$. Now consider
$$f\left( \sum_{n=1}^{N} p_n x_n \right) = f\left( p_1 x_1 + \sum_{n=2}^{N} p_n x_n \right). \qquad (2.3)$$
To apply the definition of concavity, we observe that
$$f\left( p_1 x_1 + \sum_{n=2}^{N} p_n x_n \right) = f\left( p_1 x_1 + (1 - p_1) z \right), \qquad (2.4)$$
where
$$z = \sum_{n=2}^{N} \frac{p_n}{\sum_{k=2}^{N} p_k} x_n. \qquad (2.5)$$
Applying the definition of concavity, we have
$$f\left( p_1 x_1 + (1 - p_1) z \right) \geq p_1 f(x_1) + (1 - p_1) f(z) \qquad (2.6)$$
$$= p_1 f(x_1) + (1 - p_1) f\left( \sum_{n=2}^{N} \frac{p_n}{\sum_{k=2}^{N} p_k} x_n \right). \qquad (2.7)$$
Jensen's Inequality holds for $N - 1$ and so
$$f\left( \sum_{n=2}^{N} \frac{p_n}{\sum_{k=2}^{N} p_k} x_n \right) \geq \sum_{n=2}^{N} \frac{p_n}{\sum_{k=2}^{N} p_k} f(x_n). \qquad (2.8)$$
Therefore
$$f\left( \sum_{n=1}^{N} p_n x_n \right) \geq p_1 f(x_1) + (1 - p_1) \sum_{n=2}^{N} \frac{p_n}{\sum_{k=2}^{N} p_k} f(x_n) \qquad (2.9)$$
$$= \sum_{n=1}^{N} p_n f(x_n), \qquad (2.10)$$
since $1 - p_1 = \sum_{k=2}^{N} p_k$. Therefore, if Jensen's Inequality holds for $N - 1$, it also holds for $N$ by virtue of concavity. Since Jensen's Inequality for $N = 2$ is just the definition of concavity, it follows by induction that Jensen's Inequality holds for all finite integers $N$ greater than or equal to 2.

Arithmetic Mean-Geometric Mean Inequality

Corollary 1 (Arithmetic Mean-Geometric Mean Inequality) Given positive real numbers $x_1, \ldots, x_N$,
$$\left( \prod_{n=1}^{N} x_n \right)^{1/N} \leq \frac{1}{N} \sum_{n=1}^{N} x_n. \qquad (2.11)$$

Proof of Corollary 1 Suppose $X$ is a discrete random variable such that
$$\Pr(X = x_n) = \frac{1}{N}, \quad x_n > 0, \quad n \in \{1, \ldots, N\}. \qquad (2.12)$$
Observe that $\ln x$ is a concave function of $x$ and so from Jensen's Inequality
$$E[\ln X] \leq \ln E[X]. \qquad (2.13)$$
Therefore
$$\frac{1}{N} \sum_{n=1}^{N} \ln x_n \leq \ln\left( \frac{1}{N} \sum_{n=1}^{N} x_n \right) \qquad (2.14)$$
$$\ln \prod_{n=1}^{N} x_n^{1/N} \leq \ln\left( \frac{1}{N} \sum_{n=1}^{N} x_n \right) \qquad (2.15)$$
$$\ln \left( \prod_{n=1}^{N} x_n \right)^{1/N} \leq \ln\left( \frac{1}{N} \sum_{n=1}^{N} x_n \right) \qquad (2.16)$$
Now, because $e^x$ is monotonically increasing, we have
$$\left( \prod_{n=1}^{N} x_n \right)^{1/N} \leq \frac{1}{N} \sum_{n=1}^{N} x_n. \qquad (2.17)$$

2.2 Cauchy-Schwarz Inequality

Theorem 2 (Cauchy-Schwarz Inequality) For any random variables $X$ and $Y$,
$$E[XY]^2 \leq E[X^2] E[Y^2]. \qquad (2.18)$$

Proof of Theorem 2 If $Y = 0$, then both sides are 0. Otherwise, $E[Y^2] > 0$. Let
$$w = X - Y \frac{E[XY]}{E[Y^2]}. \qquad (2.19)$$
Then
$$E[w^2] = E\left[ X^2 - 2XY \frac{E[XY]}{E[Y^2]} + Y^2 \frac{(E[XY])^2}{(E[Y^2])^2} \right] \qquad (2.20)$$
$$= E[X^2] - 2\frac{(E[XY])^2}{E[Y^2]} + \frac{(E[XY])^2}{E[Y^2]} \qquad (2.21)$$
$$= E[X^2] - \frac{(E[XY])^2}{E[Y^2]}. \qquad (2.22)$$
Since $E[w^2] \geq 0$, the Cauchy-Schwarz inequality follows.

Multiple random variables

If we have two random variables, we can study the relationship between them.

Definition 3 (Covariance) Given two random variables $X, Y$, the covariance is
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])].$$

Proposition 1
1. $\mathrm{Cov}(X, c) = 0$ for constant $c$.
2. $\mathrm{Cov}(X + c, Y) = \mathrm{Cov}(X, Y)$.
3. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$.
4. $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$.
5. $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
6. $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.
7. If $X, Y$ are independent, $\mathrm{Cov}(X, Y) = 0$.

These are all trivial to prove. It is important to note that $\mathrm{Cov}(X, Y) = 0$ does not imply that $X$ and $Y$ are independent.

[Figure: visualizing the Cauchy-Schwarz inequality.]
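A quick way to build intuition for these results is to verify them on simulated data. The sketch below (my own, not from the notes) checks Jensen's inequality (2.2), the AM-GM inequality (2.11), the Cauchy-Schwarz inequality (2.18) and property 4 of Proposition 1 with NumPy; the lognormal sample is an arbitrary choice, made only so that the $x_n$ are positive, as AM-GM requires.

```python
# Monte Carlo sanity checks of the inequalities in this chapter.
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # positive random sample

# Jensen (2.2) with the concave f = ln: f(E[X]) >= E[f(X)].
assert np.log(x.mean()) >= np.log(x).mean()

# AM-GM (2.11): geometric mean <= arithmetic mean.
geo_mean = np.exp(np.log(x).mean())
assert geo_mean <= x.mean()

# Cauchy-Schwarz (2.18): E[XY]^2 <= E[X^2] E[Y^2].
y = rng.normal(size=x.size)
assert np.mean(x * y) ** 2 <= np.mean(x ** 2) * np.mean(y ** 2)

# Covariance identity (Proposition 1, item 4): Cov(X,Y) = E[XY] - E[X]E[Y].
cov = np.mean(x * y) - x.mean() * y.mean()
assert np.isclose(cov, np.cov(x, y, bias=True)[0, 1])
print("all inequalities verified on simulated data")
```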
Example 1 Let $(X, Y) = (2, 0)$, $(-1, 1)$ or $(-1, -1)$ with equal probabilities of 1/3. These are not independent, since $Y = 0 \Rightarrow X = 2$. However,
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 - 0 \cdot 0 = 0.$$
If we randomly pick a point on the unit circle and let the coordinates be $(X, Y)$, then $E[X] = E[Y] = E[XY] = 0$ by symmetry. So $\mathrm{Cov}(X, Y) = 0$, but $X$ and $Y$ are clearly not independent (they have to satisfy $x^2 + y^2 = 1$).

The covariance is not that useful in measuring how well two variables correlate. For one, the covariance can (potentially) have dimensions, which means that the numerical value of the covariance can depend on what units we are using. Also, the magnitude of the covariance depends largely on the variances of $X$ and $Y$ themselves. To solve these problems, we define:

Definition 4 (Correlation coefficient) The correlation coefficient of $X$ and $Y$ is
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$

Proposition 2 $|\rho(X, Y)| \leq 1$.

Proof of Proposition 2 Apply Cauchy-Schwarz to $X - E[X]$ and $Y - E[Y]$.

Again, zero correlation does not necessarily imply independence.

2.3 Information entropy

Suppose we observe data about the economy up until 2010 and then look again later. How much more information do we have? Information theory gives us ways of measuring information. We shall start (and end!) with the basic idea of information entropy, also known as Shannon's entropy. In the context of PCA, we want to reduce the dimensionality of a dataset, but without losing too much information. Entropy gives us a way of measuring this.
Claude Shannon (1916–2001) introduced the notion that information could be quantified. In "A Mathematical Theory of Communication", his legendary paper from 1948, Shannon proposed that data should be measured in bits: discrete values of zero or one. Shannon developed information entropy as a measure of the uncertainty in a message, while essentially inventing the field of information theory.

Perhaps confusingly, in information theory the term entropy refers to information we don't have (normally people define information as what they know!). The information we don't have about a system, its entropy, is related to its unpredictability: how much it can surprise us.

Suppose an event $A$ occurs with probability $P(A) = p$. How surprising is it? If it is not very surprising, there cannot be much new information in the event. Let's try to invent a surprise function, say $S(p)$. What properties should this have? Since a certain event is unsurprising, we would like $S(1) = 0$. We should also like $S(p)$ to be decreasing and continuous in $p$. If $A$ and $B$ are independent events, then we should like $S(P(A \cap B)) = S(P(A)) + S(P(B))$. It turns out that the only function with these properties is one of the form
$$S(p) = -c \log_a p, \qquad (2.23)$$
with $c > 0$. For simplicity, take $c = 1$. The log can be any base, but for the time being let us use base 2 ($a = 2$).
If $X$ is a random variable that takes values $1, \ldots, N$ with probabilities $p_1, \ldots, p_N$, then on average the surprise obtained on learning $X$ is
$$H(X) = E[S(p_X)] = -\sum_{n=1}^{N} p_n \log_2 p_n. \qquad (2.24)$$
This is the information entropy of $X$. It is an important quantity in information theory. The log can be taken to any base, but using base 2, $nH(X)$ is roughly the expected number of binary bits required to report the result of $n$ experiments in which $X_1, \ldots, X_n$ are i.i.d. observations from the distribution $(p_n, 1 \leq n \leq N)$ and we encode our reporting of the results of the experiments in the most efficient way.

Let's use Jensen's inequality to prove that the entropy is maximized by $p_1 = \cdots = p_N = 1/N$. Consider $f(x) = \log x$, which is a concave function. We may assume $p_n > 0$ for all $n$. Let $X$ be a r.v. such that $X = 1/p_n$ with probability $p_n$. Then
$$-\sum_{n=1}^{N} p_n \log p_n = \sum_{n=1}^{N} p_n \log \frac{1}{p_n} = E[f(X)] \leq f(E[X]) = f(N) = \log N = -\sum_{n=1}^{N} \frac{1}{N} \log \frac{1}{N}, \qquad (2.25)$$
which is the entropy of the uniform distribution.

To provide some more underpinnings for ideas from information theory, we shall make two definitions.

Definition 5 If $X$ is a random variable that takes values $x_1, \ldots, x_N$ with probabilities $p_1, \ldots, p_N$, then the Shannon information content of an outcome $x_n$ is defined as
$$h(x_n) = \log_2 \frac{1}{p_n}. \qquad (2.26)$$
Information content is measured in bits. One bit is typically defined as the uncertainty of a binary random variable that is 0 or 1 with equal probability, or the information that is gained when the value of such a variable becomes known.

Definition 6 If $X$ is a random variable that takes values $x_1, \ldots, x_N$ with probabilities $p_1, \ldots, p_N$, then the information entropy of the random variable is given by the mean Shannon information content
$$H(X) = -\sum_{n=1}^{N} p_n \log_2 p_n. \qquad (2.27)$$
Note that the entropy does not depend on the values that the random variable takes, but only on the probability distribution.
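A direct implementation of (2.27) makes the behaviour of entropy easy to see: a fair coin carries one bit, a biased coin carries less, and among distributions on $N$ outcomes the uniform one attains the maximum $\log_2 N$, as just proved. The helper below is a sketch whose name is my own, not from the notes.

```python
# Information entropy (2.27) of a discrete distribution, in bits.
import numpy as np

def entropy_bits(p):
    """H(X) = -sum p_n log2 p_n, with the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # drop zero-probability outcomes
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits([0.5, 0.5]))             # 1.0 bit: a fair coin
print(entropy_bits([0.9, 0.1]))             # ~0.469 bits: a biased, less surprising coin
print(entropy_bits([0.25] * 4))             # 2.0 bits: uniform on 4 outcomes, the maximum
print(entropy_bits([0.7, 0.1, 0.1, 0.1]))   # < 2 bits: any non-uniform p on 4 outcomes
```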
We can also define the joint entropy of a family of random variables.

Definition 7 Consider a family of discrete random variables $X_1, \ldots, X_N$, where $X_i$ takes a finite set of values in some set $A_i$, which wlog is a subset of $\mathbb{N}$. Their joint entropy is defined by
$$H(X_1, \ldots, X_N) = -\sum_{x_1 \in A_1} \cdots \sum_{x_N \in A_N} \Pr((X_1, \ldots, X_N) = (x_1, \ldots, x_N)) \log_2 \Pr((X_1, \ldots, X_N) = (x_1, \ldots, x_N)). \qquad (2.28)$$

Example 2 Suppose $X_1$ and $X_2$ take the following values:
$$\Pr(X_1 = 1, X_2 = 1) = 1/4 \qquad (2.29)$$
$$\Pr(X_1 = 1, X_2 = -1) = 1/4 \qquad (2.30)$$
$$\Pr(X_1 = -1, X_2 = 1) = 1/4 \qquad (2.31)$$
$$\Pr(X_1 = -1, X_2 = -1) = 1/4 \qquad (2.32)$$
Clearly $X_1$ and $X_2$ are independent. The joint entropy of $X_1$ and $X_2$ is
$$\frac{1}{4} \log_2 4 + \frac{1}{4} \log_2 4 + \frac{1}{4} \log_2 4 + \frac{1}{4} \log_2 4 \qquad (2.33)$$
$$= \log_2 4 = 2. \qquad (2.34)$$
We can deduce that
$$\Pr(X_1 = 1) = \Pr(X_1 = -1) = 1/2 = \Pr(X_2 = 1) = \Pr(X_2 = -1). \qquad (2.35)$$
Observe that
$$H(X_1) = H(X_2) = \frac{1}{2} \log_2 2 + \frac{1}{2} \log_2 2 = 1 \qquad (2.36)$$
and so we see that
$$H(X_1, X_2) = H(X_1) + H(X_2) = 2. \qquad (2.37)$$
Now suppose $X_1$ and $X_2$ are correlated and take the following values:
$$\Pr(X_1 = 1, X_2 = 1) = 1/6 \qquad (2.38)$$
$$\Pr(X_1 = 1, X_2 = -1) = 1/3 \qquad (2.39)$$
$$\Pr(X_1 = -1, X_2 = 1) = 1/3 \qquad (2.40)$$
$$\Pr(X_1 = -1, X_2 = -1) = 1/6 \qquad (2.41)$$
The joint entropy of $X_1$ and $X_2$ is now
$$\frac{1}{6} \log_2 6 + \frac{1}{3} \log_2 3 + \frac{1}{3} \log_2 3 + \frac{1}{6} \log_2 6 \qquad (2.42)$$
$$= \frac{1}{3} \log_2 6 + \frac{2}{3} \log_2 3 \qquad (2.43)$$
$$= \log_2 \left( 6^{1/3}\, 3^{2/3} \right) \qquad (2.44)$$
$$\approx 1.918 < 2. \qquad (2.45)$$
We can easily deduce that
$$\Pr(X_1 = 1) = \Pr(X_1 = -1) = 1/2 = \Pr(X_2 = 1) = \Pr(X_2 = -1), \qquad (2.46)$$
and so
$$H(X_1) = H(X_2) = 1, \qquad (2.47)$$
but now
$$H(X_1, X_2) < H(X_1) + H(X_2) = 2. \qquad (2.48)$$
This result is intuitive. When the random variables are correlated, their joint information is less than the sum of their individual information.

You may well have seen the following definition of independence for discrete random variables.

Definition 8 Consider two discrete random variables, $X$ and $Y$, which can take values in the set $\{a_1, \ldots, a_N\}$. $X$ and $Y$ are independent if
$$\forall i, j \in \{1, \ldots, N\}, \quad \Pr(\{X = a_i, Y = a_j\}) = \Pr(X = a_i) \Pr(Y = a_j). \qquad (2.49)$$
Using the above definition, we can prove that the joint entropy of two independent random variables is just the sum of the individual entropies.
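Example 2 is easy to reproduce numerically. The snippet below (the entropy helper is repeated so it runs on its own) computes both joint entropies and confirms that correlation pushes the joint entropy below $H(X_1) + H(X_2) = 2$.

```python
# Reproducing Example 2: joint entropy of the independent and correlated pairs.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p)))

independent = [1/4, 1/4, 1/4, 1/4]   # equations (2.29)-(2.32)
correlated = [1/6, 1/3, 1/3, 1/6]    # equations (2.38)-(2.41)

print(entropy_bits(independent))     # 2.0 = H(X1) + H(X2)
print(entropy_bits(correlated))      # ~1.918 < 2: correlation reduces joint information
```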
Proposition 3 For two discrete random variables, $X$ and $Y$,
$$H(X, Y) = H(X) + H(Y) \qquad (2.50)$$
if and only if $X$ and $Y$ are independent.

Proof of Proposition 3 From Definition 7, we have
$$H(X, Y) = \sum_{i=1}^{N} \sum_{j=1}^{N} \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(\{X = a_i, Y = a_j\})}. \qquad (2.51)$$
Supposing $X$ and $Y$ are independent, we obtain
$$H(X, Y) = \sum_{i=1}^{N} \sum_{j=1}^{N} \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(X = a_i)} + \sum_{i=1}^{N} \sum_{j=1}^{N} \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(Y = a_j)}. \qquad (2.52)$$
Hence
$$H(X, Y) = \sum_{i=1}^{N} \log \frac{1}{\Pr(X = a_i)} \sum_{j=1}^{N} \Pr(\{X = a_i, Y = a_j\}) + \sum_{j=1}^{N} \log \frac{1}{\Pr(Y = a_j)} \sum_{i=1}^{N} \Pr(\{X = a_i, Y = a_j\}). \qquad (2.53)$$
Observe that
$$\sum_{j=1}^{N} \Pr(\{X = a_i, Y = a_j\}) = \Pr(X = a_i) \qquad (2.54)$$
and
$$\sum_{i=1}^{N} \Pr(\{X = a_i, Y = a_j\}) = \Pr(Y = a_j). \qquad (2.55)$$
Hence
$$H(X, Y) = \sum_{i=1}^{N} \Pr(X = a_i) \log \frac{1}{\Pr(X = a_i)} + \sum_{j=1}^{N} \Pr(Y = a_j) \log \frac{1}{\Pr(Y = a_j)} \qquad (2.56)$$
$$= H(X) + H(Y). \qquad (2.57)$$
Out of laziness, I am leaving the "only if" part as an exercise.

But what happens when $X$ and $Y$ are not independent random variables? We have the following inequality, which you can try to prove yourself.

Proposition 4 For two discrete random variables, $X$ and $Y$, we have
$$H(X, Y) \leq H(X) + H(Y). \qquad (2.58)$$
We can also measure the difference between two sets of probabilities. In economics, we can use this to measure how far apart two sets of beliefs are.

Definition 9 Suppose we have a discrete random variable $X$, which can take the values $x_1, \ldots, x_N$. We can define two different sets of probabilities, $P = \{p_1, \ldots, p_N\}$ and $Q = \{q_1, \ldots, q_N\}$. The relative entropy or Kullback-Leibler divergence between the two probabilities is
$$D_{KL}(P \| Q) = \sum_{n=1}^{N} p_n \log_2 \frac{p_n}{q_n}. \qquad (2.59)$$

Proposition 5 (Gibbs' Inequality) The relative entropy satisfies Gibbs' inequality
$$D_{KL}(P \| Q) \geq 0, \qquad (2.60)$$
with equality only if $P$ and $Q$ are identical.
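A short implementation of (2.59) illustrates Gibbs' inequality numerically. The function below is a sketch with an invented name; it assumes $q_n > 0$ wherever $p_n > 0$, since otherwise the divergence is infinite.

```python
# Relative entropy (2.59) and a numeric illustration of Gibbs' inequality (2.60).
import numpy as np

def kl_bits(p, q):
    """D_KL(P||Q) = sum p_n log2(p_n / q_n); assumes p_n > 0 implies q_n > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([1/3, 1/3, 1/3])
print(kl_bits(p, p))   # 0.0: identical beliefs
print(kl_bits(p, q))   # > 0, per Gibbs' inequality
print(kl_bits(q, p))   # also > 0, but a different value: D_KL is not symmetric
```

Note from the last two lines that $D_{KL}$ is not symmetric in its arguments, which is why it is called a divergence rather than a distance.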
2.4 Exercises

1. Consider a concave function $u(x)$ and suppose $X$ is a random variable. Show that, to second order in the deviation of $X$ from its mean,
$$E[u(X)] \approx u(E[X]) + \tfrac{1}{2} \mathrm{Var}[X]\, u''(E[X]). \qquad (2.61)$$

2. Let $X_1, \ldots, X_N$ be independent random variables, all with uniform distribution on $[0, 1]$. What is the probability of the event $\{X_1 > X_2 > \cdots > X_{N-1} > X_N\}$?

3. Let $X$ and $Y$ be two non-constant random variables with finite variances. The correlation coefficient is denoted by $\rho(X, Y)$.
(a) Using the Cauchy-Schwarz inequality or otherwise, prove that
$$|\rho(X, Y)| \leq 1. \qquad (2.62)$$
(b) What can be said about the relationship between $X$ and $Y$ when either (i) $\rho(X, Y) = 0$ or (ii) $|\rho(X, Y)| = 1$? [Proofs are not required.]
(c) Take $r \in [0, 1]$ and let $X, X'$ be independent random variables taking values $\pm 1$ with probabilities 1/2. Set
$$Y = \begin{cases} X & \text{with probability } r \\ X' & \text{with probability } 1 - r \end{cases} \qquad (2.63)$$
Find $\rho(X, Y)$.

4. The 1-Trick and the Splitting Trick. Show that for each real sequence $x_1, x_2, \ldots, x_N$ one has
$$\sum_{n=1}^{N} |x_n| \leq \sqrt{N} \left( \sum_{n=1}^{N} x_n^2 \right)^{1/2} \qquad (2.64)$$
and show that for nonnegative $a_1, \ldots, a_N$ one also has
$$\sum_{n=1}^{N} a_n \leq \left( \sum_{n=1}^{N} a_n^{2/3} \right)^{1/2} \left( \sum_{n=1}^{N} a_n^{4/3} \right)^{1/2}. \qquad (2.65)$$
The two tricks illustrated by this simple exercise are very useful when proving inequalities.

5. If $p(k; \theta) \geq 0$ for all $k \in D$ and $\theta \in \Theta$, and if
$$\sum_{k \in D} p(k; \theta) = 1, \quad \theta \in \Theta, \qquad (2.66)$$
then for each $\theta \in \Theta$ one can think of $M_\theta = \{p(k; \theta) : k \in D\}$ as specifying a probability model, where $p(k; \theta)$ represents the probability that we observe $k$ when the parameter $\theta$ is the true state of nature. If the function $g : D \to \mathbb{R}$ satisfies
$$\sum_{k \in D} g(k) p(k; \theta) = \theta, \quad \theta \in \Theta, \qquad (2.67)$$
then $g$ is called an unbiased estimator of the parameter $\theta$. The variance of the unbiased estimator $g$ is given by $\sum_{k \in D} (g(k) - \theta)^2 p(k; \theta)$. Assuming that $D$ is finite and $p(k; \theta)$ is a differentiable function of $\theta$, show that one has the following lower bound for the variance of the unbiased estimator of $\theta$:
$$\sum_{k \in D} (g(k) - \theta)^2 p(k; \theta) \geq \frac{1}{I(\theta)}, \qquad (2.68)$$
where $I : \Theta \to \mathbb{R}$ is defined by the sum
$$I(\theta) = \sum_{k \in D} \frac{\left\{ \partial p(k; \theta) / \partial \theta \right\}^2}{p(k; \theta)}. \qquad (2.69)$$
The quantity $I(\theta)$ is known as the Fisher information at $\theta$ of the model $M_\theta$. The inequality (2.68) is known as the Cramér-Rao lower bound, and it has extensive applications in mathematical statistics.
6. Show that if $X$ is a discrete r.v. such that $\Pr(X = x_n) = p_n$ for $n \in \{1, \ldots, N\}$, and $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ are nondecreasing, then
$$E[f(X)] E[g(X)] \leq E[f(X) g(X)]. \qquad (2.70)$$

7. Given $n$ random people, what is the probability that two or more of them have the same birthday? Under the natural (but approximate!) model where the birthdays are viewed as independent and uniformly distributed in the set $\{1, 2, \ldots, 365\}$, show that this probability is at least 1/2 if $n \geq 23$.

8. A fair coin is flipped until the first head occurs. Let $X$ denote the number of flips required. Find the entropy $H(X)$ in bits.

9. Use Jensen's Inequality to prove Gibbs' Inequality.

10. It is well known that there are infinitely many prime numbers: a proof appears in Euclid's famous Elements. We will not only show that there are infinitely many prime numbers, but we will also give a lower bound on the rate of their growth using information theory. Let $\pi(n)$ denote the number of primes no greater than $n$. Every positive integer $n$ has a unique prime factorization of the form
$$n = \prod_{i=1}^{\pi(n)} p_i^{X_i(n)}, \qquad (2.71)$$
where $p_1, p_2, p_3, \ldots$ are the primes, that is, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, etc., and $X_i(n)$ is the non-negative integer representing the multiplicity of $p_i$ in the prime factorization of $n$. Let $N$ be uniformly distributed on $\{1, 2, 3, \ldots, n\}$.
(a) Show that $X_i(N)$ is an integer-valued random variable satisfying
$$0 \leq X_i(N) \leq \log_2 n. \qquad (2.72)$$
[Hint: Try finding a lower and an upper bound for $p_i^{X_i(N)}$.]
(b) Show that
$$\pi(n) \geq \frac{\log_2 n}{\log_2 (\log_2 n + 1)}. \qquad (2.73)$$
[Hint: Do $X_1(N), X_2(N), \ldots, X_{\pi(n)}(N)$ determine $N$? What does that say about the respective entropies?]
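For several of these exercises, a quick Monte Carlo simulation is a useful sanity check of a pencil-and-paper answer. As one illustration (a sketch of my own, not a solution), the snippet below estimates the birthday collision probability of exercise 7 for $n = 23$; the estimate should land just above 1/2.

```python
# Monte Carlo check for the birthday problem (exercise 7): with n = 23 people
# and uniform birthdays, the collision probability should exceed 1/2.
import numpy as np

rng = np.random.default_rng(3)
n, trials = 23, 100_000
birthdays = rng.integers(1, 366, size=(trials, n))   # days 1..365, one row per trial
has_collision = [len(set(row)) < n for row in birthdays]
print("estimated collision probability:", np.mean(has_collision))   # ~0.507
```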