
Extract: Data Analysis Tools
Harjoat S. Bhamra
July 8, 2017

Contents

1 Introduction

Part I: Probability

2 Inequalities
   2.1 Jensen's inequality
       Arithmetic Mean-Geometric Mean Inequality
       Proof of Corollary 1
   2.2 Cauchy-Schwarz Inequality
       Multiple random variables
   2.3 Information entropy
   2.4 Exercises

3 Weak Law of Large Numbers
   Markov Inequality
   Chebyshev inequality
   Weak law of large numbers
   Exercises

4 Normal Distribution
   Calculations with the normal distribution
   Checking the normal density integrates to one
   Mode, median, sample mean and variance
   Bounds on tail probability of a normal distribution
   Multivariate normal
   Bivariate normal
   Brownian motion
   Exercises

5 Generating and Characteristic Functions
   Moment Generating Functions
   Characteristic Functions
   A Brownian interlude with mirrors, but no smoke
   Exercises

6 Central limit theorem
   Exercises

Part II: Statistics

7 Mathematical Statistics
   Introduction
   Mean squared error

8 Bayesian inference

9 Estimating stochastic volatility models

Part III: Applying Linear Algebra and Statistics to Data Analysis

10 Principal Components Analysis (PCA)
   Overview
   A simple example from high school physics
   Linear Algebra for Principal Components Analysis
       Vector spaces
       Linear independence, subspaces, spanning and bases
       Change of basis
   How do we choose the basis?
       Noisy data
       Redundancy
   Covariance matrix
       Covariance matrix under a new basis
   PCA via Projection
       Mathematics of Orthogonal Projections
       Orthogonal projection operators
       Projecting the data onto a 1-d subspace
       Spectral Theorem for Real Symmetric Matrices
       Projecting the data onto an m-dimensional subspace
   Scree Plots
   What about changing the basis?
       Orthogonal matrices
   Statistical Inference
   Exercises

Part IV: Exam Preparation
   Topics covered in Final Exam
   Linear Maps

Chapter 1: Introduction

There are of course many reasons for learning mathematics. Some take the view of Simeon Poisson, to whom the following saying is attributed: "Life is good for only two things: doing mathematics and teaching it." However, the type of person who takes Simeon Poisson's view of life probably does not need to read these notes. So what is their purpose?

Simeon Poisson (1781-1840) was a French mathematician. Poisson's name is attached to a wide variety of ideas in mathematics and physics, for example: Poisson's integral, Poisson's equation in potential theory, Poisson brackets in differential equations, Poisson's ratio in elasticity, and Poisson's constant in electricity.

Benjamin Disraeli (1804-1881) was a British politician. Disraeli trained as a lawyer but wanted to be a novelist. He became a politician in the 1830s and is generally acknowledged to be one of the key figures in the history of the Conservative Party. He was Prime Minister in 1868 and from 1874 to 1880. He famously acquired for Britain a large stake in the Suez Canal, and made Queen Victoria Empress of India. In 1876 he was raised to the peerage as the Earl of Beaconsfield.

In these lecture notes, I hope to include sufficient mathematics for you to be able to analyze data, without delving too deeply into advanced statistics or econometrics, but while covering enough material to ensure that you and your work are neither a danger to yourself nor to others. As Benjamin Disraeli famously said: "There are three types of lies: lies, damn lies, and statistics." Hopefully, after this course, no one will say that about your work.

We shall cover some basic probability theory and linear algebra, before delving into some elementary statistics, culminating with a study of principal components analysis. Principal components analysis (PCA) is one of a family of techniques for taking a large amount of data and summarizing it via a smaller, more manageable set of variables. You can think of it as the process of replacing a long book with a summary. More formally, PCA is the process of taking high-dimensional data and using the dependencies between the variables to represent it in a more tractable, lower-dimensional form, without losing too much information.

A nice example of PCA is in politics. Stephen A. Weis, now a software engineer at Facebook, analyzed the voting records of senators in the first session of the 108th US Congress. He looked at 458 separate votes taken by the 100 senators. Each vote was described by a 1 (yes), -1 (no) or 0 (absent). In total, this gave rise to a 100 (number of senators) by 458 (number of votes) matrix. Using PCA (also known as the singular value decomposition, SVD), Weis was able to reduce the dimensionality of the data to the extent that he could summarize it on the 2-dimensional plot depicted in Figure 1. If you were kind, you might describe these lecture notes as the result of PCA applied to the mathematics of data analysis.
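To make the dimension-reduction step concrete, here is a minimal Python sketch of how such a voting matrix could be reduced to two dimensions via the SVD. This is not Weis's original code: the randomly generated votes matrix is a hypothetical stand-in for the real 100 by 458 data set, and only numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical placeholder for the real data: rows are senators,
    # columns are votes coded +1 (yes), -1 (no) or 0 (absent).
    votes = rng.choice([-1, 0, 1], size=(100, 458)).astype(float)

    centered = votes - votes.mean(axis=0)              # centre each vote column
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)

    coords_2d = U[:, :2] * S[:2]                       # each senator as a point in the plane
    print(coords_2d.shape)                             # (100, 2)

Plotting the two columns of coords_2d against each other gives a map of the kind described below.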

Figure 1: Democratic and Republican senators have been colored blue and red, respectively. The values of the axes and the axes themselves are artifacts of the singular value decomposition. In other words, the axes don't mean anything: they are simply the two most significant dimensions of the data's optimal representation. Regardless, one can see that this map clearly clusters senators according to party. Note that this map was generated only from voting records, without any data on party affiliation. From just the voting data, there is clearly a partisan divide between the parties.

The above example illustrates the strengths and limitations of PCA. It allows you to summarize large data sets via a small set of variables in a way which makes it easy to visualize the data. But it does not tell you what the summary variables mean. Indeed, as was said by Henry Clay: "Statistics are no substitute for judgment."

Henry Clay (1777-1852) was an American lawyer and planter, politician, and skilled orator who represented Kentucky in both the United States Senate and House of Representatives. He served three non-consecutive terms as Speaker of the House of Representatives and served as Secretary of State under President John Quincy Adams from 1825 to 1829. Clay ran for the presidency in 1824, 1832 and 1844, while also seeking the Whig Party nomination in 1840 and 1848.

Part I: Probability

Chapter 2: Inequalities

Contents
   2.1 Jensen's inequality
       Arithmetic Mean-Geometric Mean Inequality
       Proof of Corollary 1
   2.2 Cauchy-Schwarz Inequality
       Multiple random variables
   2.3 Information entropy
   2.4 Exercises

We often model data via random variables. For example, stock returns are often assumed to be normal. Inequalities give us well-defined facts about random variables.

Definition 1 A function $f : (a, b) \to \mathbb{R}$ is concave if for all $x, y \in (a, b)$ and $\lambda \in [0, 1]$,
\[ \lambda f(x) + (1 - \lambda) f(y) \le f(\lambda x + (1 - \lambda) y). \]
It is strictly concave if strict inequality holds when $x \ne y$ and $0 < \lambda < 1$.

Definition 2 A function $f$ is convex (strictly convex) if $-f$ is concave (strictly concave).

Fact If $f$ is a twice differentiable function and $f''(x) \le 0$ for all $x \in (a, b)$, then $f$ is concave [a basic exercise in Analysis]. It is strictly concave if $f''(x) < 0$ for all $x \in (a, b)$.

[Figure: for a concave function, the chord lies below the function.]

Johan Jensen (1859-1925) was a Danish mathematician and engineer. Although he studied mathematics among various subjects at college, and even published a research paper in mathematics, he learned advanced mathematical topics later by himself and never held any academic position. Instead, he was a successful engineer for the Copenhagen Telephone Company between 1881 and 1924, and eventually became head of its technical department. All his mathematics research was carried out in his spare time. Jensen is mostly renowned for his famous inequality, Jensen's inequality. In 1915, Jensen also proved Jensen's formula in complex analysis.

2.1 Jensen's inequality

Theorem 1 (Jensen's Inequality) Let $f : (a, b) \to \mathbb{R}$ be a concave function. Then
\[ f\left(\sum_{n=1}^N p_n x_n\right) \ge \sum_{n=1}^N p_n f(x_n) \]  (2.1)
for all $x_1, \ldots, x_N \in (a, b)$ and $p_1, \ldots, p_N \in (0, 1)$ such that $\sum_{n=1}^N p_n = 1$. Furthermore, if $f$ is strictly concave then equality holds iff all the $x_n$ are equal.

If X is a random variable that takes finitely many values, Jensen's Inequality can be written as
\[ f(E[X]) \ge E[f(X)]. \]  (2.2)

Proof of Theorem 1 We use proof by induction. Jensen's Inequality for $N = 2$ is just the definition of concavity. Suppose it holds for $N - 1$. Now consider
\[ f\left(\sum_{n=1}^N p_n x_n\right) = f\left(p_1 x_1 + \sum_{n=2}^N p_n x_n\right). \]  (2.3)
To apply the definition of concavity, we observe that
\[ f\left(p_1 x_1 + \sum_{n=2}^N p_n x_n\right) = f\left(p_1 x_1 + (1 - p_1) z\right), \]  (2.4)
where
\[ z = \sum_{n=2}^N \frac{p_n}{\sum_{k=2}^N p_k}\, x_n. \]  (2.5)
Applying the definition of concavity, we have
\[ f\left(p_1 x_1 + (1 - p_1) z\right) \ge p_1 f(x_1) + (1 - p_1) f(z) \]  (2.6)
\[ = p_1 f(x_1) + (1 - p_1) f\left(\sum_{n=2}^N \frac{p_n}{\sum_{k=2}^N p_k}\, x_n\right). \]  (2.7)
Jensen's Inequality holds for $N - 1$ and so
\[ f\left(\sum_{n=2}^N \frac{p_n}{\sum_{k=2}^N p_k}\, x_n\right) \ge \sum_{n=2}^N \frac{p_n}{\sum_{k=2}^N p_k}\, f(x_n). \]  (2.8)
Therefore
\[ f\left(\sum_{n=1}^N p_n x_n\right) \ge p_1 f(x_1) + (1 - p_1) \sum_{n=2}^N \frac{p_n}{\sum_{k=2}^N p_k}\, f(x_n) \]  (2.9)
\[ = \sum_{n=1}^N p_n f(x_n), \]  (2.10)
since $1 - p_1 = \sum_{k=2}^N p_k$. Therefore, if Jensen's Inequality holds for $N - 1$, it also holds for $N$ by virtue of concavity. Since Jensen's Inequality for $N = 2$ is just the definition of concavity, it follows by induction that Jensen's Inequality holds for all finite integers $N$ greater than or equal to 2.
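As a quick sanity check of (2.2), here is a small Python sketch (using nothing beyond the standard library) that compares $f(E[X])$ and $E[f(X)]$ for the concave function $f(x) = \ln x$ and a hypothetical four-point distribution.

    # Numerical check of Jensen's inequality f(E[X]) >= E[f(X)] for concave f.
    import math

    x = [1.0, 2.0, 5.0, 10.0]          # values taken by X (hypothetical)
    p = [0.1, 0.4, 0.3, 0.2]           # their probabilities, summing to one

    E_X = sum(pn * xn for pn, xn in zip(p, x))
    E_fX = sum(pn * math.log(xn) for pn, xn in zip(p, x))

    print(math.log(E_X), ">=", E_fX)   # f(E[X]) is the larger of the two
    assert math.log(E_X) >= E_fX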

Arithmetic Mean-Geometric Mean Inequality

Corollary 1 (Arithmetic Mean-Geometric Mean Inequality) Given positive real numbers $x_1, \ldots, x_N$,
\[ \left(\prod_{n=1}^N x_n\right)^{1/N} \le \frac{1}{N} \sum_{n=1}^N x_n. \]  (2.11)

Proof of Corollary 1 Suppose X is a discrete random variable such that
\[ \Pr(X = x_n) = \frac{1}{N}, \quad x_n > 0, \quad n \in \{1, \ldots, N\}. \]  (2.12)
Observe that $\ln x$ is a concave function of $x$ and so from Jensen's Inequality
\[ E[\ln X] \le \ln E[X]. \]  (2.13)
Therefore
\[ \frac{1}{N} \sum_{n=1}^N \ln x_n \le \ln\left(\frac{1}{N} \sum_{n=1}^N x_n\right) \]  (2.14)
\[ \ln \prod_{n=1}^N x_n^{1/N} \le \ln\left(\frac{1}{N} \sum_{n=1}^N x_n\right) \]  (2.15)
\[ \ln\left(\prod_{n=1}^N x_n\right)^{1/N} \le \ln\left(\frac{1}{N} \sum_{n=1}^N x_n\right). \]  (2.16)
Now because $e^x$ is monotonically increasing, we have
\[ \left(\prod_{n=1}^N x_n\right)^{1/N} \le \frac{1}{N} \sum_{n=1}^N x_n. \]  (2.17)
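A quick numerical check of (2.11), on a hypothetical list of positive numbers, is sketched below; the geometric mean is computed via logs, mirroring the proof above. Only the standard library is assumed.

    # AM-GM check: the geometric mean never exceeds the arithmetic mean.
    import math

    x = [2.0, 3.0, 7.0, 10.0]          # arbitrary positive numbers
    N = len(x)

    arithmetic_mean = sum(x) / N
    geometric_mean = math.exp(sum(math.log(xn) for xn in x) / N)

    print(geometric_mean, "<=", arithmetic_mean)
    assert geometric_mean <= arithmetic_mean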

2.2 Cauchy-Schwarz Inequality

Theorem 2 (Cauchy-Schwarz Inequality) For any random variables X and Y,
\[ E[XY]^2 \le E[X^2]\, E[Y^2]. \]  (2.18)

Proof of Theorem 2 If $Y = 0$, then both sides are 0. Otherwise, $E[Y^2] > 0$. Let
\[ w = X - Y\,\frac{E[XY]}{E[Y^2]}. \]  (2.19)
Then
\[ E[w^2] = E\left[ X^2 - 2XY\,\frac{E[XY]}{E[Y^2]} + Y^2\,\frac{(E[XY])^2}{(E[Y^2])^2} \right] \]  (2.20)
\[ = E[X^2] - 2\,\frac{(E[XY])^2}{E[Y^2]} + \frac{(E[XY])^2}{E[Y^2]} \]  (2.21)
\[ = E[X^2] - \frac{(E[XY])^2}{E[Y^2]}. \]  (2.22)
Since $E[w^2] \ge 0$, the Cauchy-Schwarz inequality follows.

[Figure: visualizing the Cauchy-Schwarz Inequality.]
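Because (2.18) holds for any joint distribution, it also holds exactly when the expectations are replaced by sample averages. The short sketch below (assuming numpy; the dependent pair it simulates is purely hypothetical) checks this.

    # Empirical check of E[XY]^2 <= E[X^2] E[Y^2] with sample averages.
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=100_000)
    Y = 0.5 * X + rng.normal(size=100_000)   # a hypothetical dependent pair

    lhs = np.mean(X * Y) ** 2
    rhs = np.mean(X ** 2) * np.mean(Y ** 2)
    print(lhs, "<=", rhs)
    assert lhs <= rhs                        # guaranteed by Cauchy-Schwarz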

Multiple random variables

If we have two random variables, we can study the relationship between them.

Definition 3 (Covariance) Given two random variables X, Y, the covariance is
\[ \mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]. \]

Proposition 1
1. Cov(X, c) = 0 for constant c.
2. Cov(X + c, Y) = Cov(X, Y).
3. Cov(X, Y) = Cov(Y, X).
4. Cov(X, Y) = E[XY] - E[X]E[Y].
5. Cov(X, X) = Var(X).
6. Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y).
7. If X, Y are independent, Cov(X, Y) = 0.

These are all trivial to prove. It is important to note that Cov(X, Y) = 0 does not imply X and Y are independent.

Example 1

Let (X, Y) = (2, 0), (-1, 1) or (-1, -1) with equal probabilities of 1/3. These are not independent, since Y = 0 implies X = 2. However,
\[ \mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 - 0 = 0. \]
If we randomly pick a point on the unit circle, and let the coordinates be (X, Y), then E[X] = E[Y] = E[XY] = 0 by symmetry. So Cov(X, Y) = 0 but X and Y are clearly not independent (they have to satisfy $x^2 + y^2 = 1$).

The covariance is not that useful in measuring how well two variables correlate. For one, the covariance can (potentially) have dimensions, which means that the numerical value of the covariance can depend on what units we are using. Also, the magnitude of the covariance depends largely on the variances of X and Y themselves. To solve these problems, we define

Definition 4 (Correlation coefficient) The correlation coefficient of X and Y is
\[ \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}. \]

Proposition 2 $|\rho(X, Y)| \le 1$.

Proof of Proposition 2 Apply Cauchy-Schwarz to $X - E[X]$ and $Y - E[Y]$.

Again, zero correlation does not necessarily imply independence.
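The three-point example above can be checked in a few lines of Python; exact rational arithmetic (via the standard library's fractions module) avoids any floating-point noise.

    # Zero covariance despite obvious dependence (Y = 0 forces X = 2).
    from fractions import Fraction

    points = [(2, 0), (-1, 1), (-1, -1)]        # the example above
    p = Fraction(1, 3)                          # each point has probability 1/3

    E_X = sum(p * x for x, _ in points)
    E_Y = sum(p * y for _, y in points)
    E_XY = sum(p * x * y for x, y in points)

    print(E_XY - E_X * E_Y)                     # prints 0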

2.3 Information entropy

Suppose we observe data about the economy up until 2010 and then look again a few years later. How much more information do we have? Information theory gives us ways of measuring information. We shall start (and end!) with the basic idea of information entropy, also known as Shannon's entropy. In the context of PCA, we want to reduce the dimensionality of a dataset, but without losing too much information. Entropy gives us a way of measuring this.

Claude Shannon (1916-2001) introduced the notion that information could be quantified. In "A Mathematical Theory of Communication", his legendary paper from 1948, Shannon proposed that data should be measured in bits: discrete values of zero or one. Shannon developed information entropy as a measure of the uncertainty in a message while essentially inventing the field of information theory.

Perhaps confusingly, in information theory the term entropy refers to information we don't have (normally people define information as what they know!). The information we don't have about a system, its entropy, is related to its unpredictability: how much it can surprise us.

Suppose an event A occurs with probability P(A) = p. How surprising is it? If it is not very surprising, there cannot be much new information in the event. Let's try to invent a surprise function, say S(p). What properties should this have? Since a certain event is unsurprising, we would like S(1) = 0. We should also like S(p) to be decreasing and continuous in p. If A and B are independent events then we should like $S(P(A \cap B)) = S(P(A)) + S(P(B))$. It turns out that the only function with these properties is one of the form
\[ S(p) = -c \log_a p, \]  (2.23)
with $c > 0$. For simplicity, take $c = 1$. The log can be to any base, but for the time being let us use base 2 ($a = 2$).
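The additivity property is easy to check numerically. The tiny sketch below uses two hypothetical independent events with probabilities 0.25 and 0.5 (arbitrary choices) and only the standard library.

    # The surprise of "A and B" equals the sum of the individual surprises.
    import math

    def surprise(p):
        return -math.log2(p)

    p_A, p_B = 0.25, 0.5
    print(surprise(p_A * p_B))                 # 3.0 bits
    print(surprise(p_A) + surprise(p_B))       # also 3.0 bits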

If X is a random variable that takes values $1, \ldots, N$ with probabilities $p_1, \ldots, p_N$, then on average the surprise obtained on learning X is
\[ H(X) = E[S(p_X)] = -\sum_{n=1}^N p_n \log_2 p_n. \]  (2.24)
This is the information entropy of X. It is an important quantity in information theory. The log can be taken to any base, but using base 2, $nH(X)$ is roughly the expected number of binary bits required to report the result of $n$ experiments in which $X_1, \ldots, X_n$ are i.i.d. observations from the distribution $(p_n, 1 \le n \le N)$ and we encode our reporting of the results of the experiments in the most efficient way.

Let's use Jensen's Inequality to prove that the entropy is maximized by $p_1 = \ldots = p_N = 1/N$. Consider $f(x) = \log x$, which is a concave function. We may assume $p_n > 0$ for all $n$. Let X be a r.v. such that $X = 1/p_n$ with probability $p_n$. Then
\[ \sum_{n=1}^N p_n \log \frac{1}{p_n} = E[f(X)] \le f(E[X]) = f(N) = \log N = \sum_{n=1}^N \frac{1}{N} \log N, \]  (2.25)
which is the entropy of the uniform distribution.

To provide some more underpinnings for ideas from information theory, we shall make two definitions.

Definition 5 If X is a random variable that takes values $x_1, \ldots, x_N$ with probabilities $p_1, \ldots, p_N$, then the Shannon information content of an outcome $x_n$ is defined as
\[ h(x_n) = \log_2 \frac{1}{p_n}. \]  (2.26)
Information content is measured in bits. One bit is typically defined as the uncertainty of a binary random variable that is 0 or 1 with equal probability, or the information that is gained when the value of such a variable becomes known.

Definition 6 If X is a random variable that takes values $x_1, \ldots, x_N$ with probabilities $p_1, \ldots, p_N$, then the information entropy of the random variable is given by the mean Shannon information content
\[ H(X) = -\sum_{n=1}^N p_n \log_2 p_n. \]  (2.27)
Note that the entropy does not depend on the values that the random variable takes, but only depends on the probability distribution.
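A minimal entropy function, and a check that for a fixed number of outcomes the uniform distribution maximises it (as proved above), can be written as follows; the two example distributions are hypothetical and only the standard library is assumed.

    # Information entropy, in bits, of a probability vector.
    import math

    def entropy(p):
        return -sum(pn * math.log2(pn) for pn in p if pn > 0)

    skewed = [0.7, 0.1, 0.1, 0.1]
    uniform = [0.25, 0.25, 0.25, 0.25]

    print(entropy(skewed))    # about 1.36 bits
    print(entropy(uniform))   # log2(4) = 2 bits, the maximum for four outcomes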

We can also define the joint entropy of a family of random variables.

Definition 7 Consider a family of discrete random variables $X_1, \ldots, X_N$, where $X_i$ takes a finite set of values in some set $A_i$, which wlog is a subset of $\mathbb{N}$. Their joint entropy is defined by
\[ H(X_1, \ldots, X_N) = -\sum_{x_1 \in A_1} \cdots \sum_{x_N \in A_N} \Pr((X_1, \ldots, X_N) = (x_1, \ldots, x_N)) \log_2 \Pr((X_1, \ldots, X_N) = (x_1, \ldots, x_N)). \]  (2.28)

Example 2 Suppose $X_1$ and $X_2$ take the following values:
\[ \Pr(X_1 = 1, X_2 = 1) = 1/4 \]  (2.29)
\[ \Pr(X_1 = 1, X_2 = -1) = 1/4 \]  (2.30)
\[ \Pr(X_1 = -1, X_2 = 1) = 1/4 \]  (2.31)
\[ \Pr(X_1 = -1, X_2 = -1) = 1/4 \]  (2.32)
Clearly $X_1$ and $X_2$ are independent. The joint entropy of $X_1$ and $X_2$ is
\[ \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 = \log_2 4 = 2. \]  (2.33)-(2.34)
We can deduce that
\[ \Pr(X_1 = 1) = \Pr(X_1 = -1) = 1/2 = \Pr(X_2 = 1) = \Pr(X_2 = -1). \]  (2.35)
Observe that
\[ H(X_1) = H(X_2) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{2}\log_2 2 = 1, \]  (2.36)
and so we see that
\[ H(X_1, X_2) = H(X_1) + H(X_2) = 2. \]  (2.37)

Now suppose $X_1$ and $X_2$ are correlated and take the following values:
\[ \Pr(X_1 = 1, X_2 = 1) = 1/6 \]  (2.38)
\[ \Pr(X_1 = 1, X_2 = -1) = 1/3 \]  (2.39)
\[ \Pr(X_1 = -1, X_2 = 1) = 1/3 \]  (2.40)
\[ \Pr(X_1 = -1, X_2 = -1) = 1/6 \]  (2.41)

The joint entropy of $X_1$ and $X_2$ is
\[ \tfrac{1}{6}\log_2 6 + \tfrac{1}{3}\log_2 3 + \tfrac{1}{3}\log_2 3 + \tfrac{1}{6}\log_2 6 = \tfrac{1}{3}\log_2 6 + \tfrac{2}{3}\log_2 3 = \log_2\left(6^{1/3}\,3^{2/3}\right) \approx 1.918 < 2. \]  (2.42)-(2.45)
We can easily deduce that
\[ \Pr(X_1 = 1) = \Pr(X_1 = -1) = 1/2 = \Pr(X_2 = 1) = \Pr(X_2 = -1), \]  (2.46)
and so
\[ H(X_1) = H(X_2) = 1. \]  (2.47)
But now
\[ H(X_1, X_2) \approx 1.918 < H(X_1) + H(X_2) = 2. \]  (2.48)
This result is intuitive. When the random variables are correlated, their joint information is less than their sum.

You may well have seen the following definition of independence for discrete random variables.

Definition 8 Consider two discrete random variables X and Y, which can take values in the set $\{a_1, \ldots, a_N\}$. X and Y are independent if
\[ \forall i, j \in \{1, \ldots, N\}, \quad \Pr(\{X = a_i, Y = a_j\}) = \Pr(X = a_i)\Pr(Y = a_j). \]  (2.49)

Using the above definition we can prove that the joint entropy of two independent random variables is just the sum of the individual entropies.

Proposition 3 For two discrete random variables X and Y,
\[ H(X, Y) = H(X) + H(Y) \]  (2.50)
if and only if X and Y are independent.

Proof of Proposition 3 From Definition 7, we have
\[ H(X, Y) = \sum_{i=1}^N \sum_{j=1}^N \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(\{X = a_i, Y = a_j\})}. \]  (2.51)
Supposing X and Y are independent, we obtain
\[ H(X, Y) = \sum_{i=1}^N \sum_{j=1}^N \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(X = a_i)} + \sum_{i=1}^N \sum_{j=1}^N \Pr(\{X = a_i, Y = a_j\}) \log \frac{1}{\Pr(Y = a_j)}. \]  (2.52)
Hence
\[ H(X, Y) = \sum_{i=1}^N \log \frac{1}{\Pr(X = a_i)} \sum_{j=1}^N \Pr(\{X = a_i, Y = a_j\}) + \sum_{j=1}^N \log \frac{1}{\Pr(Y = a_j)} \sum_{i=1}^N \Pr(\{X = a_i, Y = a_j\}). \]  (2.53)
Observe that
\[ \sum_{j=1}^N \Pr(\{X = a_i, Y = a_j\}) = \Pr(X = a_i) \]  (2.54)
and
\[ \sum_{i=1}^N \Pr(\{X = a_i, Y = a_j\}) = \Pr(Y = a_j). \]  (2.55)
Hence
\[ H(X, Y) = \sum_{i=1}^N \log \frac{1}{\Pr(X = a_i)} \Pr(X = a_i) + \sum_{j=1}^N \log \frac{1}{\Pr(Y = a_j)} \Pr(Y = a_j) \]  (2.56)
\[ = H(X) + H(Y). \]  (2.57)
Out of laziness, I am leaving the "only if" part as an exercise.

But what happens when X and Y are not independent random variables? We have the following inequality, which you can try and prove yourself.

Proposition 4 For two discrete random variables X and Y, we have
\[ H(X, Y) \le H(X) + H(Y). \]  (2.58)
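The correlated pair from Example 2 gives a concrete instance of Proposition 4, and the numbers are easy to reproduce in Python. The short check below assumes only the standard library and recomputes the joint entropy and the marginal entropies from the joint table.

    # Joint entropy falls short of the sum of marginal entropies when the
    # variables are correlated (the pair from Example 2 above).
    import math

    def entropy(p):
        return -sum(pn * math.log2(pn) for pn in p if pn > 0)

    joint = {(1, 1): 1/6, (1, -1): 1/3, (-1, 1): 1/3, (-1, -1): 1/6}

    H_joint = entropy(joint.values())
    H_X1 = entropy([sum(p for (x1, _), p in joint.items() if x1 == v) for v in (1, -1)])
    H_X2 = entropy([sum(p for (_, x2), p in joint.items() if x2 == v) for v in (1, -1)])

    print(H_joint)          # about 1.92 bits
    print(H_X1 + H_X2)      # 2 bits
    assert H_joint <= H_X1 + H_X2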

We can also measure the difference between two sets of probabilities. In economics, we can use this to measure how far apart two sets of beliefs are.

Definition 9 Suppose we have a discrete random variable X, which can take the values $x_1, \ldots, x_N$. We can define two different sets of probabilities, $P = \{p_1, \ldots, p_N\}$ and $Q = \{q_1, \ldots, q_N\}$. The relative entropy or Kullback-Leibler divergence between the two probabilities is
\[ D_{KL}(P \| Q) = \sum_{n=1}^N p_n \log_2 \frac{p_n}{q_n}. \]  (2.59)

Proposition 5 (Gibbs Inequality) The relative entropy satisfies Gibbs inequality
\[ D_{KL}(P \| Q) \ge 0, \]  (2.60)
with equality only if P and Q are identical.

2.4 Exercises

1. Consider a concave function u(x) and a random variable X. Show that, to second order in the deviation of X from its mean,
\[ E[u(X)] \approx u(E[X]) + \tfrac{1}{2} \mathrm{Var}[X]\, u''(E[X]). \]  (2.61)

2. Let $X_1, \ldots, X_N$ be independent random variables, all with uniform distribution on [0, 1]. What is the probability of the event $\{X_1 > X_2 > \cdots > X_{N-1} > X_N\}$?

3. Let X and Y be two non-constant random variables with finite variances. The correlation coefficient is denoted by $\rho(X, Y)$.
(a) Using the Cauchy-Schwarz inequality or otherwise, prove that
\[ |\rho(X, Y)| \le 1. \]  (2.62)
(b) What can be said about the relationship between X and Y when either (i) $\rho(X, Y) = 0$ or (ii) $|\rho(X, Y)| = 1$? [Proofs are not required.]
(c) Take $r \in [0, 1]$ and let $X, X'$ be independent random variables taking values $\pm 1$ with probabilities 1/2. Set
\[ Y = \begin{cases} X & \text{with probability } r \\ X' & \text{with probability } 1 - r \end{cases} \]  (2.63)
Find $\rho(X, Y)$.

4. The 1-Trick and the Splitting Trick. Show that for each real sequence $x_1, x_2, \ldots, x_N$ one has
\[ \sum_{n=1}^N x_n \le \sqrt{N} \left( \sum_{n=1}^N x_n^2 \right)^{1/2} \]  (2.64)
and show that for non-negative $a_1, a_2, \ldots, a_N$ one also has
\[ \sum_{n=1}^N a_n \le \left( \sum_{n=1}^N a_n^{2/3} \right)^{1/2} \left( \sum_{n=1}^N a_n^{4/3} \right)^{1/2}. \]  (2.65)
The two tricks illustrated by this simple exercise are very useful when proving inequalities.

5. If $p(k; \theta) \ge 0$ for all $k \in D$ and $\theta \in \Theta$, and if
\[ \sum_{k \in D} p(k; \theta) = 1, \quad \theta \in \Theta, \]  (2.66)
then for each $\theta \in \Theta$ one can think of $M_\theta = \{p(k; \theta) : k \in D\}$ as specifying a probability model, where $p(k; \theta)$ represents the probability that we observe $k$ when the parameter $\theta$ is the true state of nature. If the function $g : D \to \mathbb{R}$ satisfies
\[ \sum_{k \in D} g(k)\, p(k; \theta) = \theta, \quad \theta \in \Theta, \]  (2.67)
then $g$ is called an unbiased estimator of the parameter $\theta$. The variance of the unbiased estimator $g$ is given by $\sum_{k \in D} (g(k) - \theta)^2 p(k; \theta)$. Assuming that $D$ is finite and that $p(k; \theta)$ is a differentiable function of $\theta$, show that one has the following lower bound for the variance of the unbiased estimator of $\theta$:
\[ \sum_{k \in D} (g(k) - \theta)^2 p(k; \theta) \ge \frac{1}{I(\theta)}, \]  (2.68)
where $I : \Theta \to \mathbb{R}$ is defined by the sum
\[ I(\theta) = \sum_{k \in D} \frac{\left( \partial p(k; \theta) / \partial \theta \right)^2}{p(k; \theta)}. \]  (2.69)
The quantity $I(\theta)$ is known as the Fisher information at $\theta$ of the model $M_\theta$. The inequality (2.68) is known as the Cramer-Rao lower bound, and it has extensive applications in mathematical statistics.

6. Show that if X is a discrete r.v. such that $\Pr(X = x_n) = p_n$ for $n \in \{1, \ldots, N\}$, and $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ are nondecreasing, then
\[ E[f(X)]\, E[g(X)] \le E[f(X) g(X)]. \]  (2.70)

7. Given n random people, what is the probability that two or more of them have the same birthday? Under the natural (but approximate!) model where the birthdays are viewed as independent and uniformly distributed in the set {1, 2, ..., 365}, show that this probability is at least 1/2 if $n \ge 23$.

8. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.

9. Use Jensen's Inequality to prove Gibbs Inequality.

10. It is well known that there are infinitely many prime numbers; a proof appears in Euclid's famous Elements. We will not only show that there are infinitely many prime numbers, but we will also give a lower bound on the rate of their growth using information theory. Let $\pi(n)$ denote the number of primes no greater than $n$. Every positive integer $n$ has a unique prime factorization of the form
\[ n = \prod_{i=1}^{\pi(n)} p_i^{X_i(n)}, \]  (2.71)
where $p_1, p_2, p_3, \ldots$ are the primes, that is, $p_1 = 2$, $p_2 = 3$, $p_3 = 5$, etc., and $X_i(n)$ is the non-negative integer representing the multiplicity of $p_i$ in the prime factorization of $n$. Let N be uniformly distributed on $\{1, 2, 3, \ldots, n\}$.
(a) Show that $X_i(N)$ is an integer-valued random variable satisfying
\[ 0 \le X_i(N) \le \log_2 n. \]  (2.72)
[Hint: try finding a lower and an upper bound for $p_i^{X_i(N)}$.]
(b) Show that
\[ \pi(n) \ge \frac{\log_2 n}{\log_2(\log_2 n + 1)}. \]  (2.73)
[Hint: do $X_1(N), X_2(N), \ldots, X_{\pi(n)}(N)$ determine N? What does that say about the respective entropies?]
