Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables.


1 Lecture 5, A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring. Probability, CLT, CLT counterexamples, Bayes. The PDF file of this lecture contains a full reference document on probability and random variables.

2 Notions of Probability

Qualitative (knowledge): a measure of the degree to which propositions, hypotheses, or quantities are known. The measure can be an intrinsic property of an event in and of itself, or relative to other events. E.g. "It is not probable that the Sun will explode as a supernova" is an absolute statement based on our background knowledge of stellar evolution. E.g. "Horse A is more probable to win, place, or show than horse B" is a relative statement that includes the constraint that one horse wins to the exclusion of any other horse in the race. Bayesian inference makes use of this qualitative form of probability along with the quantitative aspects discussed below. In this sense, Bayesian methods attribute probabilities to more entities than do some of the more formalistic approaches.

3 Quantitative (frequentist approach: random variables and ensembles)

Classical: probabilities of events $\zeta$ in an event space $S$ are found, a priori, by considering the possible outcomes of experiments and the manner in which the outcomes may be achieved. For example, a single toss of a fair die yields probability 1/6 for each outcome. A problem with the classical definition (which often calculates probabilities from line-segment lengths, areas, volumes, etc.) is that probabilities cannot be evaluated for all cases (e.g. an unfair die).

Relative frequency: the probability of an event $\zeta$ is defined to be the limit of the frequency of occurrence of the event in $N$ repeated experiments as $N \to \infty$:

$$P\{\zeta\} = \lim_{N\to\infty} \frac{\sigma_N}{N},$$

where $\sigma_N$ is the number of occurrences of $\zeta$ in $N$ trials. The problem with frequencies is that in real experiments $N$ is always finite, so probabilities can only be estimated. Such estimates are a poor basis for deductive probabilistic theories.

4 Axiomatic: a deductive theory of probability can be based on axioms of probability for allowed events in an overall event space. For $\zeta$ an element in the event space $S$ and $P(\zeta)$ the probability of the event $\zeta$, the following hold:

i) $0 \le P(\zeta) \le 1$

ii) $P\{S\} = 1$ ($S$ = set of all events, so $P\{S\}$ is the probability that any event will be the outcome of an experiment).

iii) If $\zeta_1$ and $\zeta_2$ are mutually exclusive, then the probability of $\zeta_1 + \zeta_2$ (the event that $\zeta_1$ or $\zeta_2$ occurs) is $P(\zeta_1 + \zeta_2) = P(\zeta_1) + P(\zeta_2)$.

These axioms further imply:

iv) If $\bar\zeta$ is the event that $\zeta$ does not occur, then $P(\bar\zeta) = 1 - P(\zeta)$.

v) If $\zeta_1$ is a sufficient condition for $\zeta_2$, then $P(\zeta_1) \le P(\zeta_2)$. Equality holds when $\zeta_1$ is also a necessary condition for $\zeta_2$.

vi) Let $P(\zeta_1\zeta_2)$ be the probability of the event that $\zeta_1$ and $\zeta_2$ both occur (overlap in a Venn diagram). Mutually exclusive events have $P(\zeta_1\zeta_2) = 0$, while in general $P(\zeta_1\zeta_2) \ge 0$. In general we also have

$$P(\zeta_1 + \zeta_2) = P(\zeta_1) + P(\zeta_2) - P(\zeta_1\zeta_2) \le P(\zeta_1) + P(\zeta_2).$$


6 Conditional Probabilities: conditional probabilities also satisfy the axioms. Consider the probability that an event $\zeta_2$ occurs given that the event $\zeta_1$ occurs. It may be shown that this probability is

$$P\{\zeta_2|\zeta_1\} = \frac{P\{\zeta_1\zeta_2\}}{P\{\zeta_1\}}.$$

Similarly,

$$P\{\zeta_1|\zeta_2\} = \frac{P\{\zeta_1\zeta_2\}}{P\{\zeta_2\}}.$$

Bayes Theorem: solving for $P\{\zeta_1\zeta_2\}$ in the two preceding equations and setting the solutions equal yields Bayes theorem:

$$P\{\zeta_2|\zeta_1\} = \frac{P\{\zeta_1|\zeta_2\}\, P\{\zeta_2\}}{P\{\zeta_1\}}.$$

7 Bayesian Inference: Bayesian inference is based on the preceding equation where, with relaxed definitions of the event space, hypotheses and parameters are attributed probabilities based on knowledge before and after an experiment is conducted. Bayes theorem combined with the qualitative interpretation of probability therefore allows the sequential acquisition of knowledge (i.e. learning) to be handled. The implied temporal sequence of events, by which data are accumulated and the likelihood of a hypothesis being true increases or decreases, represents the power of the Bayesian outlook. Moreover, with Bayesian inference, the assumptions behind the inference are often brought up front as conditions upon which probabilities are calculated.
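
As a concrete illustration of this sequential learning, here is a minimal numerical sketch (not from the lecture; the coin-flip data and grid resolution are invented for illustration): a posterior for a coin's heads probability is updated flip by flip, exactly as Bayes theorem prescribes.

```python
import numpy as np

# Hypothetical example: infer a coin's heads probability theta.
theta = np.linspace(0.0, 1.0, 501)          # parameter grid
posterior = np.ones_like(theta)             # flat prior P(M|I)
posterior /= np.trapz(posterior, theta)

flips = [1, 0, 1, 1, 1, 0, 1]               # made-up data (1 = heads)
for d in flips:
    likelihood = theta if d else 1.0 - theta   # P(D|M,I) for one flip
    posterior = posterior * likelihood         # Bayes: prior -> posterior
    posterior /= np.trapz(posterior, theta)    # renormalize (divides out P(D|I))

print("posterior mean of theta:", np.trapz(theta * posterior, theta))
```

Each pass through the loop uses the previous posterior as the new prior, which is the "sequential acquisition of knowledge" described above.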

8 Probability on One Page

1. PDF and CDF: $f_X(x)$ has unit area and $F_X(x) \in [0,1]$; $f_X(x) = \dfrac{dF_X(x)}{dx}$.

2. Characteristic function: $\Phi_X(\omega) = \langle e^{i\omega x} \rangle = \int dx\, f_X(x)\, e^{i\omega x}$.

3. Moments: $\langle X^n \rangle = i^{-n} \left. \dfrac{d^n \Phi_X(\omega)}{d\omega^n} \right|_{\omega=0}$ (moment generating function).

4. Change of variable: $y = g(x)$ with solutions $x_j = g^{-1}(y)$, $j = 1,\dots,n$. Probability is conserved, $f_Y(y)\,|dy| = f_X(x)\,|dx|$:

$$f_Y(y) = \sum_{j=1}^{n} \frac{f_X(x_j)}{|dg(x)/dx|_{x=x_j}}.$$

For an N×N transformation, the derivative is replaced by the determinant of the Jacobian matrix.

5. Sum of independent RVs, $Z = X + Y$ (easily extendable to a sum of N RVs): $f_Z(z) = f_X * f_Y$ (convolution), $\Phi_Z(\omega) = \Phi_X(\omega)\,\Phi_Y(\omega)$ (product), $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$ (variances add).

6. Conditional PDFs: $f_X(x|y)\, f_Y(y) = f_{XY}(x,y)$.

7. Bayes Theorem: $f_Y(y|x) = \dfrac{f_X(x|y)\, f_Y(y)}{f_X(x)}$.

9 Generating Pseudo-Random Numbers (cf. Section 5.13 of Gregory)

Several methods:
1. Transformation method (exponential example), e.g. $X = g(Y)$.
2. CDF mapping of uniformly distributed numbers: works for an arbitrary output PDF.
3. Measurements of natural systems.

Higher-order statistics (e.g. specifying a power spectral shape) require other methods, to be discussed.

10 [Figure: CDF mapping of a uniform variate U in [0, 1] to an arbitrary output PDF.]
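
A minimal sketch of the CDF-mapping method in Python (an illustration, not the lecture's code; the target PDF, a Gaussian truncated to [0, 5], is an arbitrary choice): tabulate the CDF of the desired PDF, then map uniform variates through its inverse by interpolation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Target PDF (illustrative choice): a Gaussian truncated to [0, 5].
x = np.linspace(0.0, 5.0, 2001)
pdf = np.exp(-0.5 * (x - 2.0) ** 2)
pdf /= np.trapz(pdf, x)                      # normalize to unit area

# Build the CDF by trapezoidal cumulative integration.
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))))
cdf /= cdf[-1]                               # guard against round-off

u = rng.uniform(size=100_000)                # U in [0, 1]
samples = np.interp(u, cdf, x)               # x = F^{-1}(u) by interpolation

# Sanity check: the sample histogram should track the target PDF.
hist, edges = np.histogram(samples, bins=50, density=True)
```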


13 All three measures are localization measures. Other quantities are needed to measure the width and asymmetry of the PDF, etc.

14 Functions of a Random Variable: the function $Y = g(X)$ is a random variable that is a mapping from some event $A$ to a number $Y$ according to $Y(A) = g[X(A)]$.

Theorem: if $Y = g(X)$, then the PDF of $Y$ is

$$f_Y(y) = \sum_{j=1}^{n} \frac{f_X(x_j)}{|dg(x)/dx|_{x=x_j}},$$

where $x_j$, $j = 1,\dots,n$, are the solutions $x_j = g^{-1}(y)$. Note that the normalization property is conserved (unit area). This is one of the most important equations!

Example 1: $Y = g(X) = aX + b$.

$$\frac{dg}{dx} = a, \qquad x_1 = g^{-1}(y) = \frac{y-b}{a},$$

$$f_Y(y) = \frac{f_X(x_1)}{|dg(x_1)/dx|} = |a|^{-1}\, f_X\!\left(\frac{y-b}{a}\right).$$

To check: show that $\int dy\, f_Y(y) = 1$.

15 Example 2: suppose we want to transform from a uniform distribution to an exponential distribution, i.e. we want $f_Y(y) = \exp(-y)$. A typical random number generator gives $f_X(x)$ with

$$f_X(x) = \begin{cases} 1, & 0 \le x < 1; \\ 0, & \text{otherwise.} \end{cases}$$

Choose $y = g(x) = -\ln(x)$. Then

$$\left|\frac{dg}{dx}\right|_{x_1} = \frac{1}{x}, \qquad x_1 = g^{-1}(y) = e^{-y},$$

$$f_Y(y) = \frac{f_X[\exp(-y)]}{1/x_1} = x_1 = e^{-y}.$$

Factoid: Poisson events in time have spacings that are exponentially distributed.
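
A quick numerical check of this transformation (a sketch under the slide's assumptions): draw uniform variates, map them through $y = -\ln x$, and compare the resulting histogram with $e^{-y}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=200_000)          # f_X: uniform on [0, 1)
y = -np.log1p(-x)                      # -ln(1 - x): same distribution as -ln(x),
                                       # but avoids log(0) since x can equal 0
hist, edges = np.histogram(y, bins=60, range=(0.0, 6.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Agreement to within sampling noise (and the tiny tail beyond y = 6):
print("max |histogram - exp(-y)|:", np.abs(hist - np.exp(-centers)).max())
```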

16 Moments

We will always use angular brackets $\langle\ \rangle$ to denote an average over an ensemble (integrating over an ensemble); time averages and other sample averages will be denoted differently.

Expected value of a random variable with respect to the PDF of $x$: $E(X) \equiv \langle X \rangle = \int dx\, x\, f_X(x)$.

Arbitrary power: $\langle X^n \rangle = \int dx\, x^n f_X(x)$.

Variance: $\sigma_x^2 = \langle X^2 \rangle - \langle X \rangle^2$.

Function of a random variable: if $y = g(x)$ and $\langle Y \rangle \equiv \int dy\, y\, f_Y(y)$, then it is easy to show that

$$\langle Y \rangle = \int dx\, g(x)\, f_X(x).$$

Proof:

$$\langle Y \rangle \equiv \int dy\, y\, f_Y(y) = \int dy\, y \sum_{j=1}^{n} \frac{f_X[x_j(y)]}{|dg[x_j(y)]/dx|}.$$

A change of variable, $dy = \left|\frac{dg}{dx}\right| dx$, yields the result.

Central moments: $\mu_n = \langle (X - \langle X \rangle)^n \rangle$.

17 Moment Tests: moments are useful for testing hypotheses, such as whether a given PDF is consistent with data. E.g., consistency with a Gaussian PDF:

kurtosis: $k = \dfrac{\mu_4}{\mu_2^2} - 3 = 0$

skewness parameter: $\gamma = \dfrac{\mu_3}{\mu_2^{3/2}} = 0$

$k > 0$ means the 4th moment is proportionately larger: a larger-amplitude tail than a Gaussian and less probable values near the mean.
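
A sketch of such a moment test in Python (illustrative, not from the lecture; scipy.stats.skew and scipy.stats.kurtosis are assumed available): estimate sample skewness and excess kurtosis and compare with the Gaussian values of zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
gauss = rng.normal(size=100_000)
expo  = rng.exponential(size=100_000)   # skewed, heavy right tail

for name, data in [("gaussian", gauss), ("exponential", expo)]:
    g = stats.skew(data)                # gamma = mu_3 / mu_2^{3/2}
    k = stats.kurtosis(data)            # excess kurtosis: mu_4 / mu_2^2 - 3
    print(f"{name:12s} skew = {g:+.3f}  excess kurtosis = {k:+.3f}")
# Expect ~(0, 0) for the Gaussian and ~(2, 6) for the exponential.
```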

18 Uses of Moments: often one wants to infer the underlying PDF of an observable, perhaps because determination of the PDF is tantamount to understanding the underlying physics of some process. Two approaches are:

1. construct a histogram and compare its shape with a theoretical shape;
2. determine some of the moments (usually low-order) and compare.

Suppose the data are $\{x_j,\ j = 1,\dots,N\}$.

1. One could form bins of size $\Delta x$ and count how many $x_j$ fall into each bin. If $N$ is large enough so that $n_k$, the number of points in the $k$-th bin, is also large, then a reasonably good estimate of the PDF can be made. (But beware of the dependence of results on the choice of binning.)

2. However, often $N$ is too small, or one would like to determine only basic information about the shape of the distribution (is it symmetric?), or determine the mean and variance of the PDF, or test whether the data are consistent with a given PDF (hypothesis testing).

19 Gaussian case: some typical situations are:

i) assume the data were drawn from a Gaussian parent PDF; estimate the mean and $\sigma$ of the Gaussian [parameter estimation];

ii) test whether the data are consistent with a Gaussian PDF [moment test].

Note that if the r.v. is zero mean, the PDF is determined solely by one parameter, $\sigma$:

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}.$$

The moments are

$$\langle x^n \rangle = \begin{cases} (n-1)!!\ \sigma^n, & n \text{ even} \\ 0, & n \text{ odd.} \end{cases}$$

Therefore the $n = 2$ moment is the first non-zero moment and determines all other moments. This statement carries over to multi-dimensional Gaussian processes: any moment of order higher than 3 is redundant... or can be used as a test for Gaussianity.

20 Characteristic Function: of considerable use is the characteristic function

$$\Phi_X(\omega) \equiv \langle e^{i\omega x} \rangle \equiv \int dx\, f_X(x)\, e^{i\omega x}.$$

If we know $\Phi_X(\omega)$ then we know all there is to know about the PDF, because

$$f_X(x) = \frac{1}{2\pi} \int d\omega\, \Phi_X(\omega)\, e^{-i\omega x}$$

is the inversion formula. If we know all the moments of $f_X(x)$, then we can also completely characterize $f_X(x)$: the characteristic function is a moment-generating function,

$$\Phi_X(\omega) = \langle e^{i\omega x} \rangle = \left\langle \sum_{n=0}^{\infty} \frac{(i\omega x)^n}{n!} \right\rangle = \sum_{n=0}^{\infty} \frac{(i\omega)^n}{n!} \langle X^n \rangle,$$

because the expectation of a sum is the sum of the expectations. By taking derivatives we can show that

$$\left.\frac{\partial \Phi}{\partial \omega}\right|_{\omega=0} = i\langle X \rangle, \qquad \left.\frac{\partial^2 \Phi}{\partial \omega^2}\right|_{\omega=0} = i^2 \langle X^2 \rangle, \qquad \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0} = i^n \langle X^n \rangle,$$

or

$$\langle X^n \rangle = i^{-n} \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0} = (-i)^n \left.\frac{\partial^n \Phi}{\partial \omega^n}\right|_{\omega=0} \qquad \text{(Price's theorem).}$$

Characteristic functions are useful for deriving PDFs of combinations of r.v.'s as well as for deriving particular moments.
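
A short numerical illustration (a sketch with assumed parameters, not from the lecture): estimate $\Phi_X(\omega) = \langle e^{i\omega x} \rangle$ by Monte Carlo for a zero-mean Gaussian and compare with the analytic form $e^{-\omega^2\sigma^2/2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.5
x = rng.normal(scale=sigma, size=100_000)

omega = np.linspace(-3.0, 3.0, 13)
phi_mc = np.array([np.mean(np.exp(1j * w * x)) for w in omega])  # <e^{i w x}>
phi_exact = np.exp(-0.5 * omega**2 * sigma**2)  # Gaussian characteristic function

print("max |MC - exact|:", np.abs(phi_mc - phi_exact).max())    # small sampling error
```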

21 Joint Random Variables

Let $X$ and $Y$ be two random variables with their associated sample spaces. The actual events associated with $X$ and $Y$ may or may not be independent (e.g. throwing a die may map into $X$; choosing colored marbles from a hat may map into $Y$). The relationship of the events is described by the joint distribution function of $X$ and $Y$,

$$F_{XY}(x,y) \equiv P\{X \le x,\ Y \le y\},$$

and the joint probability density function (a two-dimensional PDF),

$$f_{XY}(x,y) \equiv \frac{\partial^2 F_{XY}(x,y)}{\partial x\, \partial y}.$$

Note that the one-dimensional PDF of $X$, for example, is obtained by integrating the joint PDF over all $y$,

$$f_X(x) = \int dy\, f_{XY}(x,y),$$

which corresponds to asking what the PDF of $X$ is given that the certain event for $Y$ occurs.

Example: flip two coins $a$ and $b$. Let heads = 1, tails = 0, and define two r.v.'s $X = a + b$ and $Y = a$. With these definitions $X$ and $Y$ are statistically dependent.

Characteristic function of joint r.v.'s:

$$\Phi_{XY}(\omega_1, \omega_2) = \langle e^{i(\omega_1 X + \omega_2 Y)} \rangle = \int dx \int dy\; e^{i(\omega_1 x + \omega_2 y)}\, f_{XY}(x,y).$$

For $x$, $y$ independent,

$$\Phi_{XY}(\omega_1, \omega_2) = \int dx\, f_X(x)\, e^{i\omega_1 x} \int dy\, f_Y(y)\, e^{i\omega_2 y} \equiv \Phi_X(\omega_1)\, \Phi_Y(\omega_2).$$

Example of independent r.v.'s: flip two coins $a$ and $b$. As before, heads = 1 and tails = 0; let $X = a$, $Y = b$ ($X$ and $Y$ are independent).

22 Independent Random Variables

Two random variables are said to be independent if the events mapping into one r.v. are independent of those mapping into the other. In this case, joint probabilities are factorable:

$$F_{XY}(x,y) = F_X(x)\, F_Y(y), \qquad f_{XY}(x,y) = f_X(x)\, f_Y(y).$$

Such factorization is plausible if one considers moments of independent r.v.'s,

$$\langle X^n Y^m \rangle = \langle X^n \rangle \langle Y^m \rangle,$$

which follows from

$$\langle X^n Y^m \rangle \equiv \int dx \int dy\; x^n y^m f_{XY}(x,y) = \int dx\, x^n f_X(x) \int dy\, y^m f_Y(y).$$

23 Convolution Theorem for Sums of Independent RVs

If $Z = X + Y$, where $X$, $Y$ are independent random variables, then the PDF of $Z$ is the convolution of the PDFs of $X$ and $Y$:

$$f_Z(z) = f_X * f_Y = \int dx\, f_X(x)\, f_Y(z - x) = \int dx\, f_X(z - x)\, f_Y(x).$$

Proof: by definition,

$$f_Z(z) = \frac{d}{dz} F_Z(z), \qquad F_Z(z) = P\{Z \le z\}.$$

Consider

$$F_Z(z) = P\{X + Y \le z\} = P\{Y \le z - X\}.$$

To evaluate this, first evaluate the probability $P\{Y \le z - x\}$ where $x$ is just a number:

$$P\{Y \le z - x\} \equiv F_Y(z - x) \equiv \int^{z-x} dy\, f_Y(y).$$

But $P\{Y \le z - X\}$ is the probability that $Y \le z - x$ over all values of $x$, so we need to integrate over $x$ and weight by the probability of $x$:

$$P\{Y \le z - X\} = \int dx\, f_X(x) \int^{z-x} dy\, f_Y(y);$$

that is, $P\{Y \le z - X\}$ is the expected value of $F_Y(z - x)$. By the Leibniz integration formula,

$$\frac{d}{db} \int_a^{g(b)} d\omega\, h(\omega) = h(g(b))\, \frac{dg(b)}{db},$$

we obtain the convolution result.
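
The theorem is easy to verify numerically; below is a minimal sketch (an illustration with invented parameters): the sum of two independent uniform variates should follow the triangular PDF obtained by convolving the two box PDFs.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(-0.5, 0.5, size=500_000)
y = rng.uniform(-0.5, 0.5, size=500_000)
z = x + y                                   # sum of independent RVs

# Numerical convolution of the two uniform (box) PDFs on a grid.
grid = np.linspace(-1.0, 1.0, 801)
dx = grid[1] - grid[0]
box = np.where(np.abs(grid) <= 0.5, 1.0, 0.0)
f_z = np.convolve(box, box, mode="same") * dx   # f_Z = f_X * f_Y (triangle)

hist, edges = np.histogram(z, bins=80, range=(-1, 1), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - convolution|:",
      np.abs(hist - np.interp(centers, grid, f_z)).max())
```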

24 Characteristic Function of $Z = X + Y$

For $X$, $Y$ independent we have

$$f_Z = f_X * f_Y \quad\Longleftrightarrow\quad \Phi_Z(\omega) = \langle e^{i\omega Z} \rangle = \Phi_X(\omega)\, \Phi_Y(\omega).$$

Variance of $Z$: if the variances of $X$ and $Y$ are $\sigma_X^2$, $\sigma_Y^2$, then the variance of $Z$ is $\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$. Assume $X$ and $Y$ (and hence $Z$) are zero-mean r.v.'s; then

$$\sigma_X^2 = \langle x^2 \rangle = i^{-2} \left.\frac{\partial^2 \Phi_X}{\partial \omega^2}\right|_{\omega=0} = -\left.\frac{\partial^2 \Phi_X}{\partial \omega^2}\right|_{\omega=0}, \qquad \sigma_Y^2 = \langle y^2 \rangle = -\left.\frac{\partial^2 \Phi_Y}{\partial \omega^2}\right|_{\omega=0}.$$

Using Price's theorem:

$$\sigma_Z^2 = \langle Z^2 \rangle = -\left.\frac{\partial^2 \Phi_Z}{\partial \omega^2}\right|_{\omega=0} = -\left.\frac{\partial^2}{\partial \omega^2}\left[\Phi_X(\omega)\, \Phi_Y(\omega)\right]\right|_{\omega=0} = -\left[\Phi_Y \frac{\partial^2 \Phi_X}{\partial \omega^2} + 2\, \frac{\partial \Phi_X}{\partial \omega} \frac{\partial \Phi_Y}{\partial \omega} + \Phi_X \frac{\partial^2 \Phi_Y}{\partial \omega^2}\right]_{\omega=0}.$$

At $\omega = 0$ we have $\Phi_X = \Phi_Y = 1$, and the first-derivative terms vanish for zero-mean r.v.'s. We have discovered that variances add (independent variables only):

$$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2.$$

25 Multivariate Random Variables ($N$-dimensional)

The results for the bivariate case are easily extrapolated. If

$$Z = X_1 + X_2 + \cdots + X_N = \sum_{j=1}^{N} X_j,$$

where the $X_j$ are all independent r.v.'s, then

$$f_Z(z) = f_{X_1} * f_{X_2} * \cdots * f_{X_N}, \qquad \Phi_Z = \prod_{j=1}^{N} \Phi_{X_j}(\omega), \qquad \sigma_Z^2 = \sum_{j=1}^{N} \sigma_{X_j}^2.$$

26 Central Limit Theorem

The Central Limit Theorem is a powerful tool for observational science because it can be used to invoke a priori distributions for measured quantities. In simple terms, any quantity that is the sum of many independent ones will have Gaussian statistics. We need to understand what constitutes independence and how Gaussian statistics play out for stochastic processes, which may be viewed as sequences of a large number of random variables. In addition, the CLT does not always apply!

Consider the sum

$$Z_N = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} X_j,$$

where the $X_j$ are independent, identically distributed (iid) random variables with means and variances

$$\mu_j \equiv \langle X_j \rangle, \qquad \sigma_j^2 = \langle X_j^2 \rangle - \langle X_j \rangle^2,$$

and the PDFs of the $X_j$ are almost arbitrary. Restrictions on the distribution of each $X_j$ are:

i) $\sigma_j^2 > m > 0$, $m$ = constant;

ii) $\langle X_j^n \rangle < M$ = constant, for $n > 2$.

In the limit $N \to \infty$, $Z_N$ becomes a Gaussian random variable with mean and variance

$$\langle Z_N \rangle = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \mu_j, \qquad \sigma_Z^2 = \frac{1}{N} \sum_{j=1}^{N} \sigma_j^2.$$

27 Example of Arithmetic Sums of Uniformly Distributed Numbers

Consider the sum

$$\bar{x} = \frac{1}{N} \sum_{j=1}^{N} x_j,$$

where the $x_j$ are drawn from a uniform PDF, $x_j \in [-1/2, 1/2]$. The figure shows counts of $\bar{x}$ for averages of length $N = 1, 2, 4, 8$, and 16 for 100 realizations.

28 Example of Arithmetic Sums of Uniformly Distributed Numbers (continued)

Histograms based on $10^5$ realizations are shown in the figure below. Two important features are:

1. the width of the PDF scales as $N^{-1/2}$;
2. the shape of the PDF tends toward a Gaussian form with $\langle \bar{x} \rangle = 0$ and $\sigma_{\bar{x}} = \langle \bar{x}^2 \rangle^{1/2} = 1/\sqrt{12N}$.
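
This experiment is easy to reproduce; here is a minimal sketch (assumed to match the slide's setup: uniform variates on [-1/2, 1/2] and $10^5$ realizations) showing the $N^{-1/2}$ scaling of the width.

```python
import numpy as np

rng = np.random.default_rng(11)
n_real = 100_000

for N in (1, 2, 4, 8, 16):
    xbar = rng.uniform(-0.5, 0.5, size=(n_real, N)).mean(axis=1)
    print(f"N={N:2d}  sample std = {xbar.std():.4f}  "
          f"1/sqrt(12N) = {1/np.sqrt(12*N):.4f}")
# The sample widths track 1/sqrt(12 N), and histograms of xbar
# (e.g. np.histogram(xbar, bins=50, density=True)) approach a Gaussian.
```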

29 Cauchy Random Numbers

A Cauchy distribution is

$$f_X(x) = \frac{\alpha}{\pi}\, \frac{1}{\alpha^2 + x^2};$$

its characteristic function is

$$\Phi_X(\omega) = e^{-\alpha|\omega|}.$$

Using uniformly distributed numbers $u \in [-1/2, 1/2]$, Cauchy random numbers $x$ can be generated using the transformation $x = \tan(\pi u)$.

Check: using the change-of-variable theorem,

$$f_X(x) = \frac{f_u(u(x))}{|dx/du|} = \frac{1}{\pi}\, \frac{1}{1 + x^2},$$

where

$$\frac{dx}{du} = \frac{d}{du} \tan(\pi u) = \frac{\pi}{\cos^2 \pi u} = \pi (1 + x^2)$$

has been used.

30 Example of Arithmetic Sums of Cauchy Distributed Numbers

Histograms based on $10^5$ realizations are shown in the figure below. Here neither the width nor the shape of the PDF changes as $N$ gets larger! What's happening?
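
The failure can be seen numerically with a short sketch (illustrative; it uses the tan(πu) transformation from the previous slide, and the interquartile range as a width measure, since the variance diverges):

```python
import numpy as np

rng = np.random.default_rng(13)
n_real = 50_000

for N in (1, 4, 16, 64):
    u = rng.uniform(-0.5, 0.5, size=(n_real, N))
    xbar = np.tan(np.pi * u).mean(axis=1)       # average of N Cauchy variates
    q25, q75 = np.percentile(xbar, [25, 75])
    print(f"N={N:3d}  interquartile range = {q75 - q25:.3f}")
# The IQR stays ~2 for every N: the average of N Cauchy RVs is itself
# Cauchy with the same scale, so the CLT does not apply.
```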

31 First let's understand the case of summing uniformly distributed numbers. Again consider $X_j$ that are all uniformly distributed between $\pm\frac{1}{2}$. The PDF of a single random variable is $f_X(x) = \Pi(x)$ (the unit rectangle function), and it is a Fourier-transform pair with its characteristic function:

$$f_X(x) = \Pi(x) \quad\Longleftrightarrow\quad \Phi_j(\omega) = \langle e^{i\omega X_j} \rangle = \frac{\sin \omega/2}{\omega/2} = \frac{\sin \pi f}{\pi f} \quad (\omega = 2\pi f).$$

[Graph: $\Pi(x)$ and its transform $\sin(\omega/2)/(\omega/2)$.]

32 Now consider sums of $N$ RVs of this type (i.e. $\sum_j X_j$) and use the convolution theorem to evaluate the characteristic function of the sum. Graphically: for $N = 2$ the characteristic function is $\left(\frac{\sin \omega/2}{\omega/2}\right)^2$, for $N = 3$ it is $\left(\frac{\sin \omega/2}{\omega/2}\right)^3$, and as $N \to \infty$ it approaches a Gaussian $\sim e^{-\omega^2}$, whose transform is a Gaussian $\sim e^{-x^2}$ in $x$.

33 We need to rescale the sum to find the characteristic function of

$$Z_N = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} x_j$$

defined earlier. From the convolution results we have

$$\Phi_{\sqrt{N} Z_N}(\omega) = \left(\frac{\sin \omega/2}{\omega/2}\right)^N.$$

From the transformation of random variables we have that

$$f_{Z_N}(x) = \sqrt{N}\, f_{\sqrt{N} Z_N}(\sqrt{N}\, x),$$

and by the scaling theorem for Fourier transforms

$$\Phi_{Z_N}(\omega) = \Phi_{\sqrt{N} Z_N}\!\left(\frac{\omega}{\sqrt{N}}\right) = \left(\frac{\sin(\omega/2\sqrt{N})}{\omega/2\sqrt{N}}\right)^N.$$

34 If the CLT holds,

$$\lim_{N\to\infty} \Phi_{Z_N}(\omega) = e^{-\frac{1}{2}\omega^2 \sigma_Z^2} \quad\Longleftrightarrow\quad f_{Z_N}(x) = \frac{1}{\sqrt{2\pi\sigma_Z^2}}\, e^{-x^2/2\sigma_Z^2}.$$

Consistency with this limiting form can be seen by expanding $\Phi_{Z_N}$ for small $\omega$:

$$\Phi_{Z_N}(\omega) \approx \left[\frac{(\omega/2\sqrt{N}) - \frac{1}{3!}(\omega/2\sqrt{N})^3}{\omega/2\sqrt{N}}\right]^N = \left[1 - \frac{\omega^2}{24N}\right]^N \approx 1 - \frac{\omega^2}{24},$$

which is identical to the expansion of $\exp(-\omega^2 \sigma_Z^2/2)$ with $\sigma_Z^2 = 1/12$.
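
This convergence is quick to check numerically; a minimal sketch (using the 1/12 variance of the unit uniform, so the target is $e^{-\omega^2/24}$):

```python
import numpy as np

# Check that the rescaled-sum characteristic function approaches
# exp(-omega^2 / 24)  (sigma_Z^2 = 1/12 for uniform RVs on [-1/2, 1/2]).
omega = np.linspace(-5.0, 5.0, 201)
target = np.exp(-omega**2 / 24.0)

for N in (2, 8, 32, 128):
    t = omega / (2.0 * np.sqrt(N))
    phi = np.sinc(t / np.pi) ** N        # np.sinc(x) = sin(pi x)/(pi x)
    print(f"N={N:4d}  max |phi - target| = {np.abs(phi - target).max():.4f}")
```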

35 Why the CLT Does Not Work for a Sum of Cauchy Variables

The Cauchy distribution and its characteristic function are

$$f_X(x) = \frac{\alpha}{\pi}\, \frac{1}{\alpha^2 + x^2}, \qquad \Phi(\omega) = e^{-\alpha|\omega|}.$$

In this case

$$Z_N = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} x_j$$

has characteristic function

$$\Phi_N(\omega) = e^{-N\alpha|\omega|/\sqrt{N}} = e^{-\sqrt{N}\,\alpha|\omega|}.$$

By inspection, the exponential will not converge to a Gaussian. Instead, the sum of $N$ Cauchy RVs is a Cauchy RV.

Is the Cauchy distribution a legitimate PDF for the CLT? No! The variance diverges:

$$\langle X^2 \rangle = \int dx\, \frac{x^2 \alpha}{\pi}\, \frac{1}{\alpha^2 + x^2} \to \infty.$$

The Cauchy distribution is an example of a stable distribution, defined as one with the property that a linear combination of two independent copies of the variable has the same distribution, to within a location and a scale parameter. The family of stable distributions is sometimes called the Lévy alpha-stable distributions (Lévy flights, etc.). There is a generalized CLT involving PDFs with long power-law tails.

http://en.wikipedia.org/wiki/Lévy_distribution
http://en.wikipedia.org/wiki/Stable_distribution

36 Conditional Probabilities and Bayes Theorem

We have considered $P(\zeta)$, the probability of an event $\zeta$. Also obeying the axioms of probability are conditional probabilities: $P(\psi|\zeta)$, the probability of the event $\psi$ given that the event $\zeta$ has occurred. Define

$$P(\psi|\zeta) \equiv \frac{P(\psi\zeta)}{P(\zeta)}$$

and recast the axioms as:

I. $P(\psi|\zeta) \ge 0$

II. $P(\psi|\zeta) + P(\bar\psi|\zeta) = 1$

III. $P(\psi\zeta|\eta) = P(\psi|\eta)\, P(\zeta|\psi\eta) = P(\zeta|\eta)\, P(\psi|\zeta\eta)$

37 From Bayes Theorem to Bayesian Inference

How does this relate to experiments? Use the product rule:

$$P(\zeta|\psi\eta) = \frac{P(\zeta|\eta)\, P(\psi|\zeta\eta)}{P(\psi|\eta)},$$

or, letting $M$ = model (or hypothesis), $D$ = data, and $I$ = background information (assumptions),

$$P(M|DI) = P(M|I)\, \frac{P(D|MI)}{P(D|I)}.$$

Terms:

prior: $P(M|I)$
sampling distribution for $D$: $P(D|MI)$ (also called the likelihood for $M$)
prior predictive for $D$: $P(D|I)$ (also called the global likelihood for $M$, or the evidence for $M$)

38 Particular strengths of the Bayesian method include:

1. One must often be explicit about what is assumed about $I$, the background information.
2. In assessing models, we get a PDF for parameters rather than just point estimates.
3. Occam's razor (simpler models win, all else being equal) is easily invoked when comparing models.

We may have many different models $M_i$ that we wish to compare. Form the odds ratio from the posterior PDFs $P(M_i|DI)$:

$$O_{i,j} \equiv \frac{P(M_i|DI)}{P(M_j|DI)} = \frac{P(M_i|I)}{P(M_j|I)}\, \frac{P(D|M_iI)}{P(D|M_jI)}.$$
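
A toy sketch of such a model comparison (everything here is an invented example, not the lecture's): compare a fair-coin model against a flat-prior biased-coin model for made-up flip data. With equal prior odds, the odds ratio reduces to the ratio of global likelihoods.

```python
import numpy as np
from scipy.special import betaln

# Made-up data D: k = 9 heads in n = 12 flips.
n, k = 12, 9

# Model 1: fair coin, theta = 1/2 exactly.
log_like_1 = k * np.log(0.5) + (n - k) * np.log(0.5)

# Model 2: unknown theta with a flat prior; the global likelihood P(D|M2,I)
# marginalizes theta: integral of theta^k (1-theta)^(n-k) dtheta = B(k+1, n-k+1).
log_like_2 = betaln(k + 1, n - k + 1)

# Equal prior odds, so the odds ratio is the ratio of global likelihoods.
O_12 = np.exp(log_like_1 - log_like_2)
print(f"odds ratio O_12 = {O_12:.2f}")   # < 1 favors the biased-coin model
```

Note how Occam's razor enters automatically: the flexible model spreads its prior predictive over many possible data sets, so it is penalized unless the data demand it.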

39 Full document on Probability and Random Variables

40 Probability and Random Processes

Experiments: set up certain conditions; the possible outcomes are called events. The event space $S$ is the set of all outcomes or events $\zeta_i$, $i = 1,\dots,N$. Events may or may not be quantitative; e.g. the experiment may consist of choosing colored marbles from a hat.

Detections are experiments designed to answer the question: is an effect present in this physical system?

Measurements are experiments designed to yield quantitative measures of some physical phenomenon. Measurements are simply a highly structured form of interaction with a physical system. As such, they are never precise. Estimation of physical parameters is the best one can do, even for values of fundamental constants.

Probability: the notion of probability arises when we wish to consider the likelihood of a given event occurring, or if we wish to estimate the number of times an identifiable event will occur if we repeat a given experiment $N$ times.

Event space: events $\zeta_i \in S$. Events are possible outcomes of experiments. Events can be combined to define new events. The set of all events is the event space.

41 As such, probability is a theoretical quantity and is not the same as the frequency of occurrence of an event in repeated trials of an experiment. Of course, one can estimate probabilities from repeated trials. We will consider probability to be the underpinning of experiments, and we will require it to behave according to three axioms. Let $\zeta$ be an event in $S$; then:

i) $0 \le P(\zeta) \le 1$

ii) $P(S) = 1$ ($S$ = space of all events)

iii) If two events $\zeta$ and $\psi$ are mutually exclusive [i.e. they cannot both occur], then the probability of the event $\zeta + \psi$ (the event that $\zeta$ or $\psi$ occurs) is $P(\zeta + \psi) = P(\zeta) + P(\psi)$ ("+" means "or").

From the axioms, one can construct such results as:

1. If $\bar A$ = event that $A$ does not occur, then $P(\bar A) = 1 - P(A)$.

2. If the occurrence of $A$ is a sufficient condition for $B$ occurring [$A \Rightarrow B$, but $B$ may occur when $A$ does not], then $P(A) \le P(B)$.

3. $P(A + B) = P(A) + P(B) - P(AB)$, where $P(AB)$ = probability that both $A$ and $B$ occur; $P(AB) \ge 0$, with equality when $A$, $B$ are mutually exclusive, so $P(A + B) \le P(A) + P(B)$.

42 I. Mutually exclusive events: if $a$ occurs then $b$ cannot have occurred. Let $c = a + b$ ("or", same as $a \cup b$):

$$P(c) = P\{a \text{ or } b \text{ occurred}\} = P(a) + P(b).$$

Let $d = ab$ ("and", same as $a \cap b$):

$$P(d) = P\{a \text{ and } b \text{ occurred}\} = 0 \quad \text{if mutually exclusive.}$$

II. Non-mutually exclusive events:

$$P(c) = P\{a \text{ or } b\} = P(a) + P(b) - P(ab).$$

III. Independent events:

$$P(ab) \equiv P(a)\, P(b).$$

43 Examples

I. Mutually exclusive events: toss a coin once. There are two possible outcomes, H and T. H and T are mutually exclusive. H and T are not independent, because $P(HT) = P\{\text{heads and tails}\} = 0$, so $P(HT) \ne P(H)\,P(T)$.

II. Independent events: toss a coin twice (the experiment). The outcomes of the experiment (1st toss, 2nd toss) are $H_1H_2$, $H_1T_2$, $T_1H_2$, $T_1T_2$. Events might be defined as:

$H_1H_2$ = event that H on 1st toss, H on 2nd
$H_1T_2$ = event that H on 1st toss, T on 2nd
$T_1H_2$ = event that T on 1st toss, H on 2nd
$T_1T_2$ = event that T on 1st toss, T on 2nd

Note $P(H_1H_2) = P(H_1)\,P(H_2)$ [as long as the coin is not altered between tosses].

44 Random Variables

Of interest to us is the distribution of probability along the real number axis. Random variables assign numbers to events or, more precisely, map the event space into a set of numbers:

$$a \ (\text{event}) \;\longrightarrow\; X(a) \ (\text{number}).$$

The definition of probability translates directly over to the numbers that are assigned by random variables. The following properties hold for a real random variable:

1. $\{X \le x\}$ = event that the r.v. $X$ is less than or equal to the number $x$, defined for all $x$ [this defines all intervals on the real number line to be events].

2. The events $\{X = +\infty\}$ and $\{X = -\infty\}$ have zero probability. (Otherwise, moments would generally not be finite.)

45 Distribution Function (CDF = Cumulative Distribution Function):

$$F_X(x) = P\{X \le x\} \equiv P\{\text{all events } A : X(A) \le x\}$$

Properties:
1. $F_X(x)$ is a monotonically increasing function of $x$.
2. $F(-\infty) = 0$, $F(+\infty) = 1$.
3. $P\{x_1 \le X \le x_2\} = F(x_2) - F(x_1)$.

Probability Density Function (PDF):

$$f_X(x) = \frac{dF_X(x)}{dx}$$

Properties:
1. $f_X(x)\, dx = P\{x \le X \le x + dx\}$.
2. $\int_{-\infty}^{\infty} dx\, f_X(x) = F_X(\infty) - F_X(-\infty) = 1 - 0 = 1$.

Continuous RVs: the derivative of $F_X(x)$ exists for all $x$.

Discrete random variables: use delta functions to write the PDF in pseudo-continuous form, e.g. coin flipping. Let

$$X = \begin{cases} +1 & \text{heads} \\ -1 & \text{tails}; \end{cases}$$

then

$$f_X(x) = \frac{1}{2}\left[\delta(x+1) + \delta(x-1)\right], \qquad F_X(x) = \frac{1}{2}\left[U(x+1) + U(x-1)\right],$$

where $U$ is the unit step function.



More information

STAT 418: Probability and Stochastic Processes

STAT 418: Probability and Stochastic Processes STAT 418: Probability and Stochastic Processes Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical

More information

Order Statistics and Distributions

Order Statistics and Distributions Order Statistics and Distributions 1 Some Preliminary Comments and Ideas In this section we consider a random sample X 1, X 2,..., X n common continuous distribution function F and probability density

More information

Introduction to Probability Theory

Introduction to Probability Theory Introduction to Probability Theory Ping Yu Department of Economics University of Hong Kong Ping Yu (HKU) Probability 1 / 39 Foundations 1 Foundations 2 Random Variables 3 Expectation 4 Multivariate Random

More information

4 Lecture 4 Notes: Introduction to Probability. Probability Rules. Independence and Conditional Probability. Bayes Theorem. Risk and Odds Ratio

4 Lecture 4 Notes: Introduction to Probability. Probability Rules. Independence and Conditional Probability. Bayes Theorem. Risk and Odds Ratio 4 Lecture 4 Notes: Introduction to Probability. Probability Rules. Independence and Conditional Probability. Bayes Theorem. Risk and Odds Ratio Wrong is right. Thelonious Monk 4.1 Three Definitions of

More information

STAT Chapter 5 Continuous Distributions

STAT Chapter 5 Continuous Distributions STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range

More information

Brief Review of Probability

Brief Review of Probability Brief Review of Probability Nuno Vasconcelos (Ken Kreutz-Delgado) ECE Department, UCSD Probability Probability theory is a mathematical language to deal with processes or experiments that are non-deterministic

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

STAT 712 MATHEMATICAL STATISTICS I

STAT 712 MATHEMATICAL STATISTICS I STAT 72 MATHEMATICAL STATISTICS I Fall 207 Lecture Notes Joshua M. Tebbs Department of Statistics University of South Carolina c by Joshua M. Tebbs TABLE OF CONTENTS Contents Probability Theory. Set Theory......................................2

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Communication Theory II

Communication Theory II Communication Theory II Lecture 5: Review on Probability Theory Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt Febraury 22 th, 2015 1 Lecture Outlines o Review on probability theory

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

ECE353: Probability and Random Processes. Lecture 7 -Continuous Random Variable

ECE353: Probability and Random Processes. Lecture 7 -Continuous Random Variable ECE353: Probability and Random Processes Lecture 7 -Continuous Random Variable Xiao Fu School of Electrical Engineering and Computer Science Oregon State University E-mail: xiao.fu@oregonstate.edu Continuous

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

More information

Homework 2. Spring 2019 (Due Thursday February 7)

Homework 2. Spring 2019 (Due Thursday February 7) ECE 302: Probabilistic Methods in Electrical and Computer Engineering Spring 2019 Instructor: Prof. A. R. Reibman Homework 2 Spring 2019 (Due Thursday February 7) Homework is due on Thursday February 7

More information

Dynamic Programming Lecture #4

Dynamic Programming Lecture #4 Dynamic Programming Lecture #4 Outline: Probability Review Probability space Conditional probability Total probability Bayes rule Independent events Conditional independence Mutual independence Probability

More information

Lecture 1: Basics of Probability

Lecture 1: Basics of Probability Lecture 1: Basics of Probability (Luise-Vitetta, Chapter 8) Why probability in data science? Data acquisition is noisy Sampling/quantization external factors: If you record your voice saying machine learning

More information

1 Variance of a Random Variable

1 Variance of a Random Variable Indian Institute of Technology Bombay Department of Electrical Engineering Handout 14 EE 325 Probability and Random Processes Lecture Notes 9 August 28, 2014 1 Variance of a Random Variable The expectation

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

Set Theory Digression

Set Theory Digression 1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of

More information