Bayesian Inference and Decision Theory
1 Bayesian Inference and Decision Theory
Instructor: Kathryn Blackmond Laskey
Room 2214 ENGR (703)
Office Hours: Thursday 4:00-6:00 PM, or by appointment
Spring 2018
Unit 2: Random Variables, Parametric Models, and Inference from Observation
Unit 2 (v4) - 1 -
2 Learning Objectives for Unit 2
Define and apply the following:
  Event
  Random variable (univariate and multivariate)
  Joint, conditional and marginal distributions
  Independence; conditional independence
  Mass and density functions; cumulative distribution functions
  Measures of central tendency and spread
Be familiar with common parametric statistical models
  Continuous distributions:
    » Normal, Gamma, Exponential, Uniform, Beta, Dirichlet
  Discrete distributions:
    » Bernoulli, Binomial, Multinomial, Poisson, Negative Binomial
Use Bayes rule to find the posterior distribution of a parameter given a set of observations
State de Finetti's Theorem and its relationship to the concept of "true" probabilities
Use R to do Bayesian inference with discrete random variables
3 Events, Partitions and Probability Distributions
Some definitions relating to mathematical models of situations involving uncertainty:
  Sample space - a set S consisting of all possible elementary outcomes
  Event - a subset E ⊂ S
  Partition - a finite or infinite collection {H_1, H_2, ...} of events such that H_i ∩ H_j = ∅ if i ≠ j and ∪_i H_i = S
The sample space S completely characterizes the outcome to the finest level of detail needed for the model
  We say an event E ⊂ S occurs if the actual outcome s is an element of E
  If {H_1, H_2, ...} is a partition of S, then exactly one of the events H_i occurs
A probability distribution is a function from events to real numbers satisfying the axioms of probability
If E is an event and {H_1, H_2, ...} is a partition, then the axioms imply:
  Law of total probability:  P(E) = Σ_k P(E | H_k) P(H_k)
  Bayes Rule:  P(H_k | E) = P(E | H_k) P(H_k) / P(E) = P(E | H_k) P(H_k) / Σ_j P(E | H_j) P(H_j)
Note: This is the standard definition. Hoff's definition (p. 15 of Hoff) is non-standard.
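The two identities above can be checked numerically. A minimal sketch in Python (the course's examples use R); the partition probabilities and likelihoods below are hypothetical numbers chosen for illustration:

```python
# Law of total probability and Bayes Rule over a 3-event partition {H_k}.
# The numbers are hypothetical illustration values.
p_H = [0.5, 0.3, 0.2]          # prior probabilities of the partition events
p_E_given_H = [0.9, 0.5, 0.1]  # likelihood of E under each H_k

# Law of total probability: P(E) = sum_k P(E | H_k) P(H_k)
p_E = sum(pe * ph for pe, ph in zip(p_E_given_H, p_H))

# Bayes Rule: P(H_k | E) = P(E | H_k) P(H_k) / P(E)
p_H_given_E = [pe * ph / p_E for pe, ph in zip(p_E_given_H, p_H)]
```

Because {H_k} is a partition, the posterior probabilities P(H_k | E) sum to 1 by construction.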
4 Random Variables
Random variable (RV) - mathematical representation for uncertain outcome
RV has a set of possible values (also called states or outcomes)
  Categorical RVs - possible values are category labels (e.g., rainy/cloudy/sunny)
  Ordinal RVs - possible values are qualitatively ordered (e.g., high/medium/low)
  Numerical RVs - possible values are numbers
    » Discrete - values drawn from a finite or countable set (e.g., 0, 1, 2, 3, ...)
    » Continuous - values drawn from a continuous set (e.g., real numbers)
Probability distribution quantifies likelihood
For a Bayesian agent the probability distribution:
  Summarizes all information needed for inference or decision
  Is always conditional on the agent's current knowledge
  Changes with new information according to Bayes Rule
5 Random Variable: Formal Definition
A probability space (S, A, P) consists of:
  A set S of elementary outcomes, called the sample space
  A σ-algebra A over S
    » Collection of events (subsets of S) that contains ∅ and S
    » Closed under countable intersection, union and complementation
    » For discrete RVs we typically use the σ-algebra of all subsets; for continuous RVs we use the σ-algebra generated by intervals
  A probability function P mapping events in A to numbers between 0 and 1, such that the probability axioms are satisfied
A random variable X is a function from S to a set X of possible values
  A random variable maps an element of the sample space to an outcome X(s) = x
    » The probability distribution on S induces a probability distribution on X
    » P(Y) is the probability of the subset A of S that maps onto the subset Y of X
    » We usually write X and not X(s)
    » Often the sample space is left implicit (this can cause confusion)
Notation convention: Some statisticians use uppercase letters to denote random variables and lowercase letters to denote values of the sample space; others use only lowercase letters
6 RV from Frequentist View: Example
Question: does a randomly chosen patient have the disease?
Sample space S = {s_1, s_2, ...} represents the population of patients arriving at a medical clinic
  30% of the patients in this population have the disease
  95% of the patients who have the disease test positive
  85% of the patients who do not have the disease test negative
Random variables:
  X maps randomly drawn patient s to X(s) = 0 if the patient does not have the disease and X(s) = 1 if the patient has the disease
  Y maps randomly drawn patient s to Y(s) = 0 if the patient's test result is negative and Y(s) = 1 if the patient's test result is positive
Usually we suppress the argument and write (X, Y) for the two-component random variable representing the condition and test result of a randomly drawn patient
  Pr(X=1) = 0.3
  Pr(Y=1 | X=1) = 0.95
  Pr(Y=0 | X=0) = 0.85
  Pr(X=1 | Y=1) = 0.73 (by Bayes rule)
These probabilities refer to a sampling process and not to any particular patient
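The posterior probability quoted on the slide follows directly from Bayes rule with the three stated numbers. A quick check in Python (the course itself uses R):

```python
# Disease-test example: P(X=1)=0.3, P(Y=1|X=1)=0.95, P(Y=0|X=0)=0.85.
p_disease = 0.30
p_pos_given_disease = 0.95
p_neg_given_healthy = 0.85

# Law of total probability: marginal probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + (1 - p_neg_given_healthy) * (1 - p_disease))

# Bayes rule: P(X=1 | Y=1)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

The result rounds to 0.73, matching the slide.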
7 RV from Subjectivist View: Example
Question: does a specific patient (e.g., George Smith) have the disease?
Sample space S = {s_1, s_2, ...} represents possible worlds (or ways the world could be)
  In 30% of these possible worlds, George has the disease
  In 95% of the possible worlds in which George has the disease, the test is positive
  In 85% of the possible worlds in which George does not have the disease, the test is negative
Random variables:
  X maps possible world s to X(s) = 0 if George does not have the disease and X(s) = 1 if George has the disease in this world
  Y maps possible world s to Y(s) = 0 if George's test result is negative and Y(s) = 1 if George's test result is positive in this world
Usually we suppress the argument and write (X, Y) for the two-component random variable representing George's condition and test result
  Pr(X=1) = 0.3
  Pr(Y=1 | X=1) = 0.95
  Pr(Y=0 | X=0) = 0.85
  Pr(X=1 | Y=1) = 0.73 (by Bayes rule)
These probabilities refer to our beliefs about George's specific case
8 Comparison
Frequentists say a probability distribution arises from some random process on the sample space (such as random selection)
  Pr(X=1) = 0.3 means: If patients are chosen at random from the population of clinic patients, 30% of them will have the disease
  Pr(X=1 | Y=1) = 0.73 means: If patients are chosen at random from the population of clinic patients who test positive, 73% of them will have the disease
  The probability George Smith has the disease is either 1 or 0, depending on whether he does or does not have the disease
Subjectivists say a probability distribution represents someone's beliefs about the world
  Pr(X=1) = 0.3 means: If we have no information except that George walked into the clinic, then our degree of belief is 30% that he has the disease
  Pr(X=1 | Y=1) = 0.73 means: If we learn that George tested positive, our belief increases to 73% that he has the disease
  The probability that George has the disease depends on our information about George. If we know only that he went to the clinic, our probability is 30%. If we also learn that he tested positive, our probability becomes 73%. If we receive a definitive diagnosis, then the probability he has the disease is either 1 or 0, depending on whether he does or does not have the disease.
9 Frequentists and Subjectivists: Random Sampling
Suppose we sample a patient at random from a population in which:
  30% of the patients in this population have the disease
  95% of the patients who have the disease test positive
  85% of the patients who do not have the disease test negative
Subjectivist and frequentist will agree that:
  Pr(X=1) = 0.3
  Pr(Y=1 | X=1) = 0.95
  Pr(X=1 | Y=1) = 0.73 (see example from Unit 1 p. 14)
A frequentist will say:
  These probabilities apply to the population and the sampling process
  There is no such thing as the probability that a particular patient has the disease
A subjectivist will say:
  These probabilities apply to any particular patient for whom the information given is our only relevant information about the patient
  If we learn something new (e.g., George's father had the disease) then we need to update our probabilities to reflect the new knowledge
Frequentists apply Bayesian methods when there is a frequency interpretation of the probabilities
In practice, Bayesians and frequentists are often in close agreement on the practical implications of statistical analysis
10 Mass, Density and Cumulative Distribution Functions
Probability mass function (pmf) - Maps each possible value of a discrete random variable to its probability
Probability density function (pdf) - Maps each possible value of a continuous random variable to a nonnegative number representing its likelihood relative to other possible values
Cumulative distribution function (cdf) - Value of the cdf at x is the probability that the random variable is less than or equal to x
11 Discrete Random Variables: pmf and cdf
Probability mass function
  f(x) is the probability that random variable X is equal to x
  Σ_x f(x) = 1
Cumulative distribution function
  F(x) is the probability that random variable X is less than or equal to x
  F(x) = Σ_{x' ≤ x} f(x')
  Increases from zero (at -∞) to 1 (at +∞)
  Step function has a jump at each possible value
  Height of the step at possible value x is equal to the value of the pmf at x
12 Continuous Random Variables: pdf and cdf
Probability density function
  Each individual value has probability zero
  pdf is the derivative of the cdf: f(x) = dF(x)/dx
  f(x)dx is approximately equal to the probability that X lies in [x-dx/2, x+dx/2]
  ∫ f(x)dx = 1
  Area under f(x) between a and b is equal to the difference F(b) - F(a)
Cumulative distribution function
  cdf is the integral of the pdf: F(x) = ∫_{-∞}^x f(x')dx'
  Probability that X lies in interval [a,b] is the difference of cdf values; also the integral of the pdf: P(a < X ≤ b) = F(b) - F(a) = ∫_a^b f(x)dx
[Figures: plots of f(x) versus x and F(x) versus x]
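The identity P(a < X ≤ b) = F(b) - F(a) = ∫_a^b f(x)dx can be verified numerically for a standard normal, using the closed-form cdf (via the error function) against a midpoint-rule integral of the pdf. This is a sketch in Python rather than the course's R:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Standard normal density when mu=0, sigma=1
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Normal cdf expressed with the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

a, b = -1.0, 1.0

# Difference of cdf values: P(a < X <= b) = F(b) - F(a)
prob_cdf = normal_cdf(b) - normal_cdf(a)

# Midpoint-rule approximation of the integral of the pdf over [a, b]
n = 10000
dx = (b - a) / n
prob_integral = sum(normal_pdf(a + (i + 0.5) * dx) * dx for i in range(n))
```

Both computations give the familiar ~68% probability within one standard deviation of the mean.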
13 Examples: Shape of pdf
[Figures: pdf f(x) and cdf F(x) versus x for three cases]
  Symmetric, unimodal distribution
  Skewed, unimodal distribution
  Bimodal distribution
14 Generalized Probability Density Function (gpdf)
It is useful to have common notation for density and mass functions
  Integral symbol is used to mean integration when the distribution is continuous and summation when the distribution is discrete
  Remember that an integral is a limit of sums
gpdf f(x) is either a mass or a density function:
  ∫ g(x)f(x)dμ(x) ≡ Σ_x g(x)f(x)   if X is discrete
  ∫ g(x)f(x)dμ(x) ≡ ∫ g(x)f(x)dx   if X is continuous
μ(x) stands for measure (counting measure for discrete random variables and uniform measure for continuous random variables)
15 Central Tendency
Measure of central tendency: Summary that indicates the center of the distribution
  Always lies between the largest and smallest value
Mean (expected value E[X]):  E[X] = ∫ x f(x)dμ(x)
  Weight each observation by its probability of occurring
  In skewed distributions the mean is pulled toward the skewed tail
  Sample mean of a random sample converges to the distribution mean as the sample size becomes large
Median (50th percentile x_0.5):  P(X ≤ x_0.5) = 0.5
  Half of the observations are larger, half smaller than the median
  Less influenced by skewness and outliers than the mean
  Sample median of a random sample of observations converges to the population median as sample size becomes large
  Generalization to multivariate distributions?
Mode (x_max):  f(x_max) ≥ f(x) for all x
  Most probable value
  Distributions may have multiple modes (multiple peaks in pmf or pdf)
  In multimodal distributions the mean and median don't say much about the center of the distribution; a single numerical summary may not be appropriate
16 Spread
Measure of spread: Summary that indicates how spread out a distribution is, or how far from the center a typical observation is likely to fall
Variance and standard deviation
  Variance is the expected value of the squared difference between an observation and the mean:  V[X] = ∫ (x - E[X])² f(x)dμ(x)
  Standard deviation is the square root of the variance:  SD[X] = √V[X]
  Standard deviation is in the same units as the variable itself
Median absolute deviation (MAD)
  Median of the absolute value of the difference between an observation and the median:  P(|X - x_0.5| ≤ x_MAD) = 0.5
  Less influenced by outliers than the standard deviation
Credible interval (Bayesian confidence interval)
  Level-α credible interval [r,s]:
    » probability that the RV lies between r and s is α:  P(r ≤ X ≤ s) = α
  Kinds of level-α credible intervals:
    » one-sided credible intervals
    » two-sided credible intervals with equal probability outside the interval on either side
    » two-sided highest-posterior-density intervals
  Which to use depends on the purpose
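The summaries on the last two slides can be computed from a Monte Carlo sample. A sketch in Python (the course uses R) on a right-skewed Exponential(1) sample, where mean > median and the equal-tail interval comes from empirical quantiles:

```python
import random
import statistics

random.seed(1)
# Right-skewed sample: Exponential with rate 1 (mean 1, median ln 2)
sample = [random.expovariate(1.0) for _ in range(100000)]

mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)

# Median absolute deviation: median of |x - median|
mad = statistics.median(abs(x - median) for x in sample)

# Equal-tail 90% credible-style interval from empirical quantiles
xs = sorted(sample)
lo = xs[int(0.05 * len(xs))]
hi = xs[int(0.95 * len(xs))]
```

For this skewed distribution the mean is pulled toward the right tail, so it exceeds the median, and the MAD is smaller than the standard deviation.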
17 Different Varieties of Credible Interval
  Highest posterior density (HPD)
  Two-sided symmetric tail areas
  One-sided
18 Reporting Measures of Spread
No presentation of statistical results is complete without a measure of spread!
Policy makers sometimes resist presentations that include variability
  Need to educate decision makers on why spread is important
  Need to present variability in ways that make policy implications clear
    » "Forecast is $32 per barrel with a 90% credible interval of [25, 48]" may be hard to grasp
    » Decision makers can more easily grasp: "Best guess forecast is $32 per barrel. The price is very unlikely to drop below $25 per barrel, but there are many low-probability, high-impact scenarios which could result in a significant increase in price above that level, to as much as $48 per barrel," especially if some of the scenarios are described
Graphical representations of spread are important
Taken from Hannah Laskey's science fair entry, February 2011
19 Theoretical and Sample Summary Statistics
Expected value or mean of a distribution:  E[X] = ∫ x f(x)dμ(x)
Variance of a distribution:  Var[X] = ∫ (x - E[X])² f(x)dμ(x)
Sample mean for sample X_1,...,X_n:  X̄ = (1/n) Σ_{i=1}^n X_i
Sample variance for sample X_1,...,X_n:  S² = (1/(n-1)) Σ_{i=1}^n (X_i - X̄)²
  In some definitions, the denominator is n instead of n-1
  Dividing by n-1 gives an unbiased estimate of the distribution variance
  If the sample is iid from a normal distribution then dividing by n gives the maximum likelihood estimate (MLE)
Quantile function:  F⁻¹(q) = min{ x : F(x) ≥ q }
  How does this function behave for a discrete distribution?
    » Between jumps of F(x)?
    » At a jump of F(x)?
  There are several different definitions of quantiles for discrete distributions and samples; this is one of the most commonly used
  Median is the 0.5 quantile (or 50th percentile)
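The quantile definition F⁻¹(q) = min{x : F(x) ≥ q} is easy to implement for a discrete distribution, which also answers the slide's question: between jumps of F the quantile is constant, and at a jump it steps up. A Python sketch with a hypothetical three-point pmf:

```python
# Quantile function F^{-1}(q) = min{ x : F(x) >= q } for a discrete RV.
# The pmf values are hypothetical illustration numbers.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def quantile(q):
    # Walk the support in increasing order, accumulating the cdf,
    # and return the first value whose cdf reaches q.
    cum = 0.0
    for x in sorted(pmf):
        cum += pmf[x]
        if cum >= q:
            return x
    return max(pmf)
```

Here F jumps to 0.2 at x=0 and to 0.7 at x=1, so every q in (0.2, 0.7] maps to the same quantile 1, while q=0.2 (exactly at a jump) maps to 0.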
20 Data Summary in R
Data set on 44 babies born in Brisbane, Australia on 18 December 1997
  Dataset provided as part of the UsingR add-on package in R
Screenshot of Console from R Session
  R reads the data into a data frame called babyboom
    - R object for representing a data set
    - List of equal-length vectors, each with a name
    - babyboom$wt is the 3rd column, containing weights of babies
  summary is one of the many R methods for data frames
21 Graphical Summaries in R
There are many ways to customize R graphics
Box extends from lower to upper quartile, with the median shown as a line across the box
  Upper whisker extends to the largest observation within (upper quartile + 1.5 × interquartile range)
  Lower whisker extends to the smallest observation within (lower quartile - 1.5 × interquartile range)
22 Multivariate Distributions
A random vector X is a vector of random variables
  Underscore denotes a vector
  Underscore may be omitted - many authors do not have special notation for multivariate distributions
Joint distribution function
  cdf F(x) = P(X ≤ x)
  gpdf f(x) (density if continuous, mass function if discrete)
Expected value vector:  E[X] = ∫_{R^n} x f(x)dμ(x)
Covariance matrix:  Cov[X] = ∫_{R^n} (x - E[X])(x - E[X])^T f(x)dμ(x)
Conditional independence assumptions are often used to simplify specification and inference for multivariate distributions
23 Covariance and Correlation
The covariance matrix measures how random quantities vary together
  The (i,j)th off-diagonal element is called Cov(X_i,X_j), or the covariance of X_i and X_j
  Cov(X_i,X_j) is equal to plus or minus the product of the standard deviations of X_i and X_j if they are perfectly linearly related
  Cov(X_i,X_j) = 0 if X_i and X_j are independent
The correlation matrix:
  The diagonal elements are equal to 1
  The (i,j)th off-diagonal element Cov(X_i,X_j)/(SD(X_i) SD(X_j)) is the covariance divided by the product of the standard deviations
  The off-diagonal elements are real numbers between -1 and 1
    » Independent RVs have zero correlation (but zero correlation does not by itself imply independence)
    » The closer the correlation is to 1 or -1, the more nearly linearly related the RVs are
24 Example: Joint Density Functions
[Figures: surface plots of two joint densities]
  Independent: f(x,y) = f(x)f(y)
  Dependent: f(x,y) = f(x)f(y | x)
25 Example: Scatterplot of Random Sample from Joint Distribution
26 Marginal and Conditional Distributions
If (X,Y) has joint gpdf f(x,y):
  The marginal gpdf of X is f(x) = ∫ f(x,y)dμ(y)   (y is marginalized out)
  The conditional gpdf of X given Y is f(x | y) = f(x,y) / f(y)
  The joint gpdf of X and Y can be factored into conditional and marginal gpdfs: f(x,y) = f(x | y) f(y) = f(y | x) f(x)
The joint gpdf for n random variables (X_1,..., X_n) can be written as:
  f(x_1,..., x_n) = f(x_1) f(x_2 | x_1) f(x_3 | x_1, x_2) ··· f(x_n | x_1,..., x_{n-1}) = Π_{i=1}^n f(x_i | x_1,..., x_{i-1})
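For a discrete joint distribution, marginalization and the factorization f(x,y) = f(x | y) f(y) can be verified directly on a probability table. A Python sketch with a hypothetical 2×2 joint pmf:

```python
# Hypothetical joint pmf over x in {0,1}, y in {0,1}
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Marginal of Y: sum x out of the joint
f_y = {y: sum(p for (xv, yv), p in joint.items() if yv == y) for y in (0, 1)}

# Conditional of X given Y: joint divided by marginal
f_x_given_y = {(x, y): joint[(x, y)] / f_y[y] for (x, y) in joint}

# Check the factorization f(x,y) = f(x | y) f(y) for every cell
factorization_holds = all(
    abs(joint[(x, y)] - f_x_given_y[(x, y)] * f_y[y]) < 1e-12
    for (x, y) in joint
)
```

The same construction, applied repeatedly, gives the chain-rule factorization for n variables.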
27 Distributions in R
Each R distribution has four functions, invoked by adding a prefix to the distribution's base name:
  p for probability - the cumulative distribution function (cdf)
  q for quantile - the inverse cdf
  d for density - the density or mass function
  r for random - generates random numbers from the distribution
Examples: rpois, dgamma
Functions for the Beta-Binomial distribution are available in the VGAM package
28 Independence and Conditional Independence
X and Y are independent if the conditional distribution of X given Y does not depend on Y
  When X and Y are independent, the joint density (or mass) function is equal to the product of the marginal density (or mass) functions: f(x,y) = f(x)f(y)
X and Y are conditionally independent given Z if: f(x,y,z) = f(x | z)f(y | z)f(z)
  General form is f(x,y,z) = f(x | y,z)f(y | z)f(z)
Graphs provide a powerful tool for visualizing and formally modeling dependencies between random variables
  Random variables (RVs) are represented by nodes
  Direct dependencies are represented by edges
  Absence of an edge means no direct dependence
[Figures: graph with nodes X, Y and no edge, P(X,Y) = P(X)P(Y); graph X → W ← Y, W → Z, P(X,Y,W,Z) = P(X)P(Y)P(W | X,Y)P(Z | W)]
29 Independent and Identically Distributed Observations
Statistical models often assume the observations are a random sample from a parameterized distribution
  Mathematically, this is represented as independent and identically distributed (iid) conditional on the parameter θ
The density function for an iid sample X_1,...,X_n is written as a product of factors:
  f(x_1, x_2,..., x_n | θ) = f(x_1 | θ) f(x_2 | θ) ··· f(x_n | θ) = Π_{i=1}^n f(x_i | θ)
A plate represents repeated structure
  The RVs inside the plate are iid conditional on their parents
  In this example, the X_i are iid given Θ
  Joint gpdf for (X, Θ):  g(θ) Π_i f(x_i | θ)
  Conditional gpdf for x given θ:  Π_i f(x_i | θ)
[Figures: graph Θ → X_1, X_2, X_3, and the equivalent plate notation Θ → X_i, i=1,...,3]
30 Graphical Probability Models
Graphical probability models provide a means of specifying complex multivariate distributions in compact form, visualizing dependence relationships, and computing results of inference
  In the past, statistical packages have been limited to models with simple random sampling
  Graphical models can represent more complex kinds of dependencies
  Software is becoming available to estimate parameters of these more complex models from data (such as JAGS / R)
Example:
  Birth rates Λ_k for hospitals k=1,...,n are sampled randomly from a distribution with parameter A:  A ~ h(α),  Λ_k ~ g(λ | α)
  Hourly births X_ki at hospital k are sampled from a Poisson distribution with parameter Λ_k:  X_ki ~ Poisson(λ_k)
[Figure: graph A → Λ_1,...,Λ_n → X_ki, shown in plate notation with i=1,...,m_k and k=1,...,n]
31 Parametric Families of Distributions
Much of statistics is based on constructing models of phenomena using parametric families of distributions
  Simple models typically assume observations X_1,..., X_N are sampled randomly from a distribution with gpdf f(x | Θ)
    » The symbol Θ denotes a parameter
    » Functional form f(x | Θ) is known but the value of the parameter is unknown
  Parameter may be a number Θ or a vector Θ = (Θ_1,...,Θ_p)
Canonical problems:
  Use the sample to construct an estimate (point or interval) for the parameter
  Test a hypothesis about the value of the parameter, e.g.:
    » Is Θ equal to (or greater than) a particular value θ_0?
    » Are two parameters Θ_1 and Θ_2 equal to each other?
  Is the functional form f(x | Θ) adequate as a model for the phenomenon generating the observations?
  Predict features (e.g., mean) of a future sample X_{N+1},..., X_{N+M}
Hierarchical (or multi-level) Bayesian models
  Retain simplicity of parametric models with standard functional form
  Build complex models out of simple building blocks
Many distributions have multiple parameterizations in common use and it is important not to confuse different parameterizations
32 Some Discrete Parametric Families (p. 1)
Binomial distribution with number of trials n and success probability π:
  Sample space: Integers 0, 1,..., n
  pmf: f(x | n,π) = C(n,x) π^x (1-π)^{n-x}
  E[X | n,π] = nπ;  Var[X | n,π] = nπ(1-π)
  Bernoulli distribution is a Binomial distribution with n=1
Multinomial distribution with number of trials n and success probabilities π_1,...,π_p:
  Sample space: Vectors of non-negative integers summing to n
  pmf: f(x_1,...,x_p | n, π_1,...,π_p) = (n! / (x_1! ··· x_p!)) Π_{i=1}^p π_i^{x_i}
  E[X_i | n,π] = nπ_i;  Var[X_i | n,π] = nπ_i(1-π_i)
Poisson distribution with rate λ:
  Sample space: Non-negative integers
  pmf: f(x | λ) = e^{-λ} λ^x / x!
  E[X | λ] = λ;  Var[X | λ] = λ
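The stated mean and variance formulas can be checked by summing directly over the sample space. A Python sketch for the Binomial pmf with hypothetical parameter values n=10, π=0.3:

```python
import math

def binomial_pmf(x, n, pi):
    # C(n, x) * pi^x * (1-pi)^(n-x)
    return math.comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 10, 0.3

# Mean and variance computed by brute-force summation over {0,...,n}
mean = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, pi) for x in range(n + 1))
```

The sums reproduce E[X] = nπ and Var[X] = nπ(1-π) exactly, because the Binomial sample space is finite.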
33 Some Discrete Parametric Families (p. 2)
Negative binomial distribution with size α and probability π:
  Sample space: Non-negative integers
  pmf: f(x | α,π) = C(α+x-1, x) π^α (1-π)^x
  E[X | α,π] = α(1-π)/π;  Var[X | α,π] = α(1-π)/π²
34 Some Continuous Parametric Families (p. 1)
Gamma distribution with shape α and scale β:
  Sample space: Non-negative real numbers
  pdf: f(x | α,β) = (1 / (β^α Γ(α))) x^{α-1} e^{-x/β}
  E[X | α,β] = αβ;  Var[X | α,β] = αβ²
  Gamma distribution is sometimes parameterized with shape and rate r = 1/β
  Exponential distribution is a Gamma distribution with α=1
  Chi-square distribution with δ degrees of freedom is a Gamma distribution with α=δ/2 and β=2
Beta distribution with shape parameters α and β:
  Sample space: Real numbers between 0 and 1
  pdf: f(x | α,β) = (Γ(α+β) / (Γ(α)Γ(β))) x^{α-1} (1-x)^{β-1}
  E[X | α,β] = α/(α+β);  Var[X | α,β] = αβ / ((α+β)²(α+β+1))
  Uniform distribution is a Beta distribution with α=β=1
Univariate normal distribution with mean µ and standard deviation σ:
  Sample space: Real numbers
  pdf: f(x | µ,σ) = (1 / (√(2π) σ)) exp{ -½ ((x-µ)/σ)² }
  E[X | µ,σ] = µ;  Var[X | µ,σ] = σ²
Multivariate normal distribution with mean µ and covariance matrix Σ:
  Sample space: Vectors of real numbers
  pdf: f(x | µ,Σ) = (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ -½ (x-µ)' Σ⁻¹ (x-µ) }
  E[X | µ,Σ] = µ;  Var[X | µ,Σ] = Σ
Gamma function:  Γ(y) = ∫_0^∞ u^{y-1} e^{-u} du
35 Some Continuous Parametric Families (p. 2)
Dirichlet distribution with shape parameters α_1, α_2,..., α_p:
  Sample space: Non-negative real vectors summing to 1
  pdf: f(x_1,...,x_p | α_1,...,α_p) = (Γ(α_1+···+α_p) / (Γ(α_1)···Γ(α_p))) x_1^{α_1-1} ··· x_p^{α_p-1}
  E[X_i | α_1,...,α_p] = α_i / Σ_j α_j
  Var[X_i | α_1,...,α_p] = α_i (Σ_j α_j - α_i) / ((Σ_j α_j)² (Σ_j α_j + 1))
  Dirichlet distribution is a multivariate generalization of the Beta distribution
Wishart distribution with degrees of freedom α and scale matrix V, where V is a positive definite matrix of dimension p and α > p-1:
  Sample space: Positive definite p-dimensional matrices
  pdf: f(X | α,V) = |X|^{(α-p-1)/2} e^{-tr(V⁻¹X)/2} / (2^{αp/2} |V|^{α/2} Γ_p(α/2))
    » Γ_p is the multivariate Gamma function; tr is the trace function
  E[X | α,V] = αV
  Wishart distribution is a multivariate generalization of the Gamma and Chi-square distributions
36 Sufficient Statistic
A sufficient statistic is a data summary (function of a sample of observations) such that the observations are independent of the parameter given the sufficient statistic
  Example: For a sample of n iid observations from a Poisson distribution, the sum of the observations is a sufficient statistic for the rate parameter
  Example: For a sample of n observations from a Bernoulli distribution, the total number of successes is a sufficient statistic for the success probability
If observations X_1,..., X_N are sampled randomly from a distribution with gpdf f(x | θ) and Y is sufficient for the parameter θ, then the posterior distribution f(θ | X) depends on the observations only through the sufficient statistic
Fisher's factorization theorem: T(X) is sufficient for θ if and only if the conditional probability distribution f(x | θ) can be factored as:
  f(x | θ) = h(x) g_θ(T(x))
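The Poisson example can be demonstrated numerically: two samples with the same length and the same sum yield identical posteriors over a grid of rate values. A Python sketch (grid and flat prior are hypothetical choices):

```python
import math

# Hypothetical discrete grid of rate values with a flat prior
grid = [0.5, 1.0, 1.5, 2.0]
prior = [0.25, 0.25, 0.25, 0.25]

def posterior(data):
    # iid Poisson likelihood: prod_i e^{-lam} lam^{x_i} / x_i!
    # The x_i! factors do not depend on lam, so they cancel on normalization;
    # the likelihood depends on the data only through n and sum(data).
    like = [math.exp(-lam * len(data)) * lam ** sum(data) for lam in grid]
    unnorm = [l * p for l, p in zip(like, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

post_a = posterior([1, 0, 2, 1])   # n = 4, sum = 4
post_b = posterior([2, 2, 0, 0])   # same n and sum, different observations
```

Because the likelihood factors as h(x) · g_θ(n, Σx), the two posteriors agree exactly: the sum (with the sample size) is sufficient for the rate.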
37 Is a Parametric Distribution a Good Model?
Before applying a parametric model, we should perform exploratory analysis to evaluate whether the model is an adequate representation of the data
A q-q plot is a commonly used diagnostic tool
  Plot quantiles of the data distribution against quantiles of the theoretical distribution
  If the theoretical distribution is correct, the plot should look approximately like the plot of y=x
  Problem: what about unknown parameters?
    » Solution: We can estimate parameters from data
    » Solution: In the case of a location-scale family, we can plot data quantiles against the standard distribution (location parameter = 0 and scale parameter = 1)
      - If the theoretical distribution is correct the plot should look approximately like a straight line
      - Slope and intercept of the line are determined by the scale and location parameters
Another diagnostic tool is to compare empirical and theoretical counts for a discrete RV (or discretized continuous RV)
What other checks for model fit have you encountered?
In your homework I expect you to assess whether the parametric model I give you is a good one (sometimes it won't be!)
38 Are Birth Weights Normally Distributed?
Data on birth weights of babies born in a Brisbane hospital on December 18, 1997
39 Birth Weights of Boys and Girls
40 Are Times Between Births Exponentially Distributed?
The data: times between births of babies born in a Brisbane hospital on December 18, 1997
R notes:
  ppoints function computes probabilities for evaluating quantiles
  diff function computes lagged differences
  lines function adds lines to a plot
  qexp function computes exponential quantiles
41 Looking for Temporal Patterns (1 of 2)
[Figure: time series plot of Time Until Next Birth (minutes) versus Time of Birth (minutes from midnight)]
If birth times are iid, then we should not see any patterns (such as more births at some times of day)
Do you see any patterns in this plot?
plot(birthtim[1:43], birthinterval, main="Time Series Plot of Birth Intervals", ylab="Time Until Next Birth (minutes)", xlab="Time of Birth (minutes from midnight)")
42 Looking for Temporal Patterns (2 of 2)
[Figure: lag plot of birthinterval[2:43] versus birthinterval[1:42]]
The lag plot looks for a dependency between successive birth intervals (such as long intervals being followed by long intervals)
Do you see any pattern in this plot?
plot(birthinterval[1:42], birthinterval[2:43], main="Lag Plot of Birth Intervals")
43 Do Births Per Hour Follow a Poisson Distribution?
Comparing empirical to theoretical counts is a useful tool for evaluating fit (if the expected counts are not too small)
[Table: births per hour with empirical count and expected count for each value, plus a TOTAL row; numeric entries not preserved in transcription]
44 Frequency Plot is Misleading when Expected Counts are Too Small
Blue bars: empirical counts from a sample of size 35 from a Poisson distribution with mean 20 (in bins of 1 and 3 time units)
Pink bars: expected counts for a sample of size 35 from a Poisson distribution with mean 20 (in bins of 1 and 3 time units)
When expected counts are small, plots of observed and expected counts will not look similar
Common rule of thumb: choose bin size so most bins have an expected count of at least 5
45 Inferring the Value of a Parameter
Often we model a set of observations as independent trials from a probability distribution with unknown parameter θ:
  f(x_1,...,x_n | θ) = f(x_1 | θ) ··· f(x_n | θ)
The joint gpdf of x given θ, viewed as a function of θ, is called the likelihood function
  Likelihood principle: The likelihood function captures everything about the observations needed to draw inferences
To draw inferences about θ, a Bayesian statistician must also specify a prior distribution g(θ)
We condition on the sample to obtain the posterior distribution using Bayes Rule:
  g(θ | x_1,...,x_n) = Π_{i=1}^n f(x_i | θ) g(θ) / ∫ Π_{i=1}^n f(x_i | θ) g(θ) dμ(θ) = f(x | θ) g(θ) / f(x)
f(x) is called the marginal likelihood of the observations x = (x_1,...,x_n)
Note: the X_i are independent given θ, but are not independent after integrating over θ
[Figure: plate model Θ → X_i, i=1,...,n]
46 Example: Transmission Errors
Number of transmission errors per hour is distributed as Poisson with parameter λ:
  f(x | λ) = e^{-λ} λ^x / x!
Data on the previous system established the error rate as 1.6 errors per hour
New system has a design goal of cutting this rate in half, to 0.8 errors per hour
Observe the new system for 6 one-hour periods:
  Data: 1, 0, 1, 2, 1, 0
Questions:
  Have we met the design goal?
  Does the new system improve the error rate?
47 The Prior Distribution (discretized)
Error rate can be any positive real number
We use expert judgment to define a prior distribution on a discrete set of values (later we will revisit this problem with a continuous prior distribution)
Experts familiar with the new system design said:
  Meeting the design goal of 0.8 errors per hour is about a 50-50 proposition
  The chance of making things worse than the current rate of 1.6 errors per hour is small but not negligible
Expert agrees that the discretized distribution shown here is a good reflection of his prior knowledge
  Expected value is about 1.0
  Distribution is heavy-tailed on the right
  P(Λ ≤ 0.8) = 0.54
  P(Λ ≤ 1.6) = 0.87
  Values of Λ greater than 3 are unlikely enough to ignore
48 Bayesian Inference for Error Rate
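The discrete-prior update for the transmission-error example can be sketched as follows. The grid of rate values and prior weights below are illustrative stand-ins for the expert's discretized prior (the slide's actual values are not reproduced in the transcription); the data are the six observed hourly counts. Python is used here in place of the course's R:

```python
import math

# Hypothetical discretized prior over the error rate (illustration only)
grid = [0.4, 0.8, 1.2, 1.6, 2.0, 2.4]
prior = [0.15, 0.30, 0.25, 0.17, 0.08, 0.05]

data = [1, 0, 1, 2, 1, 0]   # six one-hour observation periods

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Likelihood of the whole sample under each candidate rate
likelihood = [math.prod(poisson_pmf(x, lam) for x in data) for lam in grid]

# Bayes rule: posterior proportional to likelihood times prior
unnorm = [l * p for l, p in zip(likelihood, prior)]
marginal = sum(unnorm)                     # marginal likelihood f(x)
posterior = [u / marginal for u in unnorm]

# Posterior probability of meeting the design goal (rate <= 0.8)
p_goal = sum(p for lam, p in zip(grid, posterior) if lam <= 0.8)
```

With the data mean of 5/6 ≈ 0.83 errors per hour, the posterior concentrates near the design goal, and the probability that the rate is at most 0.8 rises above its prior value.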
49 Features of the Posterior Distribution
Central tendency
  Posterior mean of Λ is 0.87
  Prior mean of Λ is 0.97; data mean is 0.83
  Typically posterior central tendency is a compromise between the prior distribution and the center of the data
Variation
  Posterior standard deviation of Λ is about 0.33
  Prior standard deviation of Λ is about 0.62
  Typically variation in the posterior is less than variation in the prior (we have more information)
Meeting the threshold
  Posterior probability of meeting or doing better than the design goal is about 0.58
  Posterior probability that the new system is better than the old system is about 0.96
  Posterior probability that the new system is worse than the old system is less than 0.02
50 Triplot
Visual tool for examining Bayesian belief dynamics
Plot prior distribution, normalized likelihood, and posterior distribution
Normalized likelihood:
  Divide the likelihood by its sum or integral over λ
  Posterior distribution we would obtain if all values were assigned equal prior probability
51 Bayesian Belief Dynamics: Sequential and Batch Processing of Observations
Batch processing:
  Use Bayes rule with prior g(θ) and combined likelihood f(X_1,..., X_n | θ) to find posterior g(θ | X_1,..., X_n)
Sequential processing:
  Use Bayes rule with prior g(θ) and likelihood f(X_1 | θ) to find posterior g(θ | X_1)
  Use Bayes rule with prior g(θ | X_1) and likelihood f(X_2 | θ) to find posterior g(θ | X_1, X_2)
  ...
  Use Bayes rule with prior g(θ | X_1,..., X_{n-1}) and likelihood f(X_n | θ) to find posterior g(θ | X_1,..., X_n)
The posterior distribution after n observations is the same with both methods
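The equivalence of sequential and batch processing can be checked directly. A Python sketch with a Bernoulli likelihood over a hypothetical two-point parameter grid:

```python
import math

# Hypothetical two-point grid of candidate success probabilities, flat prior
grid = [0.3, 0.7]
prior = [0.5, 0.5]

def bern(x, theta):
    # Bernoulli pmf: theta if x = 1, (1 - theta) if x = 0
    return theta if x == 1 else 1 - theta

def update(dist, x):
    # One application of Bayes rule with a single observation
    unnorm = [bern(x, th) * p for th, p in zip(grid, dist)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

data = [1, 1, 0, 1]

# Sequential processing: fold Bayes rule over the observations one at a time
seq = prior
for x in data:
    seq = update(seq, x)

# Batch processing: one update with the combined likelihood
like = [math.prod(bern(x, th) for x in data) for th in grid]
z = sum(l * p for l, p in zip(like, prior))
batch = [l * p / z for l, p in zip(like, prior)]
```

The two posteriors agree, and with three successes in four trials the posterior favors θ = 0.7.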
52 Visualizing Bayesian Belief Dynamics: Hour-by-Hour Triplots for Transmission Error Data Unit 2 (v4)
53 Fundamental Identity of Bayesian Inference f(x | θ)g(θ) = f(x)g(θ | x), where f(x | θ) is the likelihood function, g(θ) is the prior density function, f(x) is the marginal likelihood, and g(θ | x) is the posterior density function The joint pdf of parameter and data can be expressed in two different ways: Prior density for parameter times likelihood function Marginal likelihood times posterior density for parameter» The marginal likelihood is the (conditional) likelihood integrated over the parameter: f(x) = ∫ f(x | θ)g(θ) dθ for a continuous parameter; f(x) = Σθ f(x | θ)g(θ) for a discrete parameter Unit 2 (v4)
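The identity can be verified term by term for a discrete parameter. This sketch uses an illustrative Bernoulli model and prior:

```python
# Numeric check of f(x|theta) g(theta) = f(x) g(theta|x).
thetas = [0.2, 0.5, 0.8]           # possible parameter values
g      = [0.3, 0.4, 0.3]           # prior g(theta)
x = 1                              # one observed Bernoulli success

lik  = [t if x == 1 else 1 - t for t in thetas]        # f(x|theta)
fx   = sum(l * p for l, p in zip(lik, g))              # marginal likelihood f(x)
post = [l * p / fx for l, p in zip(lik, g)]            # posterior g(theta|x)

# Both factorizations of the joint agree term by term.
lhs = [l * p for l, p in zip(lik, g)]                  # f(x|theta) g(theta)
rhs = [fx * q for q in post]                           # f(x) g(theta|x)
```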
54 Marginal Likelihood Before we have seen X, we use the marginal likelihood to predict the value of X When used for predicting X, the marginal likelihood is called the predictive distribution for X After we see X=x, we divide the joint probability f(x | θ)g(θ) by the marginal likelihood f(x) to obtain the posterior probability of θ: g(θ | x) = f(x | θ)g(θ) / f(x) The marginal likelihood f(x) is the normalizing constant in Bayes Rule: we divide by f(x) to ensure that the posterior probabilities sum to 1 The marginal likelihood f(x) = Σθ f(x | θ)g(θ) includes uncertainty about both θ and x given θ Non-Bayesians sometimes predict future observations using a point estimate of θ Predictions using the marginal likelihood are more spread out (include more uncertainty) than predictions using a point estimate of θ Unit 2 (v4)
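The extra spread of predictive distributions over plug-in predictions can be seen numerically. This sketch mixes Poisson pmfs over an illustrative posterior for Λ and compares with a Poisson at the posterior mean:

```python
import math

grid      = [0.5, 1.0, 1.5, 2.0]
posterior = [0.3, 0.4, 0.2, 0.1]     # illustrative posterior weights
xs = range(12)                       # enough support for these rates

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Predictive distribution: average the Poisson pmf over the posterior.
pred = [sum(poisson_pmf(x, lam) * w for lam, w in zip(grid, posterior))
        for x in xs]

# Plug-in prediction at the posterior mean (a point estimate of Lambda).
lam_hat = sum(lam * w for lam, w in zip(grid, posterior))
plug = [poisson_pmf(x, lam_hat) for x in xs]

def variance(pmf):
    m = sum(x * p for x, p in zip(xs, pmf))
    return sum((x - m) ** 2 * p for x, p in zip(xs, pmf))

# The predictive variance exceeds the plug-in variance: the mixture adds
# the uncertainty about Lambda to the Poisson sampling variability.
var_pred, var_plug = variance(pred), variance(plug)
```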
55 Predicting Future Observations Compare Marginal Likelihood with Poisson(E[Λ | observations so far]) Unit 2 (v4)
56 Parameters and Conditioning: Frequentists and Subjectivists Frequentist view of parameters Parameter Θ represents a true but unknown property of a random data generating process It is not appropriate to put a probability distribution on Θ because the value of Θ is fixed, not random Subjectivist view of parameters A subjectivist puts a probability distribution on Θ as well as on Xi given Θ Parametric distributions are a convenient means to specify probability distributions that represent our beliefs about as-yet-unobserved data Many subjectivists don't believe true parameters exist Frequentists and subjectivists on conditioning Frequentists condition on the parameter and base inferences on the data distribution f(X1 | Θ) ··· f(Xn | Θ)» Even after X has been observed it is treated as random Subjectivists condition on knowns and treat unknowns probabilistically» Before observing data X=x, the joint distribution of parameters and data is f(X1 | Θ) ··· f(Xn | Θ)g(Θ)» After observing data, X=x is known and Θ has distribution g(Θ | x1, …, xn) Unit 2 (v4)
57 Connecting Subjective and Frequency Probability: de Finetti's Representation Theorem IF Your beliefs are represented by a probability distribution P over a sequence of events X1, X2, … You believe the sequence is infinitely exchangeable (your probability for any sequence of successes and failures does not depend on the order of successes and failures) THEN You believe with probability 1 that the proportion of successes will tend to a definite limit as the number of trials goes to infinity: Sn/n → p as n → ∞, where Sn = # successes in n trials Your probability distribution for Sn is the same as the distribution you would assess if you believed the observations were random draws from some unknown true value of p with probability density function g(p): P(Sn = k) = ∫₀¹ (n choose k) p^k (1−p)^(n−k) g(p) dp A sequence a frequentist would call random draws from a true distribution is one a Bayesian would call exchangeable Unit 2 (v4)
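The content of the theorem can be illustrated in the other direction: a mixture of i.i.d. Bernoulli sequences is exchangeable, so the probability of a binary sequence depends only on its number of successes. The discrete mixing distribution over p below is illustrative (the theorem's g(p) can be any distribution on [0, 1]):

```python
import math

ps      = [0.2, 0.5, 0.8]          # possible values of p
weights = [0.3, 0.4, 0.3]          # mixing weights g(p)

def seq_prob(seq):
    """P(sequence) = sum over p of g(p) * p^(#successes) * (1-p)^(#failures)."""
    k, n = sum(seq), len(seq)
    return sum(w * p ** k * (1 - p) ** (n - k) for p, w in zip(ps, weights))

# Exchangeability: reordering successes and failures leaves the probability unchanged.
a = seq_prob((1, 1, 0, 0))
b = seq_prob((0, 1, 0, 1))

# P(S_n = k) sums over the C(n, k) equally likely arrangements.
n, k = 4, 2
p_Sn_k = math.comb(n, k) * seq_prob((1,) * k + (0,) * (n - k))
```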
58 Bayesian Inference for Continuous Random Variables Inference for a continuous random variable is the limiting case of inference with a discretized random variable as the number of bins goes to infinity and the width of each bin goes to zero Accuracy tends to increase with more bins Accuracy tends to increase with smaller area per bin Be careful with the tail area of unbounded random variables A closed form solution for the continuous problem exists in special cases (see Unit 3) The prior distribution we used for the transmission error example was obtained by discretizing a Gamma distribution In Unit 3, we will compare with the exact result using the Gamma prior distribution Unit 2 (v4)
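As a preview of that comparison, the sketch below checks that a finely discretized Gamma prior reproduces the exact conjugate Gamma-Poisson posterior mean. The Gamma parameters and data are illustrative, not the lecture's:

```python
import math

a, b = 2.0, 2.0                  # Gamma shape and rate; prior mean a/b = 1.0
data = [1, 0, 2, 1, 0]           # illustrative hourly error counts
n, s = len(data), sum(data)

# Exact conjugate result (Unit 3): posterior is Gamma(a + s, b + n).
exact_mean = (a + s) / (b + n)

# Discretized version: bin midpoints on (0, 10), 1000 bins of width 0.01.
grid = [0.005 + 0.01 * k for k in range(1000)]

def gamma_pdf(lam):
    return b ** a * lam ** (a - 1) * math.exp(-b * lam) / math.gamma(a)

def poisson_lik(lam):
    return math.prod(lam ** x * math.exp(-lam) / math.factorial(x) for x in data)

joint = [gamma_pdf(lam) * poisson_lik(lam) for lam in grid]
total = sum(joint)
approx_mean = sum(lam * j for lam, j in zip(grid, joint)) / total
```

Note the grid must extend far enough to cover the tail; truncating an unbounded random variable too aggressively biases the answer.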
59 Summary and Synthesis A random variable represents an uncertain hypothesis Categorical, ordinal, discrete numerical, continuous numerical A probability distribution maps sets of possible values to numbers Probability of a set indicates how likely it is that the value belongs to the set Probability mass functions, density functions, and cumulative distribution functions are tools for calculating probabilities of sets We examined measures of central tendency and spread Parametric families of distributions are convenient and practical off-the-shelf models for common types of uncertain processes We listed several commonly used parametric families We noted that observations are often modeled as randomly sampled from a parametric family with unknown parameter We applied Bayes rule to infer a posterior distribution for a parameter (using a discretized prior distribution) We contrasted the subjectivist and frequentist approaches to parameter inference We saw that de Finetti's exchangeability theorem provides a connection between the frequentist and subjectivist views of parameters Unit 2 (v4)