Bayesian Inference and Decision Theory


1 Bayesian Inference and Decision Theory
Instructor: Kathryn Blackmond Laskey
Room 2214 ENGR, (703)
Office Hours: Thursday 4:00-6:00 PM, or by appointment
Spring 2018
Unit 2: Random Variables, Parametric Models, and Inference from Observation
Unit 2 (v4) - 1 -

2 Learning Objectives for Unit 2
Define and apply the following:
- Event
- Random variable (univariate and multivariate)
- Joint, conditional and marginal distributions
- Independence; conditional independence
- Mass and density functions; cumulative distribution functions
- Measures of central tendency and spread
Be familiar with common parametric statistical models:
- Continuous distributions: Normal, Gamma, Exponential, Uniform, Beta, Dirichlet
- Discrete distributions: Bernoulli, Binomial, Multinomial, Poisson, Negative Binomial
Use Bayes rule to find the posterior distribution of a parameter given a set of observations
State de Finetti's Theorem and its relationship to the concept of "true" probabilities
Use R to do Bayesian inference with discrete random variables

3 Events, Partitions and Probability Distributions
Some definitions relating to mathematical models of situations involving uncertainty:
- Sample space: a set S consisting of all possible elementary outcomes
- Event: a subset E ⊂ S
- Partition: a finite or infinite collection {H1, H2, ...} of events such that Hi ∩ Hj = ∅ if i ≠ j, and ∪i Hi = S
The sample space S completely characterizes the outcome to the finest level of detail needed for the model
We say an event E ⊂ S occurs if the actual outcome s is an element of E
If {H1, H2, ...} is a partition of S, then exactly one of the events Hi occurs
A probability distribution is a function from events to real numbers satisfying the axioms of probability
If E is an event and {H1, H2, ...} is a partition, then the axioms imply:
- Law of total probability: P(E) = Σk P(E | Hk)P(Hk)
- Bayes Rule: P(Hk | E) = P(E | Hk)P(Hk) / P(E) = P(E | Hk)P(Hk) / Σj P(E | Hj)P(Hj)
Note: This is the standard definition. Hoff's definition (p. 15 of Hoff) is non-standard.

4 Random Variables
Random variable (RV): mathematical representation for an uncertain outcome
- An RV has a set of possible values (also called states or outcomes)
- Categorical RVs: possible values are category labels (e.g., rainy/cloudy/sunny)
- Ordinal RVs: possible values are qualitatively ordered (e.g., high/medium/low)
- Numerical RVs: possible values are numbers
  » Discrete: values drawn from a finite or countable set (e.g., 0, 1, 2, 3, ...)
  » Continuous: values drawn from a continuous set (e.g., real numbers)
A probability distribution quantifies likelihood. For a Bayesian agent the probability distribution:
- Summarizes all information needed for inference or decision
- Is always conditional on the agent's current knowledge
- Changes with new information according to Bayes Rule

5 Random Variable: Formal Definition
A probability space (S, A, P) consists of:
- A set S of elementary outcomes, called the sample space
- A σ-algebra A over S
  » Collection of events (subsets of S) that contains ∅ and S
  » Closed under countable intersection, union and complementation
  » For discrete RVs we typically use the σ-algebra of all subsets; for continuous RVs we use the σ-algebra generated by intervals
- A probability function P mapping events in A to numbers between 0 and 1, such that the probability axioms are satisfied
A random variable X is a function from S to a set X of possible values
- A random variable maps an element s of the sample space to an outcome X(s) = x
- The probability distribution on S induces a probability distribution on the set of possible values
- P(Y) is the probability of the subset of S that maps onto the subset Y of possible values
- We usually write X and not X(s)
- Often the sample space is left implicit (this can cause confusion)
Notation convention: Some statisticians use uppercase letters to denote random variables and lowercase letters to denote their values; others use only lowercase letters

6 RV from Frequentist View: Example
Question: does a randomly chosen patient have the disease?
Sample space S = {s1, s2, ...} represents the population of patients arriving at a medical clinic
- 30% of the patients in this population have the disease
- 95% of the patients who have the disease test positive
- 85% of the patients who do not have the disease test negative
Random variables:
- X maps a randomly drawn patient s to X(s) = 0 if the patient does not have the disease and X(s) = 1 if the patient has the disease
- Y maps a randomly drawn patient s to Y(s) = 0 if the patient's test result is negative and Y(s) = 1 if the patient's test result is positive
Usually we suppress the argument and write (X, Y) for the two-component random variable representing the condition and test result of a randomly drawn patient
Pr(X=1) = 0.3
Pr(Y=1 | X=1) = 0.95
Pr(Y=0 | X=0) = 0.85
Pr(X=1 | Y=1) = 0.73 (by Bayes rule)
These probabilities refer to a sampling process and not to any particular patient
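The 0.73 on this slide comes from one application of Bayes rule with the law of total probability in the denominator. A short R sketch (R being the course's language) reproduces the arithmetic:

```r
# Bayes rule for the disease-testing example on this slide
prior <- 0.3    # Pr(X=1): prevalence of the disease
sens  <- 0.95   # Pr(Y=1 | X=1): sensitivity of the test
spec  <- 0.85   # Pr(Y=0 | X=0): specificity of the test

# Law of total probability: Pr(Y=1) over the partition {X=1, X=0}
p_pos <- sens * prior + (1 - spec) * (1 - prior)

# Bayes rule: Pr(X=1 | Y=1)
posterior <- sens * prior / p_pos
round(posterior, 2)   # 0.73
```

The same arithmetic applies verbatim under the subjectivist reading of the probabilities.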

7 RV from Subjectivist View: Example
Question: does a specific patient (e.g., George Smith) have the disease?
Sample space S = {s1, s2, ...} represents possible worlds (or ways the world could be)
- In 30% of these possible worlds, George has the disease
- In 95% of the possible worlds in which George has the disease, the test is positive
- In 85% of the possible worlds in which George does not have the disease, the test is negative
Random variables:
- X maps possible world s to X(s) = 0 if George does not have the disease and X(s) = 1 if George has the disease in this world
- Y maps possible world s to Y(s) = 0 if George's test result is negative and Y(s) = 1 if George's test result is positive in this world
Usually we suppress the argument and write (X, Y) for the two-component random variable representing George's condition and test result
Pr(X=1) = 0.3
Pr(Y=1 | X=1) = 0.95
Pr(Y=0 | X=0) = 0.85
Pr(X=1 | Y=1) = 0.73 (by Bayes rule)
These probabilities refer to our beliefs about George's specific case

8 Comparison
Frequentists say a probability distribution arises from some random process on the sample space (such as random selection)
- Pr(X=1) = 0.3 means: if patients are chosen at random from the population of clinic patients, 30% of them will have the disease
- Pr(X=1 | Y=1) = 0.73 means: if patients are chosen at random from the population of clinic patients who test positive, 73% of them will have the disease
- The probability George Smith has the disease is either 1 or 0, depending on whether he does or does not have the disease
Subjectivists say a probability distribution represents someone's beliefs about the world
- Pr(X=1) = 0.3 means: if we have no information except that George walked into the clinic, then our degree of belief is 30% that he has the disease
- Pr(X=1 | Y=1) = 0.73 means: if we learn that George tested positive, our belief increases to 73% that he has the disease
- The probability that George has the disease depends on our information about George. If we know only that he went to the clinic, our probability is 30%. If we also learn that he tested positive, our probability becomes 73%. If we receive a definitive diagnosis, then the probability he has the disease is either 1 or 0, depending on whether he does or does not have the disease.

9 Frequentists and Subjectivists: Random Sampling
Suppose we sample a patient at random from a population in which:
- 30% of the patients have the disease
- 95% of the patients who have the disease test positive
- 85% of the patients who do not have the disease test negative
Subjectivist and frequentist will agree that:
- Pr(X=1) = 0.3
- Pr(Y=1 | X=1) = 0.95
- Pr(X=1 | Y=1) = 0.73 (see example from Unit 1, p. 14)
A frequentist will say:
- These probabilities apply to the population and the sampling process
- There is no such thing as the probability that a particular patient has the disease
A subjectivist will say:
- These probabilities apply to any particular patient for whom the information given is our only relevant information about the patient
- If we learn something new (e.g., George's father had the disease), then we need to update our probabilities to reflect the new knowledge
Frequentists apply Bayesian methods when there is a frequency interpretation of the probabilities
In practice, Bayesians and frequentists are often in close agreement on the practical implications of statistical analysis

10 Mass, Density and Cumulative Distribution Functions
- Probability mass function (pmf): maps each possible value of a discrete random variable to its probability
- Probability density function (pdf): maps each possible value of a continuous random variable to a positive number representing its likelihood relative to other possible values
- Cumulative distribution function (cdf): the value of the cdf at x is the probability that the random variable is less than or equal to x
Unit 2 (v4)

11 Discrete Random Variables: pmf and cdf
Probability mass function
- f(x) is the probability that random variable X is equal to x
- Σx f(x) = 1
Cumulative distribution function
- F(x) is the probability that random variable X is less than or equal to x: F(x) = Σ{x' ≤ x} f(x')
- Increases from zero (at −∞) to 1 (at +∞)
- Step function with a jump at each possible value
- The height of the step at possible value x is equal to the value of the pmf at x
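The pmf/cdf relationship on this slide can be checked directly in R; the Binomial(10, 0.3) distribution is an arbitrary choice for illustration:

```r
# cdf of a discrete RV is the running sum of the pmf: F(x) = sum over x' <= x of f(x')
x   <- 0:10
pmf <- dbinom(x, size = 10, prob = 0.3)   # f(x)
cdf <- pbinom(x, size = 10, prob = 0.3)   # F(x)

all.equal(cumsum(pmf), cdf)   # TRUE
sum(pmf)                      # 1: probabilities sum to one
```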

12 Continuous Random Variables: pdf and cdf
Probability density function
- Each individual value has probability zero
- The pdf is the derivative of the cdf: f(x) = dF(x)/dx
- f(x)dx is approximately equal to the probability that X lies in [x−dx/2, x+dx/2]
- ∫ f(x)dx = 1
- The area under f(x) between a and b is equal to the difference F(b) − F(a)
Cumulative distribution function
- The cdf is the integral of the pdf: F(x) = ∫{x' ≤ x} f(x')dx'
- The probability that X lies in the interval [a,b] is the difference of cdf values, and also the integral of the pdf: P(a < X ≤ b) = F(b) − F(a) = ∫ab f(x)dx
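The same identities can be checked numerically in R for a continuous distribution; the standard normal is used here as an arbitrary example:

```r
# P(a < X <= b) = F(b) - F(a) = integral of the pdf over [a, b]
a <- -1; b <- 1
lhs <- pnorm(b) - pnorm(a)                            # difference of cdf values
rhs <- integrate(dnorm, lower = a, upper = b)$value   # numerical integral of the pdf
abs(lhs - rhs) < 1e-6   # TRUE

# total probability is 1
integrate(dnorm, lower = -Inf, upper = Inf)$value     # 1
```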

13 Examples: Shape of pdf
(Plots of f(x) and F(x) for three cases: a symmetric, unimodal distribution; a skewed, unimodal distribution; a bimodal distribution)

14 Generalized Probability Density Function (gpdf)
It is useful to have common notation for density and mass functions
- The integral symbol is used to mean integration when the distribution is continuous and summation when the distribution is discrete
- Remember that an integral is a limit of sums
The gpdf f(x) is either a mass or a density function:
- ∫ g(x)f(x)dμ(x) ≡ Σx g(x)f(x) if X is discrete
- ∫ g(x)f(x)dμ(x) ≡ ∫ g(x)f(x)dx if X is continuous
μ(x) stands for a measure (counting measure for discrete random variables and uniform measure for continuous random variables)

15 Central Tendency
Measure of central tendency: a summary that indicates the center of the distribution
- Always lies between the largest and smallest value
Mean (expected value E[X]): E[X] = ∫ x f(x)dμ(x)
- Weight each value by its probability of occurring
- In skewed distributions the mean is pulled toward the skewed tail
- The sample mean of a random sample converges to the distribution mean as the sample size becomes large
Median (50th percentile x0.5): P(X ≤ x0.5) = 0.5
- Half of the observations are larger, half smaller than the median
- Less influenced by skewness and outliers than the mean
- The sample median of a random sample of observations converges to the population median as the sample size becomes large
- Generalization to multivariate distributions?
Mode (xmax): f(xmax) ≥ f(x) for all x
- Most probable value
- Distributions may have multiple modes (multiple peaks in the pmf or pdf)
- In multimodal distributions the mean and median don't say much about the center of the distribution; a single numerical summary may not be appropriate
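A quick R check of the mean/median/mode ordering for a right-skewed distribution; the Gamma with shape 2 and scale 1 is chosen purely for illustration (its mean is αβ = 2 and its mode is (α−1)β = 1):

```r
# For a right-skewed unimodal distribution, mode < median < mean:
# the mean is pulled toward the long right tail
m    <- 2 * 1                               # E[X] = shape * scale
med  <- qgamma(0.5, shape = 2, scale = 1)   # 0.5 quantile
mode <- (2 - 1) * 1                          # (shape - 1) * scale, valid for shape > 1

mode < med && med < m   # TRUE
```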

16 Spread
Measure of spread: a summary that indicates how spread out a distribution is, or how far from the center a typical observation is likely to fall
Variance and standard deviation
- Variance is the expected value of the squared difference between an observation and the mean: V[X] = ∫ (x − E[X])² f(x)dμ(x)
- Standard deviation is the square root of the variance: SD[X] = √V[X]
- Standard deviation is in the same units as the variable itself
Median absolute deviation (MAD)
- Median of the absolute value of the difference between an observation and the median: P(|X − x0.5| ≤ xMAD) = 0.5
- Less influenced by outliers than the standard deviation
Credible interval (Bayesian confidence interval)
- Level-α credible interval [r,s]: the probability that the RV lies between r and s is α, i.e., P(r ≤ X ≤ s) = α
- Kinds of level-α credible intervals:
  » one-sided credible intervals
  » two-sided credible intervals with equal probability outside the interval on either side
  » two-sided highest-posterior-density intervals
- Which to use depends on the purpose
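An equal-tailed credible interval can be read off the quantile function. A sketch for a Normal(0, 2) distribution, with the level and parameters chosen for illustration:

```r
# Equal-tailed 95% credible interval for X ~ Normal(0, sd = 2):
# put probability 0.025 outside the interval on each side
sdev <- 2
r <- qnorm(0.025, mean = 0, sd = sdev)
s <- qnorm(0.975, mean = 0, sd = sdev)

# check the defining property P(r <= X <= s) = 0.95
covered <- pnorm(s, sd = sdev) - pnorm(r, sd = sdev)
covered   # 0.95
```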

17 Different Varieties of Credible Interval
- Highest posterior density (HPD)
- Two-sided symmetric tail areas
- One-sided

18 Reporting Measures of Spread
No presentation of statistical results is complete without a measure of spread!
- Policy makers sometimes resist presentations that include variability
- Need to educate decision makers on why spread is important
- Need to present variability in ways that make policy implications clear
  » "Forecast is $32 per barrel with a 90% credible interval of [25, 48]" may be hard to grasp
  » Decision makers can more easily grasp: "Best guess forecast is $32 per barrel. The price is very unlikely to drop below $25 per barrel, but there are many low-probability, high-impact scenarios which could result in a significant increase in price above that level, to as much as $48 per barrel" — especially if some of the scenarios are described
Graphical representations of spread are important
(Graphic taken from Hannah Laskey's science fair entry, February 2011)

19 Theoretical and Sample Summary Statistics
Expected value or mean of a distribution: E[X] = ∫ x f(x)dμ(x)
Variance of a distribution: Var[X] = ∫ (x − E[X])² f(x)dμ(x)
Sample mean for sample X1,...,Xn: X̄ = (1/n) Σi Xi
Sample variance for sample X1,...,Xn: S² = (1/(n−1)) Σi (Xi − X̄)²
- In some definitions, the denominator is n instead of n−1
- Dividing by n−1 gives an unbiased estimate of the distribution variance
- If the sample is iid from a normal distribution, then dividing by n gives the maximum likelihood estimate (MLE)
Quantile function: F⁻¹(q) = min{x : F(x) ≥ q}
- How does this function behave for a discrete distribution? Between jumps of F(x)? At a jump of F(x)?
- There are several different definitions of quantiles for discrete distributions and samples; this is one of the most commonly used
- The median is the 0.5 quantile (or 50th percentile)
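These sample statistics can be checked on a toy sample; the data vector below is made up for illustration:

```r
# Sample mean, sample variance with n-1 vs n denominators, and the median
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

xbar <- mean(x)                    # sample mean
s2   <- var(x)                     # R's var() divides by n-1 (unbiased for the variance)
mle  <- sum((x - xbar)^2) / n      # dividing by n gives the MLE under normality

xbar         # 5
s2           # 32/7, about 4.571
mle          # 4
median(x)    # 4.5: the 0.5 quantile
```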

20 Data Summary in R
Data set on 44 babies born in Brisbane, Australia on December 18, 1997
- Dataset provided as part of the UsingR add-on package in R
(Screenshot of Console from R session)
R reads the data into a data frame called babyboom
- An R object for representing a data set
- A list of equal-length vectors, each with a name
- babyboom$wt is the 3rd column, containing the weights of the babies
summary is one of the many R methods for data frames

21 Graphical Summaries in R
There are many ways to customize R graphics
- The box extends from the lower to the upper quartile, with the median shown as a line across the box
- The upper whisker extends to the largest observation within 1.5 × the interquartile range above the upper quartile
- The lower whisker extends to the smallest observation within 1.5 × the interquartile range below the lower quartile

22 Multivariate Distributions
A random vector X is a vector of random variables
- An underscore denotes a vector
- The underscore may be omitted; many authors do not have special notation for multivariate distributions
Joint distribution function
- cdf: F(x) = P(X ≤ x)
- gpdf: f(x) (density if continuous, mass function if discrete)
- Expected value vector: E[X] = ∫{Rⁿ} x f(x)dμ(x)
- Covariance matrix: Cov[X] = ∫{Rⁿ} (x − E[X])(x − E[X])ᵀ f(x)dμ(x)
Conditional independence assumptions are often used to simplify specification and inference for multivariate distributions

23 Covariance and Correlation
The covariance matrix measures how random quantities vary together
- The (i,j)th off-diagonal element is called Cov(Xi,Xj), the covariance of Xi and Xj
- Cov(Xi,Xj) is equal to the product of the standard deviations of Xi and Xj if they are perfectly linearly related
- Cov(Xi,Xj) = 0 if Xi and Xj are independent
The correlation matrix
- The diagonal elements are equal to 1
- The (i,j)th off-diagonal element Cov(Xi,Xj)/(SD(Xi)SD(Xj)) is the covariance divided by the product of the standard deviations
- The off-diagonal elements are real numbers between −1 and 1
  » Independent RVs have zero correlation (the converse does not hold in general: zero correlation need not imply independence)
  » The closer the correlation is to 1 or −1, the more nearly linearly related the RVs are
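R's cov2cor converts a covariance matrix into the correlation matrix described here; the covariance matrix below is made up for illustration:

```r
# Covariance matrix with Var(X1)=4, Var(X2)=9, Cov(X1,X2)=2
Sigma <- matrix(c(4, 2,
                  2, 9), nrow = 2)

R <- cov2cor(Sigma)   # divide each entry by the product of standard deviations
R[1, 2]               # 2 / (2 * 3) = 1/3
diag(R)               # 1 1: diagonal of a correlation matrix is all ones
```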

24 Example: Joint Density Functions
(Joint density plots: an independent case, f(x,y) = f(x)f(y), and a dependent case, f(x,y) = f(x)f(y|x))

25 Example: Scatterplot of Random Sample from Joint Distribution

26 Marginal and Conditional Distributions
If (X,Y) has joint gpdf f(x,y):
- The marginal gpdf of X is f(x) = ∫ f(x,y)dμ(y) (y is marginalized out)
- The conditional gpdf of X given Y is f(x|y) = f(x,y)/f(y)
- The joint gpdf of X and Y can be factored into conditional and marginal gpdfs: f(x,y) = f(x|y)f(y) = f(y|x)f(x)
The joint gpdf for n random variables (X1,...,Xn) can be written as:
f(x1,...,xn) = f(x1) f(x2|x1) f(x3|x1,x2) ··· f(xn|x1,...,xn−1) = ∏i=1..n f(xi|x1,...,xi−1)
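Marginalization and conditioning can be carried out directly on a joint pmf stored as a table; the probabilities below are made up for illustration:

```r
# Joint pmf of (X, Y) with X in {1,2} (rows) and Y in {1,2,3} (columns)
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.15, 0.25, 0.20), nrow = 2, byrow = TRUE)
sum(joint)                    # 1: a valid joint pmf

fx <- rowSums(joint)          # marginal of X: sum y out
fy <- colSums(joint)          # marginal of Y: sum x out

# conditional pmf f(x | y=1) = f(x, 1) / f(1)
cond_x_given_y1 <- joint[, 1] / fy[1]
sum(cond_x_given_y1)          # 1: conditionals are proper distributions
```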

27 Distributions in R
Each R distribution has four functions, invoked by adding a prefix to the distribution's base name:
- p for probability: the cumulative distribution function (cdf)
- q for quantile: the inverse cdf
- d for density: the density or mass function
- r for random: generates random numbers from the distribution
Examples: rpois, dgamma
Functions for the Beta-Binomial distribution are available in the VGAM package
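The four prefixes fit together as follows, using the Poisson as an example:

```r
# d, p, q, r for the Poisson distribution with rate 3
dpois(2, lambda = 3)        # pmf: P(X = 2)
ppois(2, lambda = 3)        # cdf: P(X <= 2)
qpois(0.5, lambda = 3)      # quantile: smallest x with F(x) >= 0.5
set.seed(1)
rpois(5, lambda = 3)        # five random draws

# q is the (generalized) inverse of p
qpois(ppois(2, lambda = 3), lambda = 3)   # 2
```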

28 Independence and Conditional Independence
X and Y are independent if the conditional distribution of X given Y does not depend on Y
- When X and Y are independent, the joint density (or mass) function is equal to the product of the marginal density (or mass) functions: f(x,y) = f(x)f(y)
X and Y are conditionally independent given Z if: f(x,y,z) = f(x|z)f(y|z)f(z)
- The general form is f(x,y,z) = f(x|y,z)f(y|z)f(z)
Graphs provide a powerful tool for visualizing and formally modeling dependencies between random variables
- Random variables (RVs) are represented by nodes
- Direct dependencies are represented by edges
- The absence of an edge means no direct dependence
- Example: nodes X and Y with no edge: P(X,Y) = P(X)P(Y)
- Example: X and Y are parents of W, and W is the parent of Z: P(X,Y,W,Z) = P(X)P(Y)P(W|X,Y)P(Z|W)

29 Independent and Identically Distributed Observations
Statistical models often assume the observations are a random sample from a parameterized distribution
- Mathematically, this is represented as independent and identically distributed (iid) conditional on the parameter θ
The density function for an iid sample X1,...,Xn is written as a product of factors:
f(x1, x2,..., xn | θ) = f(x1|θ) f(x2|θ) ··· f(xn|θ) = ∏i=1..n f(xi|θ)
A plate represents repeated structure
- The RVs inside the plate are iid conditional on their parents
- In this example, the Xi are iid given Θ
- Joint gpdf for (X, Θ): g(θ) ∏i f(xi|θ)
- Conditional gpdf for x given θ: ∏i f(xi|θ)
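A sketch of the iid likelihood as a product of factors; the data and rate are made up, and computing on the log scale is the numerically stable habit:

```r
# Likelihood of an iid Poisson sample is the product of the individual pmf values
x      <- c(1, 0, 1, 2, 1, 0)
lambda <- 0.8

lik     <- prod(dpois(x, lambda))                  # f(x1,...,xn | lambda)
lik_log <- exp(sum(dpois(x, lambda, log = TRUE)))  # same value via the log scale

all.equal(lik, lik_log)   # TRUE
```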

30 Graphical Probability Models
Graphical probability models provide a means of specifying complex multivariate distributions in compact form, visualizing dependence relationships, and computing results of inference
- In the past, statistical packages have been limited to models with simple random sampling
- Graphical models can represent more complex kinds of dependencies
- Software is becoming available to estimate parameters of these more complex models from data (such as JAGS / R)
Example (hierarchical model of hospital births):
- Birth rates Λk for hospitals k = 1,...,n are sampled randomly from a distribution with parameter A, where A ~ h(α) and Λk ~ g(λ|α)
- Hourly births Xki at hospital k are sampled from a Poisson distribution with parameter Λk: Xki ~ Poisson(λk), i = 1,...,mk

31 Parametric Families of Distributions
Much of statistics is based on constructing models of phenomena using parametric families of distributions
- Simple models typically assume observations X1,...,XN are sampled randomly from a distribution with gpdf f(x|Θ)
  » The symbol Θ denotes a parameter
  » The functional form f(x|Θ) is known, but the value of the parameter is unknown
- The parameter may be a number Θ or a vector Θ = (Θ1,...,Θp)
Canonical problems:
- Use the sample to construct an estimate (point or interval) for the parameter
- Test a hypothesis about the value of the parameter, e.g.:
  » Is Θ greater than a particular value θ0?
  » Are two parameters Θ1 and Θ2 equal to each other?
- Is the functional form f(x|Θ) adequate as a model for the phenomenon generating the observations?
- Predict features (e.g., mean) of a future sample XN+1,...,XN+M
Hierarchical (or multi-level) Bayesian models
- Retain the simplicity of parametric models with standard functional form
- Build complex models out of simple building blocks
Many distributions have multiple parameterizations in common use, and it is important not to confuse different parameterizations

32 Some Discrete Parametric Families (p. 1)
Binomial distribution with number of trials n and success probability π:
- Sample space: Integers 0, 1, ..., n
- pmf: f(x|n,π) = (n choose x) π^x (1−π)^(n−x)
- E[X|n,π] = nπ; Var[X|n,π] = nπ(1−π)
Multinomial distribution with number of trials n and success probabilities π1,...,πp:
- Sample space: Vectors of non-negative integers summing to n
- pmf: f(x1,...,xp|n,π1,...,πp) = (n!/(x1!···xp!)) ∏i=1..p πi^xi
- E[Xi|n,π] = nπi; Var[Xi|n,π] = nπi(1−πi)
Poisson distribution with rate λ:
- Sample space: Non-negative integers
- pmf: f(x|λ) = e^(−λ) λ^x / x!
- E[X|λ] = λ; Var[X|λ] = λ
The Bernoulli distribution is a Binomial distribution with n=1
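The stated Binomial moments can be verified directly against the pmf; n = 10 and π = 0.3 are arbitrary choices:

```r
# Check E[X] = n*pi and Var[X] = n*pi*(1-pi) for the Binomial(10, 0.3) pmf
n  <- 10; p <- 0.3
x  <- 0:n
f  <- dbinom(x, n, p)

m  <- sum(x * f)           # E[X] computed from the pmf
v  <- sum((x - m)^2 * f)   # Var[X] computed from the pmf

c(m, n * p)                # both 3
c(v, n * p * (1 - p))      # both 2.1
```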

33 Some Discrete Parametric Families (p. 2)
Negative binomial distribution with size α and probability π:
- Sample space: Non-negative integers
- pmf: f(x|α,π) = (α+x−1 choose x) π^α (1−π)^x
- E[X|α,π] = α(1−π)/π; Var[X|α,π] = α(1−π)/π²

34 Some Continuous Parametric Families (p. 1)
Gamma distribution with shape α and scale β:
- Sample space: Non-negative real numbers
- pdf: f(x|α,β) = (1/(β^α Γ(α))) x^(α−1) e^(−x/β)
- E[X|α,β] = αβ; Var[X|α,β] = αβ²
- The Gamma distribution is sometimes parameterized with shape α and rate r = 1/β
Beta distribution with shape parameters α and β:
- Sample space: Real numbers between 0 and 1
- pdf: f(x|α,β) = (Γ(α+β)/(Γ(α)Γ(β))) x^(α−1) (1−x)^(β−1)
- E[X|α,β] = α/(α+β); Var[X|α,β] = αβ/((α+β)²(α+β+1))
Univariate normal distribution with mean µ and standard deviation σ:
- Sample space: Real numbers
- pdf: f(x|µ,σ) = (1/(√(2π) σ)) exp(−(x−µ)²/(2σ²))
- E[X|µ,σ] = µ; Var[X|µ,σ] = σ²
Multivariate normal distribution with mean µ and covariance matrix Σ (dimension p):
- Sample space: Vectors of real numbers
- pdf: f(x|µ,Σ) = (2π)^(−p/2) |Σ|^(−1/2) exp{−(1/2)(x−µ)ᵀΣ⁻¹(x−µ)}
- E[X|µ,Σ] = µ; Var[X|µ,Σ] = Σ
The Exponential distribution is a Gamma distribution with α=1
The Chi-square distribution with δ degrees of freedom is a Gamma distribution with α=δ/2 and β=2
The Uniform distribution is a Beta distribution with α=β=1
Gamma function: Γ(y) = ∫0..∞ u^(y−1) e^(−u) du
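The Gamma mean E[X] = αβ can be checked by numerical integration against the pdf; shape 3 and scale 2 are arbitrary choices:

```r
# E[X] = integral of x * f(x) dx for the Gamma(shape = 3, scale = 2) density
alpha <- 3; beta <- 2
m <- integrate(function(x) x * dgamma(x, shape = alpha, scale = beta),
               lower = 0, upper = Inf)$value
c(m, alpha * beta)   # both 6
```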

35 Some Continuous Parametric Families (p. 2)
Dirichlet distribution with shape parameters α1, α2,..., αp:
- Sample space: Non-negative real vectors summing to 1
- pdf: f(x1,...,xp|α1,...,αp) = (Γ(α1+···+αp)/(Γ(α1)···Γ(αp))) x1^(α1−1) ··· xp^(αp−1)
- E[Xi|α1,...,αp] = αi/α0 and Var[Xi|α1,...,αp] = (αi/α0)(1 − αi/α0)/(α0+1), where α0 = Σj αj
- The Dirichlet distribution is a multivariate generalization of the Beta distribution
Wishart distribution with degrees of freedom α and scale matrix V, where V is a positive definite matrix of dimension p and α > p−1:
- Sample space: Positive definite p-dimensional matrices
- pdf: f(X|α,V) = |X|^((α−p−1)/2) e^(−tr(V⁻¹X)/2) / (2^(αp/2) |V|^(α/2) Γp(α/2))
- Γp is the multivariate Gamma function; tr is the trace function
- E[X|α,V] = αV
- The Wishart distribution is a multivariate generalization of the Gamma and Chi-square distributions

36 Sufficient Statistic
A sufficient statistic is a data summary (a function of the sample of observations) such that the observations are independent of the parameter given the sufficient statistic
- Example: For a sample of n iid observations from a Poisson distribution, the sum of the observations is a sufficient statistic for the rate parameter
- Example: For a sample of n observations from a Bernoulli distribution, the total number of successes is a sufficient statistic for the success probability
If observations X1,...,XN are sampled randomly from a distribution with gpdf f(x|θ) and Y is sufficient for the parameter θ, then the posterior distribution f(θ|x) depends on the observations only through the sufficient statistic
Fisher's factorization theorem: T(X) is sufficient for θ if and only if the conditional probability distribution f(x|θ) can be factored as: f(x|θ) = h(x) gθ(T(x))
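A numerical illustration of the Poisson example (data vectors made up): two samples with the same size and the same sum have likelihood functions that differ only by a constant factor in λ, so they carry the same information about the rate:

```r
# Two samples with the same n and the same sum (= 6)
x1 <- c(1, 0, 2, 3)
x2 <- c(2, 2, 1, 1)

lik <- function(x, lam) prod(dpois(x, lam))

# Ratio of likelihoods at several lambda values: constant, so the posterior
# based on x1 equals the posterior based on x2 for any prior
lams <- c(0.5, 1, 2)
r <- sapply(lams, function(lam) lik(x1, lam) / lik(x2, lam))
r   # the same number repeated (here 1/3, the ratio of the factorial terms)
```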

37 Is a Parametric Distribution a Good Model?
Before applying a parametric model, we should perform exploratory analysis to evaluate whether the model is an adequate representation of the data
A q-q plot is a commonly used diagnostic tool
- Plot quantiles of the data distribution against quantiles of the theoretical distribution
- If the theoretical distribution is correct, the plot should look approximately like the plot of y=x
- Problem: what about unknown parameters?
  » Solution: we can estimate the parameters from the data
  » Solution: in the case of a location-scale family, we can plot data quantiles against the standard distribution (location parameter = 0 and scale parameter = 1); if the theoretical distribution is correct, the plot should look approximately like a straight line, with slope and intercept determined by the scale and location parameters
Another diagnostic tool is to compare empirical and theoretical counts for a discrete RV (or a discretized continuous RV)
What other checks for model fit have you encountered?
In your homework I expect you to assess whether the parametric model I give you is a good one (sometimes it won't be!)
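A plot-free version of the q-q diagnostic computes the correlation between the sorted data and standard-normal quantiles; a value near 1 is consistent with a normal (location-scale) fit. The data are simulated, and the 0.99 threshold is a rough rule of thumb rather than a value from the slides:

```r
# q-q diagnostic as a correlation: sorted data vs. standard-normal quantiles
set.seed(42)
x  <- rnorm(100, mean = 10, sd = 2)        # simulated "data"
qq <- cor(sort(x), qnorm(ppoints(length(x))))
qq   # close to 1 for normal data
```

The same idea underlies qqnorm(x); qqline(x), where the slope and intercept of the line estimate the scale and location parameters.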

38 Are Birth Weights Normally Distributed?
Data on birth weights of babies born in a Brisbane hospital on December 18, 1997
Source:

39 Birth Weights of Boys and Girls

40 Are Times Between Births Exponentially Distributed?
The data: times between births of babies born in a Brisbane hospital on December 18, 1997
R notes:
- ppoints computes probabilities for evaluating quantiles
- diff computes lagged differences
- lines adds lines to a plot
- qexp computes exponential quantiles

41 Looking for Temporal Patterns (1 of 2)
If birth times are iid, then we should not see any patterns (such as more births at some times of day)
Do you see any patterns in this plot?
plot(birthtim[1:43], birthinterval, main="Time Series Plot of Birth Intervals",
     ylab="Time Until Next Birth (minutes)", xlab="Time of Birth (minutes from midnight)")

42 Looking for Temporal Patterns (2 of 2)
The lag plot looks for a dependency between successive birth intervals (such as long intervals being followed by long intervals)
Do you see any pattern in this plot?
plot(birthinterval[1:42], birthinterval[2:43], main="Lag Plot of Birth Intervals")

43 Do Births Per Hour Follow a Poisson Distribution?
Comparing empirical to theoretical counts is a useful tool for evaluating fit (if the expected counts are not too small)
(Table comparing empirical and expected counts of births per hour, with totals)

44 Frequency Plot is Misleading when Expected Counts are Too Small
- Blue bars: empirical counts from a sample of size 35 from a Poisson distribution with mean 20 (in bins of 1 and 3 time units)
- Pink bars: expected counts for a sample of size 35 from a Poisson distribution with mean 20 (in bins of 1 and 3 time units)
When expected counts are small, plots of observed and expected counts will not look similar
Common rule of thumb: choose the bin size so most bins have an expected count of at least 5

45 Inferring the Value of a Parameter
Often we model a set of observations as independent trials from a probability distribution with unknown parameter θ:
f(x1,...,xn|θ) = f(x1|θ) ··· f(xn|θ)
The joint gpdf of x given θ, viewed as a function of θ, is called the likelihood function
Likelihood principle: the likelihood function captures everything about the observations needed to draw inferences
To draw inferences about θ, a Bayesian statistician must also specify a prior distribution g(θ)
We condition on the sample to obtain the posterior distribution using Bayes Rule:
g(θ|x1,...,xn) = ∏i f(xi|θ) g(θ) / ∫ ∏i f(xi|θ) g(θ) dμ(θ) = f(x|θ)g(θ) / f(x)
- f(x) is called the marginal likelihood of the observations x = (x1,...,xn)
- Note: the Xi are independent given θ, but are not independent after integrating over θ

46 Example: Transmission Errors
The number of transmission errors per hour is distributed as Poisson with parameter λ: f(x|λ) = e^(−λ) λ^x / x!
Data on the previous system established an error rate of 1.6 errors per hour
The new system has a design goal of cutting this rate in half, to 0.8 errors per hour
Observe the new system for 6 one-hour periods
- Data: 1, 0, 1, 2, 1, 0
Questions:
- Have we met the design goal?
- Does the new system improve the error rate?

47 The Prior Distribution (discretized)
The error rate can be any positive real number
We use expert judgment to define a prior distribution on a discrete set of values (later we will revisit this problem with a continuous prior distribution)
Experts familiar with the new system design said:
- Meeting the design goal of 0.8 errors per hour is about a 50-50 proposition
- The chance of making things worse than the current rate of 1.6 errors per hour is small but not negligible
The expert agrees that the discretized distribution shown here is a good reflection of his prior knowledge:
- Expected value is about 1.0
- Distribution is heavy-tailed on the right
- P(Λ ≤ 0.8) = 0.54
- P(Λ ≤ 1.6) = 0.87
- Values of Λ greater than 3 are unlikely enough to ignore

48 Bayesian Inference for Error Rate
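The slide's discretized analysis can be sketched in R. The expert's actual grid and prior probabilities are not reproduced in this transcript, so the grid and flat prior below are hypothetical; only the mechanics match the slides:

```r
# Discrete-prior Bayesian inference for a Poisson rate
lambda <- seq(0.2, 3, by = 0.2)                     # hypothetical grid of rates
prior  <- rep(1 / length(lambda), length(lambda))   # hypothetical flat prior
x      <- c(1, 0, 1, 2, 1, 0)                       # observed hourly error counts

# likelihood of the whole sample at each grid value, then Bayes rule
lik  <- sapply(lambda, function(l) prod(dpois(x, l)))
post <- prior * lik / sum(prior * lik)

sum(post)                     # 1: a proper posterior
sum(lambda * post)            # posterior mean of the rate
sum(post[lambda <= 0.8])      # posterior probability the design goal is met
```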

49 Features of the Posterior Distribution
Central tendency
- Posterior mean of Λ is 0.87
- Prior mean of Λ is 0.97; the data mean is 0.83
- Typically posterior central tendency is a compromise between the prior distribution and the center of the data
Variation
- Posterior standard deviation of Λ is about 0.33
- Prior standard deviation of Λ is about 0.62
- Typically variation in the posterior is less than variation in the prior (we have more information)
Meeting the threshold
- Posterior probability of meeting or doing better than the design goal is about 0.58
- Posterior probability that the new system is better than the old system is about 0.96
- Posterior probability that the new system is worse than the old system is less than 0.02

50 Triplot
A visual tool for examining Bayesian belief dynamics
- Plot the prior distribution, normalized likelihood, and posterior distribution
- Normalized likelihood: divide the likelihood by its sum or integral over λ
- The normalized likelihood is the posterior distribution we would obtain if all values were assigned equal prior probability

51 Bayesian Belief Dynamics: Sequential and Batch Processing of Observations
Batch processing:
- Use Bayes rule with prior g(θ) and combined likelihood f(x1,...,xn|θ) to find posterior g(θ|x1,...,xn)
Sequential processing:
- Use Bayes rule with prior g(θ) and likelihood f(x1|θ) to find posterior g(θ|x1)
- Use Bayes rule with prior g(θ|x1) and likelihood f(x2|θ) to find posterior g(θ|x1,x2)
- ...
- Use Bayes rule with prior g(θ|x1,...,xn−1) and likelihood f(xn|θ) to find posterior g(θ|x1,...,xn)
The posterior distribution after n observations is the same with both methods
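The equivalence of batch and sequential processing is easy to verify numerically; the discrete grid and flat prior below are hypothetical, and the data are from the transmission-error example:

```r
lambda <- seq(0.2, 3, by = 0.2)                     # hypothetical discrete grid
prior  <- rep(1 / length(lambda), length(lambda))   # hypothetical flat prior
x      <- c(1, 0, 1, 2, 1, 0)

# Batch: one application of Bayes rule with the combined likelihood
lik_batch  <- sapply(lambda, function(l) prod(dpois(x, l)))
post_batch <- prior * lik_batch / sum(prior * lik_batch)

# Sequential: each posterior becomes the prior for the next observation
post_seq <- prior
for (xi in x) {
  lik_i    <- dpois(xi, lambda)
  post_seq <- post_seq * lik_i / sum(post_seq * lik_i)
}

all.equal(post_batch, post_seq)   # TRUE: same posterior either way
```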

52 Visualizing Bayesian Belief Dynamics: Hour-by-Hour Triplots for Transmission Error Data

53 Fundamental Identity of Bayesian Inference
f(x | θ) g(θ) = f(x) g(θ | x)
likelihood function × prior density function = marginal likelihood × posterior density function
The joint pdf of parameter and data can be expressed in two different ways: as the prior density for the parameter times the likelihood function, or as the marginal likelihood times the posterior density for the parameter.
The marginal likelihood is the (conditional) likelihood integrated over the parameter:
f(x) = ∫ f(x | θ) g(θ) dθ   (continuous parameter)
f(x) = Σ_θ f(x | θ) g(θ)   (discrete parameter)
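For a discrete parameter grid the identity can be checked term by term. A sketch with an illustrative discretized Gamma prior and a single hypothetical Poisson observation:

```python
import numpy as np
from math import factorial

grid = np.linspace(0.1, 2.9, 15)             # discrete values of θ (here, the rate Λ)
a, b = 2.5, 2.5                              # illustrative prior parameters
g = grid**(a - 1) * np.exp(-b * grid)
g /= g.sum()                                 # prior g(θ)

x = 2                                        # a single hypothetical observed count
f_x_given_theta = grid**x * np.exp(-grid) / factorial(x)   # likelihood f(x | θ)

joint = f_x_given_theta * g                  # first factorization: f(x | θ) g(θ)
f_x = joint.sum()                            # marginal likelihood f(x) = Σ_θ f(x | θ) g(θ)
g_post = joint / f_x                         # posterior g(θ | x)
```

The second factorization f(x) g(θ | x) reproduces the joint exactly, and the posterior sums to 1.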

54 Marginal Likelihood
Before we have seen X, we use the marginal likelihood to predict the value of X. When used for predicting X, the marginal likelihood is called the predictive distribution for X.
After we see X = x, we divide the joint probability f(x | θ)g(θ) by the marginal likelihood f(x) to obtain the posterior probability of θ:
g(θ | x) = f(x | θ) g(θ) / f(x)
The marginal likelihood f(x) is the normalizing constant in Bayes rule: we divide by f(x) to ensure that the posterior probabilities sum to 1.
The marginal likelihood f(x) = Σ_θ f(x | θ)g(θ) includes uncertainty both about θ and about x given θ.
Non-Bayesians sometimes predict future observations using a point estimate of θ. Predictions using the marginal likelihood are more spread out (include more uncertainty) than predictions using a point estimate of θ.

55 Predicting Future Observations: Compare the Marginal Likelihood with Poisson(E[Λ | observations so far])
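The comparison in this figure can be sketched numerically: the predictive distribution mixes Poisson pmfs over the posterior for Λ, while the plug-in approach uses a single Poisson at the posterior mean. The grid, prior parameters, and counts below are hypothetical, as before:

```python
import numpy as np
from math import factorial

grid = np.linspace(0.1, 2.9, 15)
a, b = 2.5, 2.5                              # illustrative prior parameters
post = grid**(a - 1) * np.exp(-b * grid)
for x in [1, 0, 2, 0, 1, 1]:                 # hypothetical hourly error counts
    post *= grid**x * np.exp(-grid) / factorial(x)
post /= post.sum()

ks = np.arange(15)                           # future counts 0..14 (tail beyond is negligible)

def pois_pmf(k):
    return grid**k * np.exp(-grid) / factorial(k)

predictive = np.array([pois_pmf(k) @ post for k in ks])   # mixture over posterior for Λ
lam_hat = grid @ post                                     # point estimate E[Λ | data]
plug_in = np.array([lam_hat**k * np.exp(-lam_hat) / factorial(k) for k in ks])

var_predictive = ks**2 @ predictive - (ks @ predictive)**2
var_plug_in = ks**2 @ plug_in - (ks @ plug_in)**2
```

The predictive variance exceeds the plug-in variance, reflecting the extra uncertainty about Λ itself.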

56 Parameters and Conditioning: Frequentists and Subjectivists
Frequentist view of parameters: the parameter Θ represents a true but unknown property of a random data-generating process. It is not appropriate to put a probability distribution on Θ because the value of Θ is fixed, not random.
Subjectivist view of parameters: a subjectivist puts a probability distribution on Θ as well as on the X_i given Θ. Parametric distributions are a convenient means to specify probability distributions that represent our beliefs about as-yet-unobserved data. Many subjectivists don't believe "true" parameters exist.
Frequentists and subjectivists on conditioning:
Frequentists condition on the parameter and base inferences on the data distribution f(x₁ | Θ) ··· f(xₙ | Θ). Even after X has been observed, it is treated as random.
Subjectivists condition on knowns and treat unknowns probabilistically. Before observing the data X = x, the joint distribution of parameters and data is f(x₁ | Θ) ··· f(xₙ | Θ) g(Θ). After observing the data, X = x is known and Θ has distribution g(Θ | x₁, …, xₙ).

57 Connecting Subjective and Frequency Probability: de Finetti's Representation Theorem
IF:
Your beliefs are represented by a probability distribution P over a sequence of events X₁, X₂, …
You believe the sequence is infinitely exchangeable (your probability for any sequence of successes and failures does not depend on the order of successes and failures)
THEN:
You believe with probability 1 that the proportion of successes tends to a definite limit as the number of trials goes to infinity:
Sₙ/n → p as n → ∞, where Sₙ = number of successes in n trials
Your probability distribution for Sₙ is the same as the distribution you would assess if you believed the observations were random draws from some unknown "true" value of p with probability density function g(p):
P(Sₙ = k) = ∫₀¹ (n choose k) pᵏ (1−p)ⁿ⁻ᵏ g(p) dp
A sequence a frequentist would call "random draws from a true distribution" is one a Bayesian would call exchangeable.
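The mixture integral can be made concrete. A sketch assuming an illustrative Beta(a, b) mixing density g(p), for which the integral has the closed-form beta-binomial answer; the numerical midpoint-rule integration should match it:

```python
import numpy as np
from math import comb, gamma

a, b, n, k = 2.0, 3.0, 10, 4                 # illustrative Beta(a, b) mixing density g(p)

def beta_fn(x, y):
    return gamma(x) * gamma(y) / gamma(x + y)

# Closed form: beta-binomial probability of k successes in n trials
closed = comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)

# de Finetti mixture integral, evaluated numerically (midpoint rule)
m = 200_000
p = (np.arange(m) + 0.5) / m                 # midpoints of m equal bins on (0, 1)
g_p = p**(a - 1) * (1 - p)**(b - 1) / beta_fn(a, b)
numeric = (comb(n, k) * p**k * (1 - p)**(n - k) * g_p).sum() / m
```

Any mixing density g(p) would do; the Beta choice just makes the exact answer available for comparison.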

58 Bayesian Inference for Continuous Random Variables
Inference for a continuous random variable is the limiting case of inference with a discretized random variable as the number of bins goes to infinity and the width of each bin goes to zero.
Accuracy tends to increase with more bins and with smaller area (probability mass) per bin. Be careful with the tail area of unbounded random variables.
A closed-form solution for the continuous problem exists in special cases (see Unit 3).
The prior distribution we used for the transmission error example was obtained by discretizing a Gamma distribution. In Unit 3, we will compare with the exact result using the Gamma prior distribution.
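For a Gamma prior with Poisson data, Unit 3's exact answer is available in closed form (the posterior is again a Gamma), so the discretization's convergence can be observed directly. A sketch with illustrative prior parameters and hypothetical counts:

```python
import numpy as np

a, b = 2.5, 2.5                              # illustrative Gamma prior (shape, rate)
counts = [1, 0, 2, 0, 1, 1]                  # hypothetical hourly error counts
n, s = len(counts), sum(counts)
exact_mean = (a + s) / (b + n)               # conjugate result: posterior is Gamma(a+s, b+n)

def discretized_posterior_mean(nbins, upper=10.0):
    lam = (np.arange(nbins) + 0.5) * upper / nbins        # bin midpoints on (0, upper)
    w = lam**(a + s - 1) * np.exp(-(b + n) * lam)         # prior × likelihood (constants cancel)
    w /= w.sum()
    return lam @ w

coarse_err = abs(discretized_posterior_mean(20) - exact_mean)
fine_err = abs(discretized_posterior_mean(2000) - exact_mean)
```

As the number of bins grows, the discretized posterior mean approaches the exact conjugate answer.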

59 Summary and Synthesis
A random variable represents an uncertain hypothesis: categorical, ordinal, discrete numerical, or continuous numerical.
A probability distribution maps sets of possible values to numbers; the probability of a set indicates how likely it is that the value belongs to the set. Probability mass functions, density functions, and cumulative distribution functions are tools for calculating probabilities of sets. We examined measures of central tendency and spread.
Parametric families of distributions are convenient and practical off-the-shelf models for common types of uncertain processes. We listed several commonly used parametric families, and noted that observations are often modeled as randomly sampled from a parametric family with unknown parameter.
We applied Bayes rule to infer a posterior distribution for a parameter (using a discretized prior distribution).
We contrasted the subjectivist and frequentist approaches to parameter inference, and saw that de Finetti's exchangeability theorem provides a connection between the frequentist and subjectivist views of parameters.

Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 1 Review We saw some basic metrics that helped us characterize

More information

Bayesian Inference and Decision Theory

Bayesian Inference and Decision Theory Bayesian Inference and Decision Theory Instructor: Kathryn Blackmond Laskey Room 224 ENGR (703) 993-644 Office Hours: Thursday 4:00-6:00 PM, or by appointment Spring 208 Unit 3: Bayesian Inference with

More information

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019

Lecture 10: Probability distributions TUESDAY, FEBRUARY 19, 2019 Lecture 10: Probability distributions DANIEL WELLER TUESDAY, FEBRUARY 19, 2019 Agenda What is probability? (again) Describing probabilities (distributions) Understanding probabilities (expectation) Partial

More information

(3) Review of Probability. ST440/540: Applied Bayesian Statistics

(3) Review of Probability. ST440/540: Applied Bayesian Statistics Review of probability The crux of Bayesian statistics is to compute the posterior distribution, i.e., the uncertainty distribution of the parameters (θ) after observing the data (Y) This is the conditional

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Chapter 5. Bayesian Statistics

Chapter 5. Bayesian Statistics Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective

More information

USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*

USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL* 3 Conditionals and marginals For Bayesian analysis it is very useful to understand how to write joint, marginal, and conditional distributions for the multivariate

More information

Advanced topics from statistics

Advanced topics from statistics Advanced topics from statistics Anders Ringgaard Kristensen Advanced Herd Management Slide 1 Outline Covariance and correlation Random vectors and multivariate distributions The multinomial distribution

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Continuous Distributions

Continuous Distributions Chapter 3 Continuous Distributions 3.1 Continuous-Type Data In Chapter 2, we discuss random variables whose space S contains a countable number of outcomes (i.e. of discrete type). In Chapter 3, we study

More information

Stat 5101 Notes: Brand Name Distributions

Stat 5101 Notes: Brand Name Distributions Stat 5101 Notes: Brand Name Distributions Charles J. Geyer September 5, 2012 Contents 1 Discrete Uniform Distribution 2 2 General Discrete Uniform Distribution 2 3 Uniform Distribution 3 4 General Uniform

More information

Probability Theory. Patrick Lam

Probability Theory. Patrick Lam Probability Theory Patrick Lam Outline Probability Random Variables Simulation Important Distributions Discrete Distributions Continuous Distributions Most Basic Definition of Probability: number of successes

More information

Learning Bayesian network : Given structure and completely observed data

Learning Bayesian network : Given structure and completely observed data Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution

More information

Probability. Table of contents

Probability. Table of contents Probability Table of contents 1. Important definitions 2. Distributions 3. Discrete distributions 4. Continuous distributions 5. The Normal distribution 6. Multivariate random variables 7. Other continuous

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Summarizing Measured Data

Summarizing Measured Data Performance Evaluation: Summarizing Measured Data Hongwei Zhang http://www.cs.wayne.edu/~hzhang The object of statistics is to discover methods of condensing information concerning large groups of allied

More information

Chapter 1. Sets and probability. 1.3 Probability space

Chapter 1. Sets and probability. 1.3 Probability space Random processes - Chapter 1. Sets and probability 1 Random processes Chapter 1. Sets and probability 1.3 Probability space 1.3 Probability space Random processes - Chapter 1. Sets and probability 2 Probability

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information