
Modèles stochastiques II
INFO 154
Gianluca Bontempi
Département d'Informatique
Boulevard de Triomphe - CP 1
http://www.ulb.ac.be/di
Modéles stochastiques II p1/50

The basics of statistics
Statistics starts with a problem, continues with the collection of data, proceeds with the data analysis, and finishes with conclusions.
- Phenomenon: e.g. the weather in Belgium.
- Variables: e.g. the temperature.
- Observations: e.g. the historical recording of the temperature over the last ten years.
- Data: formatted version of the observations (management of data entry errors, missing data, re-normalization, scaling, coding).
- Data analysis methods: e.g. time series prediction.
- Inferences or conclusions: e.g. the planet's temperature is gradually rising.

Variability in measurements
Consider an unknown, discrete scalar variable $z$ (e.g. the age of a student enrolled in the final year of Computer Science). If we measure this variable $N$ times, we realize that it is affected by uncertainty.
We often use simple graphical methods to assist in analyzing the data from an experiment.
- Dot diagram: it lets us see quickly the general location (central tendency) of the observations and their spread. It is useful for displaying a small body of data.
- Histogram: it shows the central tendency, spread and general shape of the distribution of a single variable, starting from sample data. It is constructed by dividing the horizontal axis into intervals (usually of equal length) and drawing a rectangle over the $j$-th interval with area proportional to $n_j$, the number of observations that fall in that interval.

Visualization of variability
[Figure: a set of measurements shown as a dot diagram and as a histogram.]
The histogram is constructed by dividing the horizontal axis into intervals. To see the effect of the number of intervals on the visualization of the measurements, run the MATLAB script s_hist.m

Table representation
For a large number of variables, a graphical representation is impractical. In this case, tables are commonly used, with one row per sample and one column per variable.
If all the variables are quantitative, the table is a matrix.
Each row represents a sample (e.g. an individual) for which we have measured different features (e.g. the student's age, height and income).
Each column collects the measurements of a specific variable (e.g. the ages of the students in the classroom).

Probabilistic interpretation of uncertainty
This course will assume that the variability of measurements can be represented by the probability formalism. In the case of our discrete variable $z$, we assume that it is a discrete random variable.
A discrete random variable is a numerical quantity, linked to some experiment involving some degree of randomness, that takes its value from some discrete set of possible values.
Example: the experiment might be the rolling of two six-sided dice and the r.v. might be the sum of the two numbers showing on the dice. In this case the set of possible values is $\{2, 3, \dots, 12\}$.
In the example of the age, the random experiment is a compact (and approximate) way of modeling the disparate set of causes which led to variability in the value of $z$.

Probability distribution of a discrete r.v.
The probability distribution of a discrete r.v. $\mathbf{z}$ is the combination of
1. the set $\mathcal{Z}$ of values that this r.v. can take (also called range or sample space),
2. the set of probabilities associated to each value of $\mathcal{Z}$.
This means that we can attach to the random variable some specific mathematical function $P_{\mathbf{z}}(z)$ that gives, for each $z \in \mathcal{Z}$, the probability that $\mathbf{z}$ assumes the value $z$:
$$P_{\mathbf{z}}(z) = \mathrm{Prob}\{\mathbf{z} = z\}$$

Probability distribution of a discrete r.v. (II)
For a small number of possible values of $\mathbf{z}$, the probability distribution can be presented in the form of a table.
For example, if we plan to toss a fair coin twice, and the random variable $\mathbf{z}$ is the number of heads that eventually turn up, the probability distribution can be presented as follows:
Values of the random variable z:   0      1      2
Associated probabilities:          0.25   0.50   0.25
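
As a small illustration of the coin example, the sketch below enumerates the four equally likely outcomes of two tosses and counts the heads. It assumes a fair coin; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads.
counts = Counter(seq.count("H") for seq in outcomes)

# Probability of a value = favourable outcomes / total outcomes.
distribution = {z: Fraction(c, len(outcomes)) for z, c in sorted(counts.items())}

for z, prob in distribution.items():
    print(z, float(prob))
```

Exact fractions are used so that the probabilities sum to exactly 1.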

Parametric distribution function
Suppose that $\mathbf{z}$ is a discrete r.v. that takes its value in $\mathcal{Z} = \{1, 2, 3\}$ and that the probability distribution of $\mathbf{z}$ is
$$P(z) = \frac{\theta^{2z}}{\theta^{2} + \theta^{4} + \theta^{6}}$$
where $\theta$ is some fixed non-zero real number.
Whatever the value of $\theta$, $P(z) > 0$ for $z = 1, 2, 3$ and $P(1) + P(2) + P(3) = 1$.
Therefore $\mathbf{z}$ is a well-defined random variable, even if the value of $\theta$ is unknown.
We call $\theta$ a parameter, that is some constant, usually unknown, involved in a probability distribution.

Mean and variance of a discrete r.v.
Mean (or expected value) of a discrete random variable $\mathbf{z}$ is defined as
$$E[\mathbf{z}] = \mu = \sum_{z \in \mathcal{Z}} z\, P(z)$$
The word "average" is not a synonym of the word "mean".
The mean is not necessarily a value that belongs to $\mathcal{Z}$.
Variance of a discrete random variable $\mathbf{z}$ is defined as
$$\mathrm{Var}[\mathbf{z}] = \sigma^2 = E[(\mathbf{z} - E[\mathbf{z}])^2] = \sum_{z \in \mathcal{Z}} (z - \mu)^2 P(z)$$
The variance is a measure of the dispersion of the probability distribution of the random variable around its mean.
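
The two definitions can be checked on the dice example used earlier (the sum of two six-sided dice). This is a minimal sketch that assumes fair dice and uses exact fractions; the names `P`, `mean` and `variance` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probability distribution of the sum of two fair six-sided dice.
P = {}
for d1, d2 in product(range(1, 7), repeat=2):
    P[d1 + d2] = P.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

# Mean: sum of z * P(z) over the range.
mean = sum(z * p for z, p in P.items())

# Variance: sum of (z - mean)^2 * P(z) over the range.
variance = sum((z - mean) ** 2 * p for z, p in P.items())

print(mean, variance)
```

Here the mean happens to belong to the range, which is not true in general (a single die has mean 3.5).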

Probability distributions
[Figure: two discrete r.v. distributions having the same mean but different variance.]

Std deviation and moments of a discrete r.v.
Standard deviation of a discrete random variable $\mathbf{z}$ is defined as the positive square root of the variance:
$$\mathrm{Std}[\mathbf{z}] = \sqrt{\mathrm{Var}[\mathbf{z}]} = \sigma$$
Moment: for any positive integer $r$, the $r$-th moment of the probability distribution about the mean is
$$\mu_r = E[(\mathbf{z} - \mu)^r] = \sum_{z \in \mathcal{Z}} (z - \mu)^r P(z)$$
Skewness of a discrete random variable $\mathbf{z}$ is defined as
$$\gamma = \frac{E[(\mathbf{z} - \mu)^3]}{\sigma^3}$$
Distributions with positive skewness have long tails to the right, and distributions with negative skewness have long tails to the left.

Linear combinations
The expected value of a linear combination of r.v.s is simply the linear combination of their respective expected values:
$$E[a\mathbf{x} + b\mathbf{y}] = a E[\mathbf{x}] + b E[\mathbf{y}]$$
i.e., expectation is a linear statistic.
Since the variance is not a linear statistic, we have
$$\mathrm{Var}[a\mathbf{x} + b\mathbf{y}] = a^2 \mathrm{Var}[\mathbf{x}] + b^2 \mathrm{Var}[\mathbf{y}] + 2ab\, \mathrm{Cov}[\mathbf{x}, \mathbf{y}]$$
where
$$\mathrm{Cov}[\mathbf{x}, \mathbf{y}] = E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{y} - E[\mathbf{y}])]$$
is called covariance.

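Correlation
The correlation coefficient is
$$\rho(\mathbf{x}, \mathbf{y}) = \frac{\mathrm{Cov}[\mathbf{x}, \mathbf{y}]}{\sqrt{\mathrm{Var}[\mathbf{x}]\, \mathrm{Var}[\mathbf{y}]}}$$
It is easily shown that $-1 \le \rho(\mathbf{x}, \mathbf{y}) \le 1$.
If $\mathbf{x}$ and $\mathbf{y}$ are two independent random variables then $\mathrm{Cov}[\mathbf{x}, \mathbf{y}] = 0$.
In general, if the random variables $\mathbf{z}_1, \dots, \mathbf{z}_N$ are independent, then
$$\mathrm{Var}\Big[\sum_{i=1}^{N} \mathbf{z}_i\Big] = \sum_{i=1}^{N} \mathrm{Var}[\mathbf{z}_i]$$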

Entropy
Given a discrete r.v. $\mathbf{z}$ defined by $P(z)$, the entropy of its probability distribution is
$$H(\mathbf{z}) = -\sum_{z \in \mathcal{Z}} P(z) \log P(z)$$
$H(\mathbf{z})$ is a measure of the unpredictability of a r.v.
If there are $M$ possible values for a r.v., the entropy is maximized if $P(z) = 1/M$ for every $z$, and it then takes the value $\log M$.
The entropy is minimized (and equal to zero) if only one value of $\mathbf{z}$ is possible.
Since $H$ depends only on the probabilities of the various values of $\mathbf{z}$ and not on the actual values themselves, it can be thought of as a function of the probability distribution rather than of $\mathbf{z}$.
Although entropy and variance both measure, in some sense, the uncertainty of a r.v., the entropy has a different interpretation since it depends only on the probabilities of the different values and not on the values themselves.
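
The following sketch evaluates the entropy for three distributions over four values. It assumes the natural logarithm and the usual convention that terms with zero probability contribute nothing; the example distributions are made up for illustration.

```python
import math

def entropy(probs):
    """Entropy -sum p*log(p), natural log; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximal unpredictability over 4 values
skewed = [0.7, 0.1, 0.1, 0.1]        # one value is much more likely
degenerate = [1.0, 0.0, 0.0, 0.0]    # only one value is possible

for name, probs in [("uniform", uniform), ("skewed", skewed), ("degenerate", degenerate)]:
    print(name, entropy(probs))
```

Relabelling the four values (for instance 1, 2, 3, 4 versus 10, 200, 3000, 40000) would change the variance but leave the entropy untouched.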

Relative entropy
Let us consider two different discrete probability distributions, $P_0(z)$ and $P_1(z)$, on the same set of values, where $P_0(z) > 0$ if and only if $P_1(z) > 0$.
The relative entropies associated with these two distributions are
$$H(P_0 \| P_1) = \sum_{z} P_0(z) \log \frac{P_0(z)}{P_1(z)}, \qquad H(P_1 \| P_0) = \sum_{z} P_1(z) \log \frac{P_1(z)}{P_0(z)}$$
These quantities measure the dissimilarity between the two distributions.
In order to satisfy the reciprocity (symmetry) requirement, the divergence
$$J(P_0, P_1) = H(P_0 \| P_1) + H(P_1 \| P_0)$$
is typically used.
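
A short numerical sketch shows that the two relative entropies generally differ, while their sum is symmetric. The two distributions `p0` and `p1` are arbitrary illustrative choices, and the natural logarithm is assumed.

```python
import math

def relative_entropy(p, q):
    """Sum of p_i * log(p_i / q_i) over the common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence(p, q):
    """Symmetric divergence: sum of the two relative entropies."""
    return relative_entropy(p, q) + relative_entropy(q, p)

p0 = [0.5, 0.3, 0.2]
p1 = [0.1, 0.1, 0.8]

print(relative_entropy(p0, p1), relative_entropy(p1, p0))  # two different values
print(divergence(p0, p1), divergence(p1, p0))              # the same value twice
```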

Marginal probability
Consider a probabilistic model described by $n$ discrete random variables $\mathbf{z}_1, \dots, \mathbf{z}_n$.
A fully specified probabilistic model gives the joint probability for every combination of the values of the $n$ r.v.s.
The model is specified by the values of the probabilities
$$\mathrm{Prob}\{\mathbf{z}_1 = z_1, \mathbf{z}_2 = z_2, \dots, \mathbf{z}_n = z_n\} = P(z_1, z_2, \dots, z_n)$$
for every possible assignment of values $z_1, \dots, z_n$ to the variables.
Marginal probabilities for subsets of the variables can be found by summing over all possible combinations of values for the other variables:
$$P(z_1, \dots, z_m) = \sum_{z_{m+1}} \cdots \sum_{z_n} P(z_1, \dots, z_m, z_{m+1}, \dots, z_n)$$

Conditional probability
Conditional probabilities for one subset of variables $\{\mathbf{z}_i : i \in S_1\}$ given values for another disjoint subset $\{\mathbf{z}_j : j \in S_2\}$, where $S_1 \cap S_2 = \emptyset$, are defined as ratios of marginal probabilities:
$$P(\{z_i : i \in S_1\} \mid \{z_j : j \in S_2\}) = \frac{P(\{z_i : i \in S_1\}, \{z_j : j \in S_2\})}{P(\{z_j : j \in S_2\})}$$

Marginal/conditional: example
Consider a probabilistic model of the day's weather based on three discrete variables, where
- $\mathbf{z}_1$ represents the sky condition and takes value in the finite set {CLEAR, CLOUDY},
- $\mathbf{z}_2$ represents the barometer trend and takes value in the finite set {RISING, FALLING},
- $\mathbf{z}_3$ represents the humidity in the afternoon and takes value in {DRY, WET}.

Marginal/conditional: example (II)
Let the joint distribution be given by the table
z1       z2        z3    P(z1, z2, z3)
CLEAR    RISING    DRY   0.40
CLEAR    RISING    WET   0.07
CLEAR    FALLING   DRY   0.08
CLEAR    FALLING   WET   0.10
CLOUDY   RISING    DRY   0.09
CLOUDY   RISING    WET   0.11
CLOUDY   FALLING   DRY   0.03
CLOUDY   FALLING   WET   0.12
From the joint distribution we can calculate marginal probabilities such as $P(\text{CLEAR}, \text{RISING}) = 0.40 + 0.07 = 0.47$ and $P(\text{CLOUDY}) = 0.09 + 0.11 + 0.03 + 0.12 = 0.35$, and conditional probabilities such as $P(\text{DRY} \mid \text{CLEAR}, \text{RISING}) = 0.40 / 0.47 \approx 0.85$.
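
The marginalization and conditioning rules can be sketched on the weather model as follows. The eight probabilities in the dictionary are an assumed reading of the joint table (chosen so that they sum to 1), and the helper names `marginal` and `conditional` are hypothetical.

```python
# Joint distribution P(sky, barometer, humidity).
# Assumption: these eight values are an assumed reading of the table; they sum to 1.
joint = {
    ("CLEAR", "RISING", "DRY"): 0.40,
    ("CLEAR", "RISING", "WET"): 0.07,
    ("CLEAR", "FALLING", "DRY"): 0.08,
    ("CLEAR", "FALLING", "WET"): 0.10,
    ("CLOUDY", "RISING", "DRY"): 0.09,
    ("CLOUDY", "RISING", "WET"): 0.11,
    ("CLOUDY", "FALLING", "DRY"): 0.03,
    ("CLOUDY", "FALLING", "WET"): 0.12,
}

def marginal(fixed):
    """Sum the joint over every variable not listed in `fixed`.

    `fixed` maps a variable position (0 = sky, 1 = barometer, 2 = humidity)
    to the value it must take.
    """
    return sum(p for values, p in joint.items()
               if all(values[i] == v for i, v in fixed.items()))

def conditional(target, given):
    """Ratio of two marginals: P(target, given) / P(given)."""
    return marginal({**target, **given}) / marginal(given)

p_clear_rising = marginal({0: "CLEAR", 1: "RISING"})
p_cloudy = marginal({0: "CLOUDY"})
p_dry_given_clear_rising = conditional({2: "DRY"}, {0: "CLEAR", 1: "RISING"})

print(p_clear_rising, p_cloudy, p_dry_given_clear_rising)
```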

Bernoulli trial
A Bernoulli trial is a single trial with two possible outcomes, often called "success" and "failure".
The probability of success is denoted by $p$ and the probability of failure by $(1 - p)$.
A Bernoulli random variable $\mathbf{z}$ is a discrete r.v. associated with the Bernoulli trial. It takes the value $z = 0$ with probability $(1 - p)$ and the value $z = 1$ with probability $p$.
The probability distribution of $\mathbf{z}$ can be written in the form
$$\mathrm{Prob}\{\mathbf{z} = z\} = P(z) = p^{z} (1 - p)^{1 - z}, \qquad z = 0, 1$$
Note that $E[\mathbf{z}] = p$ and $\mathrm{Var}[\mathbf{z}] = p(1 - p)$.

The Binomial distribution
A binomial random variable is the number of successes in a fixed number $N$ of independent Bernoulli trials with the same probability of success for each trial.
Example: the number $\mathbf{z}$ of heads in $N$ tosses of a coin.
The probability distribution is given by
$$\mathrm{Prob}\{\mathbf{z} = z\} = P(z) = \binom{N}{z} p^{z} (1 - p)^{N - z}, \qquad z = 0, 1, \dots, N$$
The mean of the distribution is $\mu = Np$.
The Bernoulli distribution is a special case ($N = 1$) of the binomial distribution.
For small $p$, the probability of having at least 1 success in $N$ trials is proportional to $N$, as long as $Np$ is small.
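
A minimal sketch of the binomial probability function, checking the mean $Np$ and the small-$p$ approximation for "at least one success". The parameter values are arbitrary illustrative choices.

```python
import math

def binomial_pmf(z, N, p):
    """Probability of z successes in N independent trials with success probability p."""
    return math.comb(N, z) * p**z * (1 - p) ** (N - z)

N, p = 10, 0.3
pmf = [binomial_pmf(z, N, p) for z in range(N + 1)]
mean = sum(z * pz for z, pz in enumerate(pmf))
print(sum(pmf), mean)  # total probability and mean (close to 1 and N*p)

# For small p, Prob{at least one success} = 1 - (1-p)^N is close to N*p.
p_small = 0.001
at_least_one = 1 - binomial_pmf(0, N, p_small)
print(at_least_one, N * p_small)
```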

The Geometric distribution
A r.v. $\mathbf{z}$ has a geometric distribution if it represents the number of successes before the first failure in a sequence of independent Bernoulli trials with probability of success $p$. Its probability distribution is
$$P(z) = (1 - p)\, p^{z}, \qquad z = 0, 1, 2, \dots$$
A r.v. $\mathbf{z}$ has a generalized geometric distribution if it represents the number of Bernoulli trials preceding but not including the $(k+1)$-th failure. Its distribution is
$$P(z) = \binom{z}{k} p^{z - k} (1 - p)^{k + 1}, \qquad z = k, k + 1, k + 2, \dots$$

The Poisson distribution
A r.v. $\mathbf{z}$ has a Poisson distribution with parameter $\lambda > 0$ if its probability distribution is
$$P(z) = \frac{e^{-\lambda} \lambda^{z}}{z!}, \qquad z = 0, 1, 2, \dots$$
The Poisson distribution is a limiting form of the binomial distribution.
If the number of trials $N$ is large, the probability of success $p$ of each trial is small and the product $Np = \lambda$ is moderate, then the binomial probability of $z$ successes is very close to the probability that a Poisson random variable with parameter $\lambda$ takes the value $z$.
$E[\mathbf{z}] = \lambda$, $\mathrm{Var}[\mathbf{z}] = \lambda$.
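
The limiting relation can be illustrated numerically by comparing the two probability functions for a large number of trials and a small success probability. The values $N = 1000$ and $p = 0.003$ are assumptions chosen only for this sketch.

```python
import math

def binomial_pmf(z, N, p):
    """Binomial probability of z successes in N trials."""
    return math.comb(N, z) * p**z * (1 - p) ** (N - z)

def poisson_pmf(z, lam):
    """Poisson probability of the value z with parameter lam."""
    return math.exp(-lam) * lam**z / math.factorial(z)

N, p = 1000, 0.003   # many trials, small success probability
lam = N * p          # moderate product, about 3

# Largest pointwise gap between the two distributions over z = 0..20.
max_gap = max(abs(binomial_pmf(z, N, p) - poisson_pmf(z, lam)) for z in range(21))
print(max_gap)
```

Repeating the comparison with a larger `p` (say 0.3 with `N = 10`) gives a visibly larger gap, which is why the approximation is stated for small $p$.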

Mean and variances of the distributions
Distribution   Mean        Variance
Bernoulli      p           p(1-p)
Binomial       Np          Np(1-p)
Geometric      p/(1-p)     p/(1-p)^2
Poisson        λ           λ

Continuous random variable
Continuous random variables take their value in some continuous range of values. Consider a real random variable $\mathbf{z}$ whose range is the set of real numbers. The following quantities can be defined:
Definition 1. The (cumulative) distribution function of $\mathbf{z}$ is the function
$$F_{\mathbf{z}}(z) = \mathrm{Prob}\{\mathbf{z} \le z\} \qquad (1)$$
Definition 2. The density function of a real random variable $\mathbf{z}$ is the derivative of the distribution function:
$$p_{\mathbf{z}}(z) = \frac{dF_{\mathbf{z}}(z)}{dz} \qquad (2)$$
Probabilities of continuous r.v.s are not allocated to specific values but rather to intervals of values. Specifically
$$\mathrm{Prob}\{a < \mathbf{z} < b\} = \int_{a}^{b} p_{\mathbf{z}}(z)\, dz$$

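Mean, variance, ... of a continuous r.v.
Consider a continuous r.v. $\mathbf{z}$ having range $(l, h)$ and density function $p(z)$. We can define
Mean:
$$\mu = E[\mathbf{z}] = \int_{l}^{h} z\, p(z)\, dz$$
Variance:
$$\sigma^2 = \int_{l}^{h} (z - \mu)^2 p(z)\, dz$$
Other quantities of interest are the moments
$$\mu_r = E[\mathbf{z}^r] = \int_{l}^{h} z^r p(z)\, dz$$
The moment of order $r = 1$ is the mean of $\mathbf{z}$.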

Chebyshev's inequality
Let $\mathbf{z}$ be a generic random variable, discrete or continuous, having mean $\mu$ and variance $\sigma^2$. Chebyshev's inequality states that for any positive constant $d$
$$\mathrm{Prob}\{|\mathbf{z} - \mu| \ge d\} \le \frac{\sigma^2}{d^2}$$
An experimental validation of Chebyshev's inequality can be found in the MATLAB file cheby.m
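
A rough Python counterpart of such an experimental check (not the MATLAB script itself) could look like this. It assumes a uniform r.v. on $(0, 1)$, for which $\mu = 1/2$ and $\sigma^2 = 1/12$, and compares observed frequencies with the bound for a few values of $d$.

```python
import random

random.seed(0)  # fixed seed so the experiment is repeatable

# Assumed test distribution: uniform on (0, 1), mean 1/2 and variance 1/12.
mu, var = 0.5, 1 / 12
sample = [random.random() for _ in range(100_000)]

results = []
for d in (0.3, 0.4, 0.45):
    # Observed frequency of the event |z - mu| >= d.
    freq = sum(abs(z - mu) >= d for z in sample) / len(sample)
    bound = var / d**2  # Chebyshev upper bound
    results.append((d, freq, bound))
    print(d, freq, bound)
```

The bound holds for any distribution with finite variance, so it is usually far from tight for a specific one, as the printed values suggest.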

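Bivariate probability distribution
Let us consider two continuous r.v.s $\mathbf{x}$ and $\mathbf{y}$ and their joint density function $p_{\mathbf{x},\mathbf{y}}(x, y)$.
We define marginal density the quantity
$$p_{\mathbf{x}}(x) = \int_{-\infty}^{\infty} p_{\mathbf{x},\mathbf{y}}(x, y)\, dy$$
We define conditional density the quantity
$$p_{\mathbf{y}|\mathbf{x}}(y \mid x) = \frac{p_{\mathbf{x},\mathbf{y}}(x, y)}{p_{\mathbf{x}}(x)}$$
which is, in loose terms, the probability that $\mathbf{y}$ is in $dy$ about $y$ assuming $\mathbf{x} = x$.
If $\mathbf{x}$ and $\mathbf{y}$ are independent,
$$p_{\mathbf{x},\mathbf{y}}(x, y) = p_{\mathbf{x}}(x)\, p_{\mathbf{y}}(y), \qquad p_{\mathbf{y}|\mathbf{x}}(y \mid x) = p_{\mathbf{y}}(y)$$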

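Uniform distribution
A random variable $\mathbf{z}$ is said to be uniformly distributed on the interval $(a, b)$ (also $\mathbf{z} \sim \mathcal{U}(a, b)$) if its probability density function is given by
$$p(z) = \begin{cases} \dfrac{1}{b - a} & \text{if } a < z < b \\ 0 & \text{otherwise} \end{cases}$$
[Figure: the density $p(z)$, constant at height $1/(b-a)$ between $a$ and $b$.]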

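Exponential distribution
A continuous random variable $\mathbf{z}$ is said to be exponentially distributed with rate $\lambda > 0$ (also $\mathbf{z} \sim \mathcal{E}(\lambda)$) if its probability density function is given by
$$p(z) = \begin{cases} \lambda e^{-\lambda z} & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
The mean of $\mathbf{z}$ is $1/\lambda$.
The variance of $\mathbf{z}$ is $1/\lambda^2$.
It can be considered as a continuous approximation to the geometric distribution.
Like the geometric distribution it satisfies the memoryless property
$$\mathrm{Prob}\{\mathbf{z} \ge t_1 + t_2 \mid \mathbf{z} \ge t_1\} = \mathrm{Prob}\{\mathbf{z} \ge t_2\}$$
It is used to describe physical phenomena (e.g. radioactive decay time or failure time).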
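
The memoryless property can be checked by simulation: among the draws that exceed $t_1$, the fraction that also exceeds $t_1 + t_2$ should match the unconditional probability of exceeding $t_2$, which is $e^{-\lambda t_2}$. The rate and the two thresholds below are arbitrary illustrative values.

```python
import math
import random

random.seed(0)
lam, t1, t2 = 2.0, 0.5, 0.3   # illustrative rate and thresholds
sample = [random.expovariate(lam) for _ in range(200_000)]

# Conditional frequency: among draws with z >= t1, how many also reach t1 + t2?
survivors = [z for z in sample if z >= t1]
cond_freq = sum(z >= t1 + t2 for z in survivors) / len(survivors)

# Unconditional frequency of z >= t2 and the exact value exp(-lam * t2).
uncond_freq = sum(z >= t2 for z in sample) / len(sample)
exact = math.exp(-lam * t2)

print(cond_freq, uncond_freq, exact)
```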

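The Gamma distribution
We say that $\mathbf{z}$ has a gamma distribution with parameters $(n, \lambda)$ if its density function is
$$p(z) = \begin{cases} \dfrac{\lambda^{n} z^{n-1} e^{-\lambda z}}{\Gamma(n)} & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases}$$
$E[\mathbf{z}] = n/\lambda$, $\mathrm{Var}[\mathbf{z}] = n/\lambda^2$.
For integer $n$ (where $\Gamma(n) = (n-1)!$) it is the distribution of the sum of $n$ i.i.d. r.v.s having an exponential distribution with rate $\lambda$.
The exponential distribution is a special case ($n = 1$) of the gamma distribution.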

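Normal distribution: the scalar case
A continuous scalar random variable $\mathbf{x}$ is said to be normally distributed with parameters $\mu$ and $\sigma^2$ (also $\mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)$) if its probability density function is given by
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The mean of $\mathbf{x}$ is $\mu$; the variance of $\mathbf{x}$ is $\sigma^2$.
The coefficient in front of the exponential ensures that $\int_{-\infty}^{\infty} p(x)\, dx = 1$.
The probability that an observation from a normal r.v. is within 2 standard deviations from the mean is approximately 0.95.
If $\mu = 0$ and $\sigma^2 = 1$ the distribution is called standard normal. We will denote its distribution function by $\Phi(z)$.
Given a normal r.v. $\mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)$, the r.v. $\mathbf{z} = (\mathbf{x} - \mu)/\sigma$ has a standard normal distribution.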

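Important relations
For $\mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)$:
$$\mathrm{Prob}\{\mu - \sigma \le \mathbf{x} \le \mu + \sigma\} \approx 0.683$$
$$\mathrm{Prob}\{\mu - 1.645\sigma \le \mathbf{x} \le \mu + 1.645\sigma\} \approx 0.90$$
$$\mathrm{Prob}\{\mu - 1.96\sigma \le \mathbf{x} \le \mu + 1.96\sigma\} \approx 0.95$$
$$\mathrm{Prob}\{\mu - 2\sigma \le \mathbf{x} \le \mu + 2\sigma\} \approx 0.954$$
$$\mathrm{Prob}\{\mu - 2.576\sigma \le \mathbf{x} \le \mu + 2.576\sigma\} \approx 0.99$$
$$\mathrm{Prob}\{\mu - 3\sigma \le \mathbf{x} \le \mu + 3\sigma\} \approx 0.997$$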

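Normal distribution: the multivariate case
Let $\mathbf{z}$ be a random vector of dimension $n \times 1$. The vector $\mathbf{z}$ is said to be normally distributed with parameters $\mu$ ($n \times 1$) and $\Sigma$ ($n \times n$) (also $\mathbf{z} \sim \mathcal{N}(\mu, \Sigma)$) if its probability density function is given by
$$p(z) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left(-\frac{1}{2} (z - \mu)^T \Sigma^{-1} (z - \mu)\right)$$
It follows that
- the mean $E[\mathbf{z}] = \mu$ is an $n$-dimensional vector,
- the matrix $\Sigma = E[(\mathbf{z} - \mu)(\mathbf{z} - \mu)^T]$ is the covariance matrix. This matrix is symmetric and therefore has $n(n+1)/2$ independent parameters.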

Normal multivariate distribution (II)
The quantity
$$\Delta^2 = (z - \mu)^T \Sigma^{-1} (z - \mu)$$
which appears in the exponent of $p(z)$ is called the (squared) Mahalanobis distance from $z$ to $\mu$.
It can be shown that
- the surfaces of constant probability density are hyperellipsoids on which $\Delta^2$ is constant;
- the principal axes are given by the eigenvectors $u_i$, $i = 1, \dots, n$, of $\Sigma$, which satisfy $\Sigma u_i = \lambda_i u_i$, where $\lambda_i$ are the corresponding eigenvalues;
- the eigenvalues $\lambda_i$ give the variances along the principal directions.
MATLAB script s_gaussxyz.m

Normal multivariate distribution (III)
If the covariance matrix $\Sigma$ is diagonal then
- the contours of constant density are hyperellipsoids with the principal directions aligned with the coordinate axes;
- the components of $\mathbf{z}$ are statistically independent, since the distribution of $\mathbf{z}$ can be written as the product of the distributions of the individual components, in the form $p(z) = \prod_{i=1}^{n} p(z_i)$;
- the total number of independent parameters in the distribution is $2n$;
- if $\sigma_i = \sigma$ for all $i$, the contours of constant density are hyperspheres.

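Bivariate normal distribution
Consider a bivariate normal density whose mean is $\mu = [\mu_1, \mu_2]^T$ and whose covariance matrix is
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$
The correlation coefficient is
$$\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$$
It can be shown that the general bivariate normal density has the form
$$p(z_1, z_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)} \left[\left(\frac{z_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\, \frac{(z_1 - \mu_1)(z_2 - \mu_2)}{\sigma_1 \sigma_2} + \left(\frac{z_2 - \mu_2}{\sigma_2}\right)^2\right]\right\}$$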

Bivariate normal distribution
[Figure: surface plot of a bivariate normal density $p(z_1, z_2)$ over the $(z_1, z_2)$ plane.]

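Bivariate normal distribution (prj)
[Figure: projection of the density onto the $(z_1, z_2)$ plane, showing the principal axes $u_1$, $u_2$ and the eigenvalues $\lambda_1$, $\lambda_2$.]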

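Marginal and conditional distributions
One of the important properties of the multivariate normal density is that all conditional and marginal probabilities are also normal.
Using the relation
$$p(z_2 \mid z_1) = \frac{p(z_1, z_2)}{p(z_1)}$$
we find that $p(z_2 \mid z_1)$ is a normal distribution $\mathcal{N}(\mu_{2|1}, \sigma_{2|1}^2)$, where
$$\mu_{2|1} = \mu_2 + \rho\, \frac{\sigma_2}{\sigma_1} (z_1 - \mu_1), \qquad \sigma_{2|1}^2 = \sigma_2^2 (1 - \rho^2)$$
Note that
- $\mu_{2|1}$ is a linear function of $z_1$: if the correlation coefficient $\rho$ is positive, the larger $z_1$, the larger $\mu_{2|1}$;
- if there is no correlation between $\mathbf{z}_1$ and $\mathbf{z}_2$, we can ignore the value of $z_1$ and always use $\mu_2$ to estimate $\mathbf{z}_2$.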

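Upper critical points
Definition. The upper critical point of a continuous r.v. $\mathbf{x}$ is the smallest number $x_\alpha$ such that
$$1 - \alpha = \mathrm{Prob}\{\mathbf{x} \le x_\alpha\} = F(x_\alpha)$$
We will denote with $z_\alpha$ the upper critical points of the standard normal density:
$$1 - \alpha = \mathrm{Prob}\{\mathbf{z} \le z_\alpha\} = \Phi(z_\alpha)$$
Note that the following relations hold:
$$z_\alpha = \Phi^{-1}(1 - \alpha), \qquad z_{1-\alpha} = -z_\alpha, \qquad 1 - \alpha = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z_\alpha} e^{-z^2/2}\, dz$$
Here we list the most commonly used values of $z_\alpha$:
α      0.1     0.075   0.05    0.025   0.01    0.005   0.001   0.0005
z_α    1.282   1.440   1.645   1.960   2.326   2.576   3.090   3.291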
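
Upper critical points of the standard normal can be computed as the inverse distribution function evaluated at $1 - \alpha$. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `upper_critical_point` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def upper_critical_point(alpha):
    """Smallest z with Prob{Z <= z} = 1 - alpha for a standard normal Z."""
    return std_normal.inv_cdf(1 - alpha)

alphas = [0.1, 0.075, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005]
critical_points = {alpha: upper_critical_point(alpha) for alpha in alphas}

for alpha, z in critical_points.items():
    print(alpha, round(z, 3))
```

The symmetry of the density gives the lower critical points for free, since the point for $1 - \alpha$ is minus the point for $\alpha$.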

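The sum of i.i.d. random variables
Suppose that $\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_N$ are i.i.d. (independently and identically distributed) random variables, discrete or continuous, each having a probability distribution with mean $\mu$ and variance $\sigma^2$. Let us consider the two derived r.v.s, that is the sum
$$\mathbf{S}_N = \mathbf{z}_1 + \mathbf{z}_2 + \dots + \mathbf{z}_N$$
and the average
$$\bar{\mathbf{z}} = \frac{\mathbf{z}_1 + \mathbf{z}_2 + \dots + \mathbf{z}_N}{N}$$
The following relations hold:
$$E[\mathbf{S}_N] = N\mu, \qquad \mathrm{Var}[\mathbf{S}_N] = N\sigma^2$$
$$E[\bar{\mathbf{z}}] = \mu, \qquad \mathrm{Var}[\bar{\mathbf{z}}] = \frac{\sigma^2}{N}$$
See the MATLAB script sum_rv.m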

The central limit theorem
Theorem 1. Assume that $\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_N$ are i.i.d. random variables, discrete or continuous, each having a probability distribution with finite mean $\mu$ and finite variance $\sigma^2$. As $N \to \infty$, the standardized random variable
$$\frac{(\bar{\mathbf{z}} - \mu)\sqrt{N}}{\sigma}$$
which is identical to
$$\frac{\mathbf{S}_N - N\mu}{\sqrt{N}\,\sigma}$$
converges in distribution to a r.v. having the standard normal distribution $\mathcal{N}(0, 1)$.
This result holds regardless of the common distribution of the $\mathbf{z}_i$.
This theorem justifies the importance of the normal distribution, since many r.v.s of interest are either sums or averages.
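
The theorem can be illustrated by simulation. The sketch below standardizes averages of exponential draws, a clearly non-normal distribution, and checks how often the result falls within $\pm 1.96$, which would be about 95% of the time for a standard normal. The choice of the exponential distribution, of $N = 50$ and of the number of runs are assumptions made for the example.

```python
import math
import random

random.seed(1)

lam = 1.0
mu, sigma = 1 / lam, 1 / lam   # mean and std deviation of the exponential
N, runs = 50, 20_000           # sample size per average, number of averages

def standardized_mean(n):
    """Draw n exponential values and return (zbar - mu) * sqrt(n) / sigma."""
    zbar = sum(random.expovariate(lam) for _ in range(n)) / n
    return (zbar - mu) * math.sqrt(n) / sigma

values = [standardized_mean(N) for _ in range(runs)]

# For a standard normal r.v. about 95% of the mass lies within +/- 1.96.
coverage = sum(abs(v) <= 1.96 for v in values) / runs
center = sum(values) / runs
print(coverage, center)
```

With a small `N` (for example 2) the agreement is visibly worse, since the convergence is only asymptotic.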

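The chi-squared distribution
For a positive integer $N$, a r.v. $\mathbf{z}$ has a $\chi^2_N$ distribution if
$$\mathbf{z} = \mathbf{x}_1^2 + \mathbf{x}_2^2 + \dots + \mathbf{x}_N^2$$
where $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N$ are i.i.d. $\mathcal{N}(0, 1)$ random variables.
The probability distribution is a gamma distribution with parameters $(N/2, 1/2)$.
$E[\mathbf{z}] = N$ and $\mathrm{Var}[\mathbf{z}] = 2N$.
The distribution is called a chi-squared distribution with $N$ degrees of freedom.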
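
As a hedged sanity check of the definition, the sketch below builds chi-squared draws as sums of squared standard normal values and compares the sample mean and variance with $N$ and $2N$. The sample sizes and the helper name `chi_squared_draw` are illustrative.

```python
import random
from statistics import fmean, variance

random.seed(2)
N, runs = 10, 20_000   # degrees of freedom, number of simulated draws

def chi_squared_draw(n):
    """Sum of n squared independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))

draws = [chi_squared_draw(N) for _ in range(runs)]
sample_mean = fmean(draws)
sample_var = variance(draws)
print(sample_mean, sample_var)  # should be near N and 2*N
```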

The chi-squared distribution (II)
[Figure: density (left) and cumulative distribution (right) of the $\chi^2_N$ distribution for N=10.]
MATLAB script chisq.m

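Student's $t$-distribution
If $\mathbf{x} \sim \mathcal{N}(0, 1)$ and $\mathbf{y} \sim \chi^2_N$ are independent, then the Student's $t$-distribution with $N$ degrees of freedom is the distribution of the r.v.
$$\mathbf{z} = \frac{\mathbf{x}}{\sqrt{\mathbf{y}/N}}$$
We denote this with $\mathbf{z} \sim \mathcal{T}_N$.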

Student's $t$-distribution
[Figure: density (left) and cumulative distribution (right) of the Student distribution for N=10.]
MATLAB script s_stu.m

Notation
In order to clarify the distinction between random variables and their values, we will use the boldface notation for denoting a random variable (e.g. $\mathbf{z}$) and the normal face notation for the eventually observed value (e.g. $z$).
The notation $P_{\mathbf{z}}(z)$ denotes the probability that the random variable $\mathbf{z}$ takes the value $z$. The suffix indicates that the probability relates to the random variable $\mathbf{z}$. This is necessary since we often discuss probabilities associated with several random variables simultaneously.
Example: $\mathbf{z}$ could be the age of a student before asking, and $z$ could be its value after the observation.

Notation (II)
In general terms, we will denote as the probability distribution of a random variable $\mathbf{z}$ any complete description of the probabilistic behavior of $\mathbf{z}$. For example, if $\mathbf{z}$ is continuous, the density function $p(z)$ or the distribution function $F(z)$ could be examples of a probability distribution.
Given a probability distribution $F_{\mathbf{z}}(z)$, the notation
$$D_N \sim F_{\mathbf{z}}$$
means that the dataset $D_N = \{z_1, z_2, \dots, z_N\}$ is an i.i.d. random sample observed from the probability distribution $F_{\mathbf{z}}$.