Density fit (cont.)
Lecture 4

The aim of this lecture is to improve our ability in density fitting and our knowledge of related topics. The main issues related to this lecture are: logarithmic plots, survival function, HS-fit; mixtures, melange; random number generation from an arbitrary distribution; Pareto distributions, heavy tail fit; distribution of maxima, POT theory; chi square test, confidence lines in Q-Q plot, KS test.

Right tail. Survival function

Simulations

Start with the R simulations suggested in the file comandi_lezione4_9_doing. They present log, log-log and iterated log plots of the Piro and Medi data. Among the most interesting:

Piro data without outlier, with linear regression line:

[Figure: SL plotted against D, with regression line]

Medi data, iterated log plot, log also in x, with regression line:

[Figure: SLL plotted against DL, with regression line]

Definition of Survival function

Survival or reliability function: $S(t) = P(X > t) = 1 - F(t)$.
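The plots mentioned above can be reproduced in outline as follows. This is only a minimal sketch: the Piro and Medi data are not included here, so a simulated Weibull sample (shape 2, scale 1) is used as a stand-in, and the names `S_hat` and `fit` are illustrative assumptions, not taken from the lecture's command file. The empirical survival function is evaluated at the sorted sample with plotting positions $(i - 0.5)/n$, which keeps it strictly inside $(0, 1)$ so that both logarithms are defined.

```r
set.seed(1)                                   # reproducible stand-in sample (assumption)
x <- sort(rweibull(2000, shape = 2, scale = 1))
n <- length(x)

# Empirical survival function at the sorted points, strictly inside (0, 1)
S_hat <- 1 - (seq_len(n) - 0.5) / n

op <- par(mfrow = c(1, 3))
plot(x, log(S_hat), main = "log S against t")
plot(log(x), log(S_hat), main = "log S against log t")
plot(log(x), log(-log(S_hat)), main = "log(-log S) against log t")

# Regression line on the iterated log plot
fit <- lm(log(-log(S_hat)) ~ log(x))
abline(fit, col = "red")
par(op)

coef(fit)
```

For Weibull data the iterated log plot should be close to a straight line whose slope is near the shape parameter, so `coef(fit)[2]` is expected to be close to 2 in this sketch.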
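The survival function measures the degree of survival of a unit, or the degree of reliability of a unit. The larger $S(t)$, the greater the probability of survival beyond $t$.

Example: Weibull, shape $a$, scale $s$: $S(t) = e^{-(t/s)^a}$ for $t \ge 0$ ($S(t) = 1$ for $t < 0$). Graph for $a = 2$, $s = 1$:

[Figure: $S(t)$ against $t$, for $0 \le t \le 5$]

A few rules (for a non-negative random variable $X$ with density $f$ and mean $\mu$):

$S' = -f$

$E[X] = \int_0^\infty S(x)\,dx$ if $\lim_{x \to \infty} x S(x) = 0$

$E[X^2] = 2 \int_0^\infty x S(x)\,dx$ if $\lim_{x \to \infty} x^2 S(x) = 0$ (and then $Var[X] = E[X^2] - \mu^2$).

Indeed, integrating by parts,

$\int_0^\infty x f(x)\,dx = -\int_0^\infty x S'(x)\,dx = -[x S(x)]_0^\infty + \int_0^\infty S(x)\,dx$

$E[X^2] = \int_0^\infty x^2 f(x)\,dx = -\int_0^\infty x^2 S'(x)\,dx = -[x^2 S(x)]_0^\infty + \int_0^\infty 2x S(x)\,dx$

and the boundary terms vanish under the stated limit conditions.

Technical advantage for us: $S(t)$ goes to zero ($F$ goes to 1) as $t \to \infty$, hence it is easier to analyze the order of convergence.

Logarithmic survival function

It is: $\log S(t)$. Weibull: $\log S(t) = -(t/s)^a$. Graph for $a = 2$, $s = 1$: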
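[Figure: $\log S(t)$ against $t$, for $0 \le t \le 5$]

The logarithmic plot allows us to see whether the shape parameter of a Weibull is $> 1$ (super-exponential decay) or not.

If (power decay, or sub-exponential) $S(t) = t^{-\alpha}$, then $\log S(t) = -\alpha \log t$, hence we see a graph of logarithmic form:

[Figure: $-\alpha \log t$ against $t$, for $0 \le t \le 5$]

hence it is convenient to plot it in log-log coordinates (log on both axes): if $x = \log t$ we get a line, $\log S = -\alpha x$.

[Figure: $\log S$ against $x = \log t$, a straight line]

Iterated logarithm

It is: $\log(-\log S(t))$. Weibull: $\log(-\log S(t)) = a \log(t/s) = a x - a \log s$ if $x = \log t$.

Summary

If $\log S$ is a line, then it is exponential decay (like exponential distribution,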
Gamma, logistic).

If $\log S$ is parabolic, then try iterated logarithm against logarithm (like Weibull with shape $> 1$, or Gaussian).

If $\log S$ is logarithmic, then try $\log S$ against logarithm (like power laws, Weibull with shape $< 1$, log-normal, Pareto).

HS-fit of a tail

Assume that we discover $\log S = \alpha x + \beta$ with $x = \log t$. Then $S(t) = e^{\alpha \log t + \beta} = e^{\beta} t^{\alpha}$.

Assume that we discover $\log(-\log S) = \alpha x + \beta$ with $x = \log t$. Then $-\log S(t) = e^{\alpha \log t + \beta} = e^{\beta} t^{\alpha}$. For instance, it may be Weibull with

$S(t) = \exp(-e^{\beta} t^{\alpha}) = \exp\left(-\left(t / e^{-\beta/\alpha}\right)^{\alpha}\right)$, shape $= \alpha$, scale $= e^{-\beta/\alpha}$.

This elementary method of fit is called the Historical Simulation (HS) method.

Exercise. Apply HS-fit to Medi data. Use only a suitable right segment of the data. Compare with the ML fit by Q-Q plot. Notice that the right half is improved (the left half is worse).

Exercise. Fit Weibull to Piro data without outlier. Shape is not 1! Check Q-Q plot and iterated logarithm.

Problem. We have powerful tools to fit subsets of a sample. How do we merge the results in a single fit? Let us describe two methods: mixtures and melange.

Mixtures

Given two r.v. $X_0, X_1$, given a discrete r.v. $K$ with values 0, 1, all independent, the new r.v. $Y = X_K$
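is called mixture of $X_0, X_1$. To simulate a mixture, first choose a random number $k$ under $K$, then produce a random number under $X_k$. Producing random numbers from a mixture is thus trivial. Use rbinom(n, 1, p) to generate n numbers equal to 0 or 1, with frequency p of 1s.

If $X_0, X_1$ have densities $f_0, f_1$ and $P(K = 1) = p$, then $Y$ has density

$f_Y(x) = p f_1(x) + (1 - p) f_0(x)$.

Indeed, by a version of the factorization formula for the expectation,

$E[Y] = E[X_K \mid K = 1] P(K = 1) + E[X_K \mid K = 0] P(K = 0) = p E[X_1] + (1 - p) E[X_0] = p \int x f_1(x)\,dx + (1 - p) \int x f_0(x)\,dx = \int x f_Y(x)\,dx$

(the same computation with $g(Y)$ in place of $Y$, for an arbitrary test function $g$, identifies the density). Similarly

$F_Y(x) = p F_1(x) + (1 - p) F_0(x)$, $S_Y(x) = p S_1(x) + (1 - p) S_0(x)$.

Everything extends to the mixture of $N$ random variables. For instance, the formula for the density has the form

$f_Y(x) = \sum_{k=1}^{N} p_k f_k(x)$.

Example. Let us mix two Gaussians, $X_0 \sim N(0,1)$ with $X_1 \sim N(5,1)$, $p = 0.7$:

$f(x) = \frac{0.7}{\sqrt{2\pi}} e^{-(x-5)^2/2} + \frac{0.3}{\sqrt{2\pi}} e^{-x^2/2}$

[Figure: mixture density against $x$, for $-2 \le x \le 8$]

This is our first example of a multimodal density. When the densities are very close, we do not have multimodality anymore. Always with $X_0 \sim N(0,1)$, $p = 0.7$: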
[Figures: mixture densities against $x$, for $-2 \le x \le 8$; panels $X_1 \sim N(2,1)$ and $X_1 \sim N(3,1)$]

Physically, behind mixtures it could be that there are two phenomena and we observe sometimes one, sometimes the other. Presumably, Piro data are a mixture. The smallest 8 data are very homogeneous and thus presumably come from the same distribution; it could be a Weibull. The largest one, the outlier, comes from another distribution. Unfortunately, we cannot fit a distribution by means of a single data point.

Operations with mixtures:

Random number generation:

    N <- 1000                       # sample size: assumed value, the number is missing in the source
    K <- rbinom(N, 1, 0.7)          # component labels, equal to 1 with probability 0.7
    X <- matrix(nrow = N)
    X0 <- rnorm(N, mean = 0, sd = 1)
    X1 <- rnorm(N, mean = 5, sd = 1)
    X <- K*X1 + (1-K)*X0            # take X1 where K = 1, X0 where K = 0
    hist(X, 10)                     # number of bins assumed

[Figure: Histogram of X, Frequency against X]

cdf: simply use the formula $F_Y(x) = p F_1(x) + (1 - p) F_0(x)$.

quantiles: by Monte Carlo, with K, X0, X1 and X generated as above, $q_{0.95}$ is

    sort(X)[0.95 * N]

or by numerical inversion of $F$ (see below for melange).

Difficulties with mixtures: fit the parameters; avoid superposition of tails.

Melange

We describe here a variant of the concept of mixture, better if we want to connect two pieces of cdf (typically the right and the left tails). Consider the function
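$\varphi(t)$, built from the arctangent, which increases smoothly from 0 to 1 around a midpoint $t_{mid}$, with a second parameter that controls how fast the transition is. The graph for $t_{mid} = 5$ and second parameter equal to 2 is:

[Figure: $\varphi(t)$ against $t$, for $0 \le t \le 10$]

Given two cdf $F_0$ and $F_1$, define

$F(t) = (1 - \varphi(t)) F_0(t) + \varphi(t) F_1(t)$.

For $t \ll t_{mid}$ we have $\varphi(t) \approx 0$, hence $F(t) \approx F_0(t)$. Vice versa, for $t \gg t_{mid}$ we have $\varphi(t) \approx 1$, hence $F(t) \approx F_1(t)$. Moreover, $F(t) \in [0, 1]$, and the limits at infinity are 0 and 1. Very often it happens that $F$ is increasing, and thus it is a cdf (check the derivative numerically: it must be positive). It is a smooth cdf interpolation between $F_0$ and $F_1$.

[Notice that $F' = (1 - \varphi) F_0' + \varphi F_1' + \varphi' (F_1 - F_0)$. The term $(1 - \varphi) F_0' + \varphi F_1'$ is positive. If $F_1 \ge F_0$, also the term $\varphi' (F_1 - F_0)$ is positive, and thus $F$ is a cdf. Otherwise, if $F_1 < F_0$, it may happen that $F$ is decreasing in some region, but it may also happen that there is a compensation with the positive terms: notice that $\varphi' (F_1 - F_0)$ is small except maybe near $t_{mid}$, and it is small also near $t_{mid}$ if $F_1 \approx F_0$; and near $t_{mid}$ the term $(1 - \varphi) F_0' + \varphi F_1'$ may be large. So, in practice, it may easily happen that $F' > 0$ everywhere.]

Random number generation

Problem: how to generate random numbers when $F$ is not provided by R?

Theorem. If $X$ is a random variable with cdf $F$ (continuous case), then the random variable $U := F(X)$ is uniformly distributed on $(0, 1)$.

Proof. For $u \in (0, 1)$,

$P(U \le u) = P(F(X) \le u) = P(X \le F^{-1}(u)) = F(F^{-1}(u)) = u$,

hence $U$ is uniform.

Corollary. If $U$ is a uniform random variable, then $F^{-1}(U)$ is a random variable with cdf $F$.

Algorithm: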
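generate a uniform random number $u$; compute $F^{-1}(u)$.

In practice, we need the quantiles associated to $F$.

Pareto Distribution

Parameters: $t_m > 0$, $\alpha > 0$ (index). Survival function equal to:

$S(t) = (t_m / t)^{\alpha}$ for $t \ge t_m$, $S(t) = 1$ for $t < t_m$.

Comparative plot of Pareto ($t_m = 1$, $\alpha = 2$, black line) and Weibull ($a = 2$, $s = 2$, red line): slow decay of Pareto.

[Figure: $S(t)$ against $t$ for Pareto (black) and Weibull (red), for $0 \le t \le 5$]

Pareto is a model of heavy tail distribution.

Logarithmic plot of the survival function: $\log S(t) = -\alpha \log t + \alpha \log t_m$ for $t \ge t_m$.

Comparison of Pareto ($t_m = 1$, $\alpha = 2$, black line) and Weibull ($a = 2$, $s = 2$, red line):

[Figure: $\log S(t)$ against $t$ for Pareto (black) and Weibull (red), for $0 \le t \le 5$]

Density:

$f(t) = \alpha t_m^{\alpha} / t^{\alpha + 1}$ for $t \ge t_m$, $f(t) = 0$ for $t < t_m$.

Mean and standard deviation:

$\mu = \frac{\alpha t_m}{\alpha - 1}$ (for $\alpha > 1$), $\sigma = \frac{t_m}{\alpha - 1} \sqrt{\frac{\alpha}{\alpha - 2}}$ (for $\alpha > 2$).

Software R: no Pareto in the main packages. But we may easily do everything.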
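The "slow decay" of the Pareto tail can also be checked numerically. The following is a small sketch, assuming the parameter values of the comparison above (Pareto with $t_m = 1$, $\alpha = 2$; Weibull with shape 2 and scale 2); the helper names `S_pareto` and `S_weibull` are hypothetical and introduced only for this illustration.

```r
# Pareto survival function: (tm/t)^alp for t >= tm, 1 below tm
S_pareto <- function(t, tm, alp) ifelse(t >= tm, (tm / t)^alp, 1)

# Weibull survival function via the upper tail of pweibull
S_weibull <- function(t, a, s) pweibull(t, shape = a, scale = s, lower.tail = FALSE)

# Tabulate both survival functions at a few points (assumed parameter values)
t <- c(1, 2, 5, 10)
cbind(t, pareto = S_pareto(t, 1, 2), weibull = S_weibull(t, 2, 2))

# Logarithmic plot: Pareto in black, Weibull in red
curve(log(S_pareto(x, 1, 2)), from = 1, to = 5, ylim = c(-7, 0),
      xlab = "t", ylab = "log S(t)")
curve(log(S_weibull(x, 2, 2)), add = TRUE, col = "red")
```

Near $t_m$ the Weibull survival function may be the larger of the two; the point of the comparison is the right tail, where the power-law decay of the Pareto eventually dominates the super-exponential decay of the Weibull.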
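ppareto, dpareto, qpareto

The main problem is that $t \ge t_m$, so we need to include $F = 0$ for $t < t_m$ in the analytic formula. The indicator function of the half line $[0, \infty)$ is the Heaviside function

    H <- function(x) {(sign(x)+1)/2}

The indicator function of the half line $[t_m, \infty)$ is the translated Heaviside function

    H.tm <- function(x,tm) {(sign(x-tm)+1)/2}

Then we set:

    ppareto <- function(t,tm,alp) {(1-(tm/t)^alp)*H.tm(t,tm)}

which gives us the plot (for alp <- 2, tm <- 1):

[Figure: ppareto(t, tm, alp) against t, for $0 \le t \le 5$]

Quantiles can be computed analytically:

$F(q) = a$, $1 - (t_m / q)^{\alpha} = a$, $t_m / q = (1 - a)^{1/\alpha}$, $q = t_m (1 - a)^{-1/\alpha}$

    qpareto <- function(a,tm,alp) {tm*(1-a)^(-1/alp)}

Parameter estimation

We could write ML. But let us use moments: from

$\mu = \frac{\alpha t_m}{\alpha - 1}$, $\sigma = \frac{t_m}{\alpha - 1} \sqrt{\frac{\alpha}{\alpha - 2}}$

we deduce $\mu^2 / \sigma^2 = \alpha (\alpha - 2)$, hence

$\alpha = 1 + \sqrt{1 + \mu^2 / \sigma^2}$, $t_m = \mu \frac{\alpha - 1}{\alpha}$.

Exercise. Fit Piro data by a mixture of a Weibull (for 8 good data) and a Pareto (for the outlier). The choice of the Pareto is very subjective. Plot cdf and density, generate samples of 9 data, check the Q-Q plot, compare with other Q-Q plots, maybe adjust the Pareto parameters by means of the Q-Q plot.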
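As an illustration of the last two steps, the inverse-transform algorithm and the moment estimator can be combined in a few lines. This is a sketch under stated assumptions: `rpareto` and `fit_pareto_moments` are hypothetical helper names, not functions from the lecture or from base R, and the moment estimator only makes sense when $\alpha > 2$, so that the standard deviation is finite.

```r
# Quantile function of the Pareto distribution: q = tm * (1 - a)^(-1/alp)
qpareto <- function(a, tm, alp) tm * (1 - a)^(-1 / alp)

# Inverse-transform sampler (hypothetical helper): apply the quantile
# function to uniform random numbers
rpareto <- function(n, tm, alp) qpareto(runif(n), tm, alp)

# Moment estimator (hypothetical helper): inverts
#   mu = alp * tm / (alp - 1),  sigma = tm / (alp - 1) * sqrt(alp / (alp - 2))
# and is valid only for alp > 2
fit_pareto_moments <- function(mu, sigma) {
  alp <- 1 + sqrt(1 + mu^2 / sigma^2)
  tm <- mu * (alp - 1) / alp
  c(tm = tm, alp = alp)
}

set.seed(1)                              # reproducibility of the illustration
x <- rpareto(10000, tm = 1, alp = 5)     # assumed parameter values
est <- fit_pareto_moments(mean(x), sd(x))
est
```

Because the sample standard deviation of a heavy-tailed sample is itself unstable, the estimate of $\alpha$ can vary noticeably between samples; this is one reason the exercise suggests adjusting the Pareto parameters by means of the Q-Q plot.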