Non-informative, proper and improper priors. Statistical Data models, Non-parametrics, Dynamics. Dirichlet Distribution prior for discrete distribution


Clifford Cummings
5 years ago

Statistical Data models, Non-parametrics, Dynamics

Non-informative, proper and improper priors

For a real quantity bounded to an interval, the standard prior is the uniform distribution.
For an unbounded real quantity, the standard prior is also uniform, but with what density?
For a real quantity on a half-open interval, the standard prior is $f(s) = 1/s$, but its integral diverges!
Divergent priors are called improper. They can only be used with likelihoods that make the posterior integral converge.

Conjugate families

Normal prior $N(\mu, s^2)$, normal likelihood $N(\mu', s'^2)$. Then the posterior is normal $N(\mu_p, s_p^2)$, where

$$\frac{(x-\mu)^2}{s^2} + \frac{(x-\mu')^2}{s'^2} = \frac{(x-\mu_p)^2}{s_p^2} + c,$$

i.e.,

$$\frac{1}{s^2} + \frac{1}{s'^2} = \frac{1}{s_p^2}, \qquad \frac{\mu}{s^2} + \frac{\mu'}{s'^2} = \frac{\mu_p}{s_p^2}.$$

Unknown variances are more difficult.

Dirichlet distribution: prior for discrete distribution
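The conjugate normal update above adds precisions and precision-weighted means. A minimal sketch of that arithmetic follows; the function name `normal_posterior` and the example numbers are illustrative assumptions, not part of the original slides.

```python
def normal_posterior(mu, s2, mu_lik, s2_lik):
    """Combine a N(mu, s2) prior with a N(mu_lik, s2_lik) likelihood.

    Hypothetical helper for illustration: variances (not standard
    deviations) are passed in, matching the s2 notation of the slide.
    """
    # Precisions add: 1/s2p = 1/s2 + 1/s2'
    precision = 1.0 / s2 + 1.0 / s2_lik
    s2_post = 1.0 / precision
    # Precision-weighted means add: mup/s2p = mu/s2 + mu'/s2'
    mu_post = s2_post * (mu / s2 + mu_lik / s2_lik)
    return mu_post, s2_post


# Vague prior N(0, 4) combined with a sharper likelihood N(2, 1).
mu_p, s2_p = normal_posterior(0.0, 4.0, 2.0, 1.0)
print(mu_p, s2_p)
```

The posterior mean lands between the two means, closer to the one with the smaller variance, and the posterior variance is smaller than either input variance.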


Mean of Dirichlet: Laplace's estimator

[Figure: occurrence tables and the corresponding probabilities]

Non-parametric inference

How to perform inference about a distribution without assuming a distribution family?
A distribution over reals can be approximated by a piecewise uniform distribution or by a mixture of real distributions.
But how many parts? This is non-parametric inference.
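As a small illustration of Laplace's estimator: with a uniform Dirichlet prior (all parameters equal to 1), the posterior mean for outcome i in an occurrence table is (n_i + 1) / (N + k), where N is the total count and k the number of outcomes. The sketch below assumes a symmetric prior parameter `alpha`; the function name and the counts are made up for the example.

```python
def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of a discrete distribution under a symmetric
    Dirichlet(alpha, ..., alpha) prior, given an occurrence table.

    alpha = 1 gives Laplace's estimator (n_i + 1) / (N + k).
    """
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]


# Occurrence table with one outcome never observed.
probs = dirichlet_posterior_mean([3, 0, 1])
print(probs)
```

Note that the unobserved outcome still receives positive probability, which is the practical point of the estimator.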


Mixture of Normals

Mixture of Normals: elimination of nuisance parameters (AutoClass method)

function [lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
%[lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]=
% MMNONU1(x,N,k,labi,NN);
%inputs
% 1D mixture modelling,
% x - 1D data column vector
% N - iterations.
% k - number of components
%lab,labi - component labelling of data vector
% NN - thinning (optional)

%outputs
%trlh - thinned trace of log probability (optional)
%trm - thinned trace of means vector (optional)
%trstd - thinned vector of standard deviations (optional)
%trlab - thinned trace of labels vector (size(x,1) by N/NN) (optional)
%trct - thinned trace of mixing proportions

N=10000; NN=100;
x=[randn(100,1);randn(100,1)*3;randn(100,1)+1]; % 3 components synthetic data
k=2;
labi=ceil(rand(size(x))*2);
[llhc,lab2,trl,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
[llhc2,lab2,trl2,trm2,trstd2,trlab2,trct2,nbounc]= mmnonu1(x,N,k,lab2,NN);
(likewise for k=3, 4, 5)

[Figure: the three components]

Putting them together makes the identification seem harder.

[Figures: sampled component parameters for K=2, K=3 and K=4 (axes labelled mean and std). Annotations: "Burn in progressing"; "Burnt in"; "No focus: no interpretation as 4 clusters"; "K=4: Low prob".]

Trace of state labels

X sample: 1:100: (0 1), 101:200: (0 3), 201:300: (1 1)

[Figure: K=5, low probability (axes labelled mean and std)]

Dynamic Systems, time series

An abundance of linear prediction models exists.
For non-linear and chaotic systems, a method was developed in the 1990s (Santa Fe).
Gershenfeld, Weigend: The Future of Time Series


Berry and Linoff have eloquently stated their preferences with the often quoted sentence: "Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works".

Dynamic Systems and Takens' Theorem

Lag vectors $(x_i, x_{i-1}, \ldots, x_{i-T})$, for all $i$, occupy a submanifold of $E^T$ if $T$ is large enough. This manifold is diffeomorphic to the original state space and can be used to create a good dynamic model. Takens' theorem assumes no noise and must be empirically verified.

Neural networks typically give the right answer.

Santa Fe 1992 Competition

Unstable Laser
Intensive Care Unit Data, Apnea
Exchange rate Data
Synthetic series with drift
White Dwarf Star Data
Bach's unfinished Fugue
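The lag vectors used in Takens' theorem can be built directly from a scalar time series. The sketch below is only an illustration: the function name `lag_vectors`, the sine-wave test series and the choice T = 2 are assumptions, and choosing a suitable T for real data is a separate empirical question.

```python
import math


def lag_vectors(series, T):
    """Return the lag vectors (x_i, x_{i-1}, ..., x_{i-T}) for every
    index i that has T predecessors in the series."""
    return [tuple(series[i - j] for j in range(T + 1))
            for i in range(T, len(series))]


# Synthetic scalar observations of a simple periodic system.
x = [math.sin(0.3 * i) for i in range(50)]
vectors = lag_vectors(x, 2)
print(len(vectors), vectors[0])
```

Each lag vector is one point of the reconstructed state space; a predictor (for example a neural network) can then be trained to map a lag vector to the next value of the series.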

[Figure: stereoscopic 3D view of state space manifold, series A (Laser)]

Hidden Markov Models

Given a sequence of discrete signals $x_i$: is there a model likely to have produced $x_i$ from a sequence of states $s_i$ of a finite Markov chain?
$P(\cdot \mid s)$ - transition probability in state $s$
$S(\cdot \mid s)$ - signal probability in state $s$
Applications: Speech Recognition, Bioinformatics

function [Pn,Sn,stn,trP,trS,trst,tll]= hmmsim(A,N,n,s,prop,Po,So,sto,NN);
%[Pn,Sn,stn,trP,trS,trst]=HMMSIM(A,N,n,s,prop,Po,So,sto,NN);
% Compute trace of posterior for hmm parameters
% A - the sequence of signals
% N - the length of trace
% n - number of states in Markov chain
% s - number of signal values
% prop - proposal stepsize
% optional inputs:
% Po - starting transition matrix (each of n columns a discrete pdf
% in n-vector)
% So - starting signal matrix (each of n columns a discrete pdf


% in s-vector)
% sto - starting state sequence (congruent to vector A)
% NN - thinning of trace, default 10
% outputs
% Pn - last transition matrix in trace
% Sn - last signal emission matrix
% stn - last hidden state vector (congruent to A)
% trP - trace of transition matrices
% trS - trace of signal matrices
% trst - trace of hidden state vectors
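To make the model concrete, the probability of a signal sequence under fixed transition and signal matrices can be computed with the standard scaled forward recursion. This is a sketch in Python rather than the slides' MATLAB, and it is not the `hmmsim` sampler itself: the function name, the example matrices and the uniform initial state distribution are all assumptions. The matrices follow the column convention of the help text above (each column a discrete pdf), and signals are numbered from 0.

```python
import math


def hmm_log_likelihood(signals, P, S):
    """Log probability of a signal sequence under an HMM (forward pass).

    P[j][i]: probability of moving to state j from state i.
    S[v][i]: probability of emitting signal v in state i.
    Assumption: the initial state distribution is uniform.
    """
    n = len(P)
    # Joint probability of the first signal and each starting state.
    alpha = [S[signals[0]][i] / n for i in range(n)]
    log_lik = 0.0
    for t, v in enumerate(signals):
        if t > 0:
            # Propagate through the chain, then weight by the emission.
            alpha = [S[v][j] * sum(P[j][i] * alpha[i] for i in range(n))
                     for j in range(n)]
        # Rescale to avoid underflow and accumulate the log of the scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical 2-state, 2-signal model; every column sums to 1.
P = [[0.9, 0.2],
     [0.1, 0.8]]
S = [[0.7, 0.1],
     [0.3, 0.9]]
ll = hmm_log_likelihood([0, 0, 1, 1, 1], P, S)
print(ll)
```

A posterior sampler over the matrices needs exactly this kind of likelihood evaluation, or an equivalent one conditioned on a sampled state sequence, at every step of the trace.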

Particle filter: general tracking

Chapman-Kolmogorov version of Bayes' rule:

$$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$$

Observation and video based particle filter tracking
Defence: tracking with heterogeneous observations
Crowd analysis: tracking from video

Cycle in Particle filter

Time step cycle:
Importance (weighted) sample
Resampled ordinary sample
Diffused sample
Weighted by likelihood

X - state, Z - observation
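The time step cycle above (diffuse, weight by likelihood, resample) can be sketched for a one-dimensional state. Everything specific here is an assumption made for illustration: the Gaussian random-walk dynamics, the Gaussian observation noise, the parameter names `step_sd` and `obs_sd`, and the observation values.

```python
import math
import random


def particle_filter_step(particles, observation, step_sd=0.5, obs_sd=1.0):
    """One time step of a basic (bootstrap) particle filter."""
    # Diffusion: move each particle through the assumed transition
    # density, here a Gaussian random walk.
    diffused = [p + random.gauss(0.0, step_sd) for p in particles]
    # Weighting: the likelihood of the observation given each particle,
    # here an assumed Gaussian observation model (unnormalised).
    weights = [math.exp(-0.5 * ((observation - p) / obs_sd) ** 2)
               for p in diffused]
    # Resampling: turn the weighted sample into an ordinary sample by
    # drawing particles in proportion to their weights.
    return random.choices(diffused, weights=weights, k=len(diffused))


random.seed(0)
# Initial ordinary sample from a broad prior over the state.
particles = [random.uniform(-10.0, 10.0) for _ in range(2000)]
for d in [3.0, 3.2, 2.9, 3.1, 3.0]:
    particles = particle_filter_step(particles, d)
estimate = sum(particles) / len(particles)
print(estimate)
```

Resampling at every step keeps the code short; practical trackers often resample only when the weights become too uneven.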
