6. Advanced Numerical Methods. Monte Carlo Methods


6. Advanced Numerical Methods. Part 1: Monte Carlo Methods. Part 2: Fourier Methods

Part 1: Monte Carlo Methods

6.1. Uniform random numbers

Generating uniform random numbers, drawn from the pdf U[0,1], is fairly easy: any scientific calculator will have a RAN function. Better examples of U[0,1] random number generators can be found in Numerical Recipes.

In what sense are they better? Algorithms only generate pseudo-random numbers: very long (deterministic) sequences of numbers which are approximately random (i.e. have no discernible pattern). The better the RNG, the better it approximates U[0,1]. [Figure: the pdf p(x) of U[0,1], equal to 1 for 0 < x < 1.]

We can test pseudo-random numbers for randomness in several ways:

(a) Histogram of sampled values. We can use hypothesis tests to see if the sample is consistent with the pdf we are trying to model, e.g. a chi-squared test applied to the numbers in each histogram bin:

$$\chi^2 = \sum_{i=1}^{n_{\rm bin}} \frac{\left(n_i^{\rm obs} - n_i^{\rm pred}\right)^2}{\sigma_i^2}$$

Assume the bin number counts are subject to Poisson fluctuations, so that $\sigma_i^2 = n_i^{\rm pred}$. Note: the number of degrees of freedom is $n_{\rm bin} - 1$, since we know the total sample size.
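As an illustration (not from the original slides), here is a minimal Python sketch of test (a): histogram a large pseudo-random sample and compute the chi-squared statistic defined above, assuming Poisson errors $\sigma_i^2 = n_i^{\rm pred}$.

```python
# A minimal sketch of test (a): histogram a sample of pseudo-random
# numbers and apply the chi-squared test described above.
import numpy as np

rng = np.random.default_rng(42)
n_samp, n_bin = 100_000, 20

x = rng.random(n_samp)                      # pseudo-random draws from U[0,1]
n_obs, _ = np.histogram(x, bins=n_bin, range=(0.0, 1.0))
n_pred = n_samp / n_bin                     # U[0,1] predicts equal counts per bin

# Poisson fluctuations: sigma_i^2 = n_i^pred
chi2 = np.sum((n_obs - n_pred)**2 / n_pred)
dof = n_bin - 1                             # total sample size is fixed

print(f"chi^2 = {chi2:.1f} for {dof} degrees of freedom")
# For a good RNG, chi^2 should be ~ dof, with scatter ~ sqrt(2 * dof).
```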

(b) Correlations between neighbouring pseudo-random numbers. Sequential patterns in the sampled values would show up as structure in the phase portraits: scatterplots of the i-th value against the (i+1)-th value, etc. (see Gregory, Chapter 5).

We can also compute the auto-correlation function

$$\rho(j) = \frac{\left\langle (x_i - \bar{x})(x_{i+j} - \bar{x}) \right\rangle}{\sqrt{\left\langle (x_i - \bar{x})^2 \right\rangle \left\langle (x_{i+j} - \bar{x})^2 \right\rangle}}$$

where $j$ is known as the lag. If the sequence is uniformly random, we expect $\rho(j) = 1$ for $j = 0$ and $\rho(j) = 0$ otherwise.
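A minimal sketch of test (b), estimating the sample autocorrelation $\rho(j)$ at a few small lags (the estimator below normalises by the full-sample variance, which is adequate for $j \ll N$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(100_000)
xc = x - x.mean()
var = np.mean(xc**2)

def rho(j):
    """Sample autocorrelation at lag j."""
    if j == 0:
        return 1.0
    return np.mean(xc[:-j] * xc[j:]) / var

for j in range(4):
    print(j, round(rho(j), 4))
# Expect rho(0) = 1 and rho(j > 0) ~ 0, with scatter ~ 1/sqrt(N), for a good RNG.
```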

6.2. Variable transformations

Generating random numbers from other pdfs can be done by transforming random numbers drawn from simpler pdfs. The procedure is similar to changing variables in integration. Suppose, e.g., $x \sim p(x)$, and let $y = y(x)$ be monotonic. Then the probability of a number between $y$ and $y+dy$ must equal the probability of a number between $x$ and $x+dx$:

$$p(y)\,dy = p(x)\,dx \quad\Longrightarrow\quad p(y) = p(x(y)) \left| \frac{dx}{dy} \right|$$

where the modulus appears because probability must be positive.

We can extend the expression given previously to the case where $y(x)$ is not monotonic, by calculating

$$p(y)\,dy = \sum_i p(x_i)\,dx_i \quad\text{so that}\quad p(y) = \sum_i p(x_i(y)) \left| \frac{dx_i}{dy} \right|$$

Example 1. Suppose we have $x \sim U[0,1]$, i.e. $p(x) = 1$ for $0 < x < 1$ and 0 otherwise. Define $y = a + (b-a)x$, so $dy/dx = b - a$, i.e.

$$p(y) = \frac{1}{b-a} \ \text{ for } a < y < b, \qquad 0 \text{ otherwise}$$

or $y \sim U[a,b]$.
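A quick numerical check of Example 1 (a sketch; the values a = 2, b = 5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 5.0

x = rng.random(100_000)      # x ~ U[0,1]
y = a + (b - a) * x          # transformed variable: y ~ U[a,b]

# Sample mean and variance should match (a+b)/2 and (b-a)^2/12
print(y.mean(), (a + b) / 2)
print(y.var(), (b - a)**2 / 12)
```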

Example 2. Numerical Recipes provides a program to turn $x \sim U[0,1]$ into $y \sim N[0,1]$, the normal pdf with mean zero and standard deviation unity. Suppose we want $z \sim N[\mu, \sigma]$. We define $z = \mu + \sigma y$, so that $dz/dy = \sigma$. Now

$$p(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} y^2\right)$$

so

$$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\tfrac{1}{2}\left(\frac{z-\mu}{\sigma}\right)^2\right]$$

The variable transformation formula is also the basis for the error propagation formulae we use in data analysis (see also the SUPAIDA course).

Question 13: If $x \sim U[0,1]$ and $y = -\ln x$, the pdf of $y$ is

A: $p(y) = e^{y}$
B: $p(y) = e^{-y}$
C: $p(y) = -\ln y$
D: $p(y) = \ln y$

Answer: B. With $x = e^{-y}$ we have $p(y) = p(x)\,|dx/dy| = e^{-y}$ for $y > 0$.

6.3. Probability integral transform

One particular variable transformation merits special attention. Suppose we can compute the CDF, $P(x)$, of some desired random variable $x$:

1) Sample a random variable $y \sim U[0,1]$.
2) Compute $x$ such that $y = P(x)$, i.e. $x = P^{-1}(y)$.
3) Then $x \sim p(x)$, i.e. $x$ is drawn from the pdf corresponding to the CDF $P(x)$.
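A minimal sketch of the probability integral transform in Python, using the exponential pdf $p(x) = \lambda e^{-\lambda x}$ as an example (its CDF $P(x) = 1 - e^{-\lambda x}$ inverts analytically; this choice of pdf is illustrative, not from the slides):

```python
# Probability integral transform for the exponential distribution:
# P(x) = 1 - exp(-lam * x), so x = P^{-1}(y) = -ln(1 - y) / lam.
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0

y = rng.random(100_000)          # step 1: y ~ U[0,1]
x = -np.log(1.0 - y) / lam       # step 2: x = P^{-1}(y)

# step 3: x should now be drawn from p(x); check the mean, which is 1/lam
print(x.mean(), 1.0 / lam)
```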

Example (from Gregory). [Figure.]

6.4. Rejection sampling (following Mackay)

Suppose we want to sample from some pdf $p_1(x)$, and we know a pdf $p_2(x)$ which we have an easy way to sample from, satisfying $p_1(x) < p_2(x)$ for all $x$:

1) Sample $x_1$ from $p_2(x)$.
2) Sample $y \sim U[0,\, p_2(x_1)]$.
3) If $y < p_1(x_1)$, ACCEPT; otherwise REJECT.

The set of accepted values $\{x_i\}$ are a sample from $p_1(x)$.

The method can be very slow if the shaded region between $p_2(x)$ and $p_1(x)$ is too large. Ideally we want to find a pdf $p_2(x)$ that is: (a) easy to sample from, and (b) close to $p_1(x)$.
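A minimal sketch of rejection sampling: here the target $p_1(x)$ is an (unnormalised) Gaussian restricted to $[-3, 3]$, and the easy-to-sample envelope $p_2(x)$ is a constant that bounds it from above; both choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def p1(x):
    return np.exp(-0.5 * x**2)          # unnormalised target; max value 1

p2_height = 1.1                         # envelope: p2(x) = 1.1 > p1(x) on [-3, 3]
samples = []
while len(samples) < 10_000:
    x1 = rng.uniform(-3.0, 3.0)         # 1) sample x1 from the envelope
    y = rng.uniform(0.0, p2_height)     # 2) sample y ~ U[0, p2(x1)]
    if y < p1(x1):                      # 3) accept or reject
        samples.append(x1)

samples = np.array(samples)
print(samples.mean(), samples.std())    # ~0 and ~1 for a (truncated) Gaussian
```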

6.5. Genetic Algorithms (Charbonneau 1995) [Figure-only slides.]

6.6. Markov Chain Monte Carlo

This is a very powerful, new (at least in astronomy!) method for sampling from pdfs, which can be complicated and/or of high dimension. MCMC is widely used, e.g. in cosmology, to determine the maximum likelihood model fit to CMBR data. [Figure: angular power spectrum of CMBR temperature fluctuations, with the ML cosmological model, depending on 7 different parameters (Hinshaw et al. 2006).]

Consider a 2-D example (e.g. a bivariate normal distribution), where the likelihood function $L(a,b)$ depends on two parameters $a$ and $b$. Suppose we are trying to find the maximum of $L(a,b)$:

1) Start off at some randomly chosen value $(a_1, b_1)$.
2) Compute $L(a_1, b_1)$ and the gradient $\left(\frac{\partial L}{\partial a}, \frac{\partial L}{\partial b}\right)$ at $(a_1, b_1)$.
3) Move in the direction of steepest positive gradient, i.e. the direction in which $L$ is increasing fastest.
4) Repeat from step 2 until $(a_n, b_n)$ converges on the maximum of the likelihood.

This is OK for finding the maximum, but not for generating a sample from $L(a,b)$, or for determining errors on the ML parameter estimates.

MCMC provides a simple Metropolis algorithm for generating random samples of points from $L(a,b)$:

1. Sample a random initial point $P_1 = (a_1, b_1)$.
2. Centre a new pdf, $Q$, called the proposal density, on $P_1$.
3. Sample a tentative new point $P' = (a', b')$ from $Q$.
4. Compute the ratio

$$R = \frac{L(a', b')}{L(a_1, b_1)}$$

5. If $R > 1$, this means $P'$ is uphill from $P_1$: we accept $P'$ as the next point in our chain, i.e. $P_2 = P'$.
6. If $R < 1$, this means $P'$ is downhill from $P_1$. In this case we may reject $P'$ as our next point; in fact, we accept $P'$ with probability $R$. How do we do this?
(a) Generate a random number $x \sim U[0,1]$.
(b) If $x < R$ then accept $P'$ and set $P_2 = P'$.
(c) If $x > R$ then reject $P'$ and set $P_2 = P_1$.

The acceptance probability depends only on the previous point: this is what makes the sequence a Markov Chain.

So the Metropolis algorithm generally (but not always) moves uphill, towards the peak of the likelihood function. Remarkable facts:

The sequence of points $\{P_1, P_2, P_3, P_4, P_5, \dots\}$ represents a sample from the likelihood function $L(a,b)$ (see notes on website).

The sequence for each coordinate, e.g. $\{a_1, a_2, a_3, a_4, a_5, \dots\}$, samples the marginalised likelihood of $a$. We can make a histogram of $\{a_1, a_2, \dots, a_n\}$ and use it to compute the mean and variance of $a$ (i.e. to attach an error bar to $a$).
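A minimal sketch of the Metropolis algorithm for a 2-D example, with an (unnormalised) bivariate normal likelihood and a symmetric Gaussian proposal density $Q$ centred on the current point; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

def L(a, b):
    return np.exp(-0.5 * (a**2 + b**2 / 4.0))    # sigma_a = 1, sigma_b = 2

n_steps, width = 50_000, 1.0
chain = np.empty((n_steps, 2))
p = np.array([3.0, -3.0])                        # 1. initial point

for i in range(n_steps):
    p_new = p + width * rng.normal(size=2)       # 2-3. propose from Q
    R = L(*p_new) / L(*p)                        # 4. likelihood ratio
    if rng.random() < R:                         # 5-6. accept w/ prob min(R, 1)
        p = p_new
    chain[i] = p

# Histogram of each coordinate samples the marginalised likelihood:
print(chain[:, 0].std(), chain[:, 1].std())      # ~1 and ~2 after burn-in
```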

Why is this so useful? Suppose our likelihood function were a 1-D Gaussian: we could estimate its mean and variance quite well from a histogram of, e.g., 1000 samples. But what if our problem is, e.g., 7-dimensional? Exhaustive sampling could require $(1000)^7$ samples! MCMC provides a short-cut. To compute each new point in our Markov Chain we need to compute the likelihood function, but the computational cost does not grow so dramatically as we increase the number of dimensions of our problem. This lets us tackle problems that would be impossible by normal sampling.

Example: CMBR constraints from WMAP 3-year data (+ 1-year data). [Figure: angular power spectrum of CMBR temperature fluctuations, with the ML cosmological model, depending on 7 different parameters (Hinshaw et al. 2006).]

Question 14: When applying the Metropolis algorithm, if the width of the proposal density is very small:

A: the Markov Chain will move around the parameter space very slowly
B: the Markov Chain will converge very quickly to the true pdf
C: the acceptance rate of proposed steps in the Markov Chain will be very small
D: most steps in the Markov Chain will explore regions of very low probability

Answer: A. With a very narrow proposal density almost every step is accepted, but each step is tiny, so the chain explores the parameter space very slowly.

A number of factors can improve the performance of the Metropolis algorithm, including:

- using parameters in the likelihood function which are (close to) independent (i.e. their Fisher matrix is approximately diagonal);
- adopting a judicious choice of proposal density, well matched to the shape of the likelihood function;
- using a simulated annealing approach, i.e. sampling from a modified posterior of the form

$$p_T(\theta \mid D, I) \propto \exp\left[ \frac{\ln p(\theta \mid D, I)}{T} \right]$$

For large $T$ the modified likelihood is a flatter version of the true likelihood.

The temperature parameter $T$ starts out large, so that the acceptance rate for downhill steps is high and the search is essentially random (this helps to avoid getting stuck in local maxima). $T$ is then gradually reduced as the chain evolves, so that downhill steps become increasingly disfavoured. In some versions the evolution of $T$ is carried out automatically; this is known as adaptive simulated annealing. See, for example, Numerical Recipes Section 10.9, or Gregory Chapter 11, for more details.
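A minimal sketch of simulated annealing built on the Metropolis step: the acceptance test uses the tempered (log-)likelihood divided by $T$, and $T$ is lowered according to a simple geometric cooling schedule. The toy likelihood surface and the schedule here are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)

def logL(a, b):
    # A bumpy toy surface with local maxima; global maximum at (0, 0)
    return -0.5 * (a**2 + b**2) + 2.0 * np.cos(3 * a) * np.cos(3 * b)

p, T = np.array([4.0, 4.0]), 10.0
for i in range(20_000):
    p_new = p + 0.5 * rng.normal(size=2)
    # Tempered acceptance ratio: R = [L(new)/L(old)]^(1/T)
    if np.log(rng.random()) < (logL(*p_new) - logL(*p)) / T:
        p = p_new
    T = max(1e-2, T * 0.9997)       # gradually reduce the temperature

print(p)   # should end near the global maximum at (0, 0)
```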

A related idea is parallel tempering (see e.g. Gregory, Chapter 12). A series of MCMC chains, with different values of $\beta = 1/T$, are set off in parallel, with a certain probability of swapping parameter states between chains. High temperature chains are effective at mapping out the global structure of the likelihood surface; low temperature chains are effective at mapping out the shape of local likelihood maxima.

Example: spectral line fitting, from Section 3. [Figures: conventional MCMC vs. MCMC with parallel tempering.]

Approximating the Evidence

$$\text{Evidence} = \int p(\text{data} \mid \theta, M)\, p(\theta \mid M)\, d\theta$$

i.e. the average likelihood, weighted by the prior. Calculating the evidence can be computationally very costly (e.g. fitting the CMBR $C_\ell$ spectrum in cosmology). How to proceed?

1. Information criteria (Liddle 2004, 2007).
2. Laplace and Savage-Dickey approximations (Trotta 2005).
3. Nested sampling (Skilling 2004, 2006; Mukherjee et al. 2005, 2007; Sivia 2006).

Nested Sampling (Skilling 2004, 2006; Mukherjee et al. 2005, 2007)

$$\text{Evidence} = \int p(\text{data} \mid \theta, M)\, p(\theta \mid M)\, d\theta$$

Key idea: we can rewrite the Evidence as an integral over a 1-D variable $X$, known as the prior mass, which is uniformly distributed on [0,1]:

$$Z = \int_0^1 L(X)\, dX$$

Skilling (2006): $Z = \int_0^1 L(X)\, dX$. Example: a 2-D likelihood function.

Example: a 2-D likelihood function (from Mackay 2005). Contours of constant likelihood $L$: each contour encloses a different fraction, $X$, of the area of the square, so each point in the plane has an associated value of $L$ and of $X$. However, mapping the relationship between $L$ and $X$ systematically everywhere may be computationally very costly.

Skilling (2006): the approximation procedure. [Figure-only slides.]

7. Advanced Numerical Methods. Part 1: Monte Carlo Methods. Part 2: Fourier Methods

Part 2: Fourier Methods

In many diverse fields physical data are collected or analysed as Fourier components. In this section we briefly discuss the mathematics of Fourier series and Fourier transforms.

1. Fourier Series

Any well-behaved function $f(x)$ can be expanded in terms of an infinite sum of sines and cosines. The expansion takes the form

$$f(x) = \tfrac{1}{2} a_0 + \sum_{n=1}^{\infty} a_n \cos nx + \sum_{n=1}^{\infty} b_n \sin nx$$

(Joseph Fourier, 1768-1830)

The Fourier coefficients are given by the formulae

$$a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx, \qquad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx\, dx, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\, dx$$

These formulae follow from the orthogonality properties of sin and cos:

$$\int_{-\pi}^{\pi} \sin mx \sin nx\, dx = \pi \delta_{mn}, \qquad \int_{-\pi}^{\pi} \cos mx \cos nx\, dx = \pi \delta_{mn}, \qquad \int_{-\pi}^{\pi} \sin mx \cos nx\, dx = 0$$

60 Some examples from Mathworld, approximating functions with a finite number of Fourier series terms
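A minimal sketch of such an approximation: the partial sums of the Fourier series for a square wave of period $2\pi$, whose only non-zero coefficients are $b_n = 4/(n\pi)$ for odd $n$ (a standard result, not derived on the slides).

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 2001)
f = np.sign(np.sin(x))                    # square wave, period 2*pi

def partial_sum(x, n_max):
    """Sum the Fourier series of the square wave up to harmonic n_max."""
    s = np.zeros_like(x)
    for n in range(1, n_max + 1, 2):      # only odd-n sine terms survive
        s += (4.0 / (n * np.pi)) * np.sin(n * x)
    return s

for n_max in (1, 9, 99):
    rms = np.sqrt(np.mean((f - partial_sum(x, n_max))**2))
    print(n_max, round(rms, 3))
# The rms error shrinks as terms are added, though a ~9% overshoot
# persists near the jumps (the Gibbs phenomenon).
```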

The Fourier series can also be written in complex form:

$$f(x) = \sum_{n=-\infty}^{\infty} A_n e^{inx}, \qquad A_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx$$

and recall that $e^{\pm inx} = \cos nx \pm i \sin nx$.

"Fourier's Theorem is not only one of the most beautiful results of modern analysis, but it is said to furnish an indispensable instrument in the treatment of nearly every recondite question in modern physics."

Fourier Transform: Basic Definition

The Fourier transform can be thought of simply as extending the idea of a Fourier series from an infinite sum over discrete, integer Fourier modes to an infinite integral over continuous Fourier modes. Consider, for example, a physical process that is varying in the time domain, i.e. it is described by some function of time $h(t)$. Alternatively, we can describe the physical process in the frequency domain by defining the Fourier transform function $H(f)$. It is useful to think of $h(t)$ and $H(f)$ as two different representations of the same function; the information they convey about the underlying physical process should be equivalent.

We define the Fourier transform as

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt$$

and the corresponding inverse Fourier transform as

$$h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df$$

If time is measured in seconds then frequency is measured in cycles per second, or Hertz.

In many physical applications it is common to define the frequency domain behaviour of the function in terms of the angular frequency $\omega = 2\pi f$. This changes the previous relations accordingly:

$$H(\omega) = \int_{-\infty}^{\infty} h(t)\, e^{i \omega t}\, dt, \qquad h(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(\omega)\, e^{-i \omega t}\, d\omega$$

Thus the symmetry of the previous expressions is broken.

Fourier Transform: Further Properties

The FT is a linear operation: (1) the FT of the sum of two functions is equal to the sum of their FTs; (2) the FT of a constant times a function is equal to the constant times the FT of the function.

If the time domain function $h(t)$ is a real function, then its FT is complex. More generally, however, we can consider the case where $h(t)$ is itself a complex function, i.e. we can write

$$h(t) = h_R(t) + i\, h_I(t)$$

with real part $h_R(t)$ and imaginary part $h_I(t)$. $h(t)$ may also possess certain symmetries: even, $h(-t) = h(t)$; odd, $h(-t) = -h(t)$.

The following properties then hold (see Numerical Recipes, Section 12.0). Note that in the table there a star (*) denotes the complex conjugate, i.e. if $z = x + iy$ then $z^* = x - iy$.

For convenience we will denote the FT pair by $h(t) \Leftrightarrow H(f)$. It is then straightforward to show that

$$h(at) \Leftrightarrow \frac{1}{|a|}\, H(f/a) \qquad \text{time scaling}$$
$$\frac{1}{|b|}\, h(t/b) \Leftrightarrow H(bf) \qquad \text{frequency scaling}$$
$$h(t - t_0) \Leftrightarrow H(f)\, e^{2\pi i f t_0} \qquad \text{time shifting}$$
$$h(t)\, e^{-2\pi i f_0 t} \Leftrightarrow H(f - f_0) \qquad \text{frequency shifting}$$

Suppose we have two functions $g(t)$ and $h(t)$. Their convolution is defined as

$$(g * h)(t) = \int_{-\infty}^{\infty} g(s)\, h(t - s)\, ds$$

We can prove the Convolution Theorem

$$(g * h)(t) \Leftrightarrow G(f)\, H(f)$$

i.e. the FT of the convolution of the two functions is equal to the product of their individual FTs. Their correlation, which is also a function of the lag $t$, is defined as

$$\mathrm{Corr}(g, h) = \int_{-\infty}^{\infty} g(s + t)\, h(s)\, ds$$

We can prove the Correlation Theorem

$$\mathrm{Corr}(g, h) \Leftrightarrow G(f)\, H^*(f)$$

i.e. the FT of the first time domain function, multiplied by the complex conjugate of the FT of the second time domain function, is equal to the FT of their correlation. The correlation of a function with itself is called the auto-correlation; in this case

$$\mathrm{Corr}(g, g) \Leftrightarrow G(f)\, G^*(f) = |G(f)|^2$$

The function $|G(f)|^2$ is known as the power spectral density, or (more loosely) as the power spectrum. Hence, the power spectrum is equal to the Fourier transform of the auto-correlation function of the time domain function $g(t)$.
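A minimal numerical check of the convolution and correlation theorems, using numpy's FFT and circular (periodic) versions of the convolution and correlation; the discrete analogues behave the same way as the continuous theorems above.

```python
import numpy as np

rng = np.random.default_rng(13)
N = 256
g, h = rng.normal(size=N), rng.normal(size=N)

# Circular convolution computed directly in the time domain ...
conv = np.array([sum(g[s] * h[(t - s) % N] for s in range(N)) for t in range(N)])
# ... and via the frequency domain (convolution theorem)
conv_ft = np.fft.ifft(np.fft.fft(g) * np.fft.fft(h)).real
print(np.allclose(conv, conv_ft))   # True

# Correlation theorem: FT of Corr(g, h) equals G(f) * conj(H(f))
corr = np.array([sum(g[(s + t) % N] * h[s] for s in range(N)) for t in range(N)])
corr_ft = np.fft.ifft(np.fft.fft(g) * np.conj(np.fft.fft(h))).real
print(np.allclose(corr, corr_ft))   # True
```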

The power spectral density is analogous to the pdf we defined in previous sections: in order to know how much power is contained in a given interval of frequency, we need to integrate the power spectral density over that interval. The total power in a signal is the same, regardless of whether we measure it in the time domain or the frequency domain:

$$\text{Total Power} = \int_{-\infty}^{\infty} |h(t)|^2\, dt = \int_{-\infty}^{\infty} |H(f)|^2\, df \qquad \text{(Parseval's Theorem)}$$

Often we will want to know how much power is contained in a frequency interval without distinguishing between positive and negative frequencies. In this case we define the one-sided power spectral density

$$P_h(f) = |H(f)|^2 + |H(-f)|^2, \qquad 0 \le f < \infty$$

and

$$\text{Total Power} = \int_0^{\infty} P_h(f)\, df$$

When $h(t)$ is a real function, $P_h(f) = 2\,|H(f)|^2$. With the proper normalisation, the total power (i.e. the integrated area under the relevant curve) is the same regardless of whether we are working with the time domain signal, the power spectral density or the one-sided power spectral density.

[Figure, from Numerical Recipes, Chapter 12.0: a time domain signal, its one-sided PSD, and its two-sided PSD.]

Examples.

(1) $h(t) = \text{const.} \Leftrightarrow H(f) = \delta_D(f)$, the Dirac delta function centred at $f = 0$.

(2) $h(t) = e^{2\pi i f_0 t} \Leftrightarrow H(f) = \delta_D(f - f_0)$.

(3) The top-hat function

$$h(t) = \begin{cases} 1 & \text{for } -\tfrac{1}{2} \le t \le \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases} \quad\Leftrightarrow\quad H(f) = \mathrm{sinc}(\pi f)$$

The sinc function, $\mathrm{sinc}\, x = \sin x / x$, occurs frequently in many areas of physics. It has a maximum at $x = 0$, and its zeros occur at $x = \pm m\pi$ for positive integer $m$.

(4) The triangle function $h(t) = \Lambda(t) \Leftrightarrow H(f) = \mathrm{sinc}^2(\pi f)$.

(5) The one-sided exponential $h(t) = e^{-at}$ (for $t \ge 0$):

$$\mathrm{Re}[H(f)] = \frac{a}{a^2 + 4\pi^2 f^2}, \qquad \mathrm{Im}[H(f)] = \frac{2\pi f}{a^2 + 4\pi^2 f^2}$$

(6) The Gaussian $h(t) = \exp(-a t^2)$:

$$H(f) = \sqrt{\pi/a}\; \exp(-\pi^2 f^2 / a)$$

i.e. the FT of a Gaussian function in the time domain is also a Gaussian in the frequency domain.

Question 15: If the variance of a Gaussian is doubled in the time domain:

A: the variance of its Fourier transform will be doubled in the frequency domain
B: the variance of its Fourier transform will be halved in the frequency domain
C: the standard deviation of its Fourier transform will be doubled in the frequency domain
D: the standard deviation of its Fourier transform will be halved in the frequency domain

Answer: B. From example (6), the frequency domain variance is inversely proportional to the time domain variance, so doubling one halves the other.

Returning to example (6): the FT of a Gaussian in the time domain is also a Gaussian in the frequency domain. The broader the Gaussian is in the time domain, the narrower its Gaussian FT is in the frequency domain, and vice versa.

Discrete Fourier Transforms

Although we have discussed FTs so far in the context of a continuous, analytic function $h(t)$, in many practical situations we must work instead with observational data which are sampled at a discrete set of times. Suppose that we sample $h(t)$ a total of $N+1$ times at evenly spaced time intervals $\Delta$, i.e. (for $N$ even)

$$h_k = h(t_k), \qquad t_k = k\Delta, \qquad k = -N/2, \dots, 0, \dots, N/2$$

[If $h(t)$ is non-zero over only a finite interval of time, then we suppose that the $N+1$ sampled points contain this interval. Or, if $h(t)$ has an infinite range, then we at least suppose that the sampled points cover a sufficient range to be representative of the behaviour of $h(t)$.]

We therefore approximate the FT as

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt \;\approx\; \Delta \sum_{k=-N/2}^{N/2} h_k\, e^{2\pi i f t_k}$$

Since we are sampling $h(t)$ at $N+1$ discrete timesteps, in view of the symmetry of the FT and inverse FT it makes sense also to compute $H(f)$ only at a set of $N+1$ discrete frequencies:

$$f_n = \frac{n}{N\Delta}, \qquad n = -N/2, \dots, 0, \dots, N/2$$

(The frequency $f_c = \frac{1}{2\Delta}$ is known as the Nyquist (critical) frequency and it is a very important value. We discuss its significance later.)

Then

$$H(f_n) \approx \Delta \sum_{k=-N/2}^{N/2} h_k\, e^{2\pi i f_n t_k} = \Delta \sum_{k=-N/2}^{N/2} h_k\, e^{2\pi i k n / N}$$

Note that $e^{2\pi i k (n+N)/N} = e^{2\pi i k n / N}\, e^{2\pi i k} = e^{2\pi i k n / N}$, so $H(f_{n+N}) = H(f_n)$: there are only $N$ independent values. We can therefore re-define the Discrete Fourier Transform of the $h_k$ as

$$H_n = \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n / N}$$

The discrete inverse FT, which recovers the set of $h_k$'s from the set of $H_n$'s, is

$$h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i k n / N}$$

Parseval's theorem for discrete FTs takes the form

$$\sum_{k=0}^{N-1} |h_k|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |H_n|^2$$

There are also discrete analogues to the convolution and correlation theorems.
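A minimal sketch of the discrete FT, inverse FT, and the discrete Parseval theorem, checked against numpy's FFT. Note that the sign convention above ($e^{+2\pi i k n/N}$) is the complex conjugate of numpy's, which the check accounts for.

```python
import numpy as np

rng = np.random.default_rng(21)
N = 64
h = rng.normal(size=N)

n = np.arange(N)
W = np.exp(2j * np.pi * np.outer(n, n) / N)    # W[n, k] = e^{2 pi i k n / N}
H = W @ h                                      # H_n = sum_k h_k e^{2 pi i k n / N}
h_back = (np.conj(W) @ H) / N                  # inverse DFT

print(np.allclose(h, h_back.real))             # True
print(np.allclose(H, np.conj(np.fft.fft(h)))) # matches numpy up to conjugation
print(np.allclose(np.sum(np.abs(h)**2),
                  np.sum(np.abs(H)**2) / N))   # discrete Parseval theorem
```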

Fast Fourier Transforms

Consider again the formula for the discrete FT. We can write it as

$$H_n = \sum_{k=0}^{N-1} e^{2\pi i k n / N}\, h_k = \sum_{k=0}^{N-1} W^{nk}\, h_k, \qquad W \equiv e^{2\pi i / N}$$

This is a matrix equation: we compute the $(N \times 1)$ vector of $H_n$'s by multiplying the $(N \times N)$ matrix $[W^{nk}]$ by the $(N \times 1)$ vector of $h_k$'s. In general, this requires of order $N^2$ multiplications (and the $h_k$'s may be complex numbers). Suppose, e.g., $N = 10^8$: even if a computer can perform (say) 1 billion multiplications per second, it would still require more than 115 days to calculate the FT.

Fortunately, there is a way around this problem. Suppose (as we assumed before) that $N$ is an even number. Then, splitting the sum into even values of $k$ ($k = 2j$) and odd values of $k$ ($k = 2j+1$), we can write

$$H_n = \sum_{k=0}^{N-1} e^{2\pi i k n / N}\, h_k = \sum_{j=0}^{M-1} e^{2\pi i j n / M}\, h_{2j} \;+\; W^n \sum_{j=0}^{M-1} e^{2\pi i j n / M}\, h_{2j+1}$$

where $M = N/2$ and $W = e^{2\pi i / N}$. So we have turned an FT with $N$ points into the weighted sum of two FTs with $N/2$ points. This would reduce our computing time by a factor of two.

Why stop there, however? If $M$ is also even, we can repeat the process and re-write each FT of length $M$ as the weighted sum of two FTs of length $M/2$. If $N$ is a power of two (e.g. 1024, 2048, etc.) then we can repeat iteratively the process of splitting each longer FT into two FTs half as long. The final step in this iteration consists of computing FTs of length unity:

$$H_0 = \sum_{k=0}^{0} e^{2\pi i k \cdot 0}\, h_k = h_0$$

i.e. the FT of each discretely sampled data value is just the data value itself.

This iterative process converts $O(N^2)$ multiplications into $O(N \log_2 N)$ operations (this notation means "of the order of"). For $N = 10^8$, our $10^{16}$ operations are reduced to about $3 \times 10^9$ operations: instead of more than 100 days of CPU time, we can perform the FT in less than 3 seconds. The Fast Fourier Transform (FFT) has revolutionised our ability to tackle problems in Fourier analysis on a desktop PC which would otherwise be impractical, even on the largest supercomputers.
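A minimal recursive radix-2 FFT implementing the even/odd splitting described above, using the slides' $e^{+2\pi i k n/N}$ sign convention (numpy's `fft` uses the opposite sign, hence the conjugate in the check); this sketch works for $N$ a power of two only.

```python
import numpy as np

def fft_recursive(h):
    N = len(h)
    if N == 1:
        return h.astype(complex)      # FT of a single value is itself
    even = fft_recursive(h[0::2])     # length-N/2 FT of the even-k values
    odd = fft_recursive(h[1::2])      # length-N/2 FT of the odd-k values
    n = np.arange(N // 2)
    W = np.exp(2j * np.pi * n / N)    # the 'twiddle' weights W^n
    # H_n = E_n + W^n O_n, and H_{n+N/2} = E_n - W^n O_n since W^{N/2} = -1
    return np.concatenate([even + W * odd, even - W * odd])

h = np.random.default_rng(2).normal(size=1024)
print(np.allclose(fft_recursive(h), np.conj(np.fft.fft(h))))   # True
```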

Data Acquisition

Earlier we approximated the continuous function $h(t)$ and its FT $H(f)$ by a finite set of $N+1$ discretely sampled values. How good is this approximation? The answer depends on the form of $h(t)$ and $H(f)$. In this short section we will consider:

1. under what conditions we can reconstruct $h(t)$ and $H(f)$ exactly from a set of discretely sampled points;
2. what minimum sampling rate (or density, if $h$ is a spatially varying function) is required to achieve this exact reconstruction;
3. what the effect is on our reconstructed $h(t)$ and $H(f)$ if our data acquisition does not achieve this minimum sampling rate.

The Nyquist-Shannon Sampling Theorem

Suppose the function $h(t)$ is bandwidth limited. This means that the FT of $h(t)$ is non-zero over only a finite range of frequencies, i.e. there exists a critical frequency $f_c$ such that $H(f) = 0$ for all $|f| \ge f_c$. The Nyquist-Shannon Sampling Theorem (NSST) is a very important result from information theory. It concerns the representation of $h(t)$ by a set of discretely sampled values

$$h_k = h(t_k), \qquad t_k = k\Delta, \qquad k = \dots, -2, -1, 0, 1, 2, \dots$$

The NSST states that, provided the sampling interval $\Delta$ satisfies $\Delta \le \frac{1}{2 f_c}$, we can exactly reconstruct the function $h(t)$ from the discrete samples $\{h_k\}$. It can be shown that

$$h(t) = \Delta \sum_{k=-\infty}^{+\infty} h_k\, \frac{\sin[2\pi f_c (t - k\Delta)]}{\pi (t - k\Delta)}$$

$f_c$ is also known as the Nyquist frequency, and $2 f_c$ is known as the Nyquist rate.

Taking $\Delta = 1/(2 f_c)$, we can re-write this as

$$h(t) = \sum_{k=-\infty}^{+\infty} h_k\, \frac{\sin[\pi (t - k\Delta)/\Delta]}{\pi (t - k\Delta)/\Delta}$$

So $h(t)$ is the sum of the sampled values $\{h_k\}$, each weighted by a normalised sinc function scaled so that its zeroes lie at the other sampled points (compare with the previous section). Note that when $t = t_k$ then $h(t) = h_k$, since $\mathrm{sinc}(0) = 1$.

The NSST is a very powerful result. We can think of the interpolating sinc functions, centred on each sampled point, as filling in the gaps in our data. The remarkable fact is that they do this job perfectly, provided $h(t)$ is bandwidth limited: the discrete sampling incurs no loss of information about $h(t)$ and $H(f)$. Suppose, for example, that $h(t) = \sin(2\pi f_c t)$: then we need only sample twice every period in order to be able to reconstruct the entire function exactly. (Note that formally we do need to sample an infinite number of discretely spaced values $\{h_k\}$. If we only sample $h(t)$ over a finite time interval, then our interpolated $h(t)$ will be approximate.)

[Figure: $y = \sin(2\pi f_c t)$ plotted against $f_c t$, with sample points at spacing $\Delta = 1/(2 f_c)$. Sampling at (infinitely many of) the red points is sufficient to reconstruct the function for all values of $t$, with no loss of information.]
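A minimal sketch of the sinc reconstruction formula: a pure tone sampled a little faster than the Nyquist rate is interpolated back onto a fine grid. In practice the sum over $k$ must be truncated, so the reconstruction is approximate, as noted above.

```python
import numpy as np

f = 1.0                      # tone frequency; implies a band limit f_c >= 1 Hz
delta = 0.4                  # sampling interval, < 1/(2*f) = 0.5 s

k = np.arange(-200, 201)     # (truncated) set of sample indices
h_k = np.sin(2 * np.pi * f * k * delta)

t = np.linspace(-2.0, 2.0, 501)
# h(t) = sum_k h_k sinc[pi (t - k*delta)/delta]; np.sinc(x) = sin(pi x)/(pi x)
arg = (t[:, None] - k[None, :] * delta) / delta
h_rec = np.sum(h_k * np.sinc(arg), axis=1)

# Residual is small, limited only by truncating the sum over k
print(np.max(np.abs(h_rec - np.sin(2 * np.pi * f * t))))
```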

Aliasing

There is a downside, however. If $h(t)$ is not bandwidth limited (or, equivalently, if we don't sample frequently enough, i.e. if $\Delta > \frac{1}{2 f_c}$), then our reconstruction of $h(t)$ and $H(f)$ is badly affected by aliasing: all of the power spectral density which lies outside the range $-f_c < f < f_c$ is spuriously moved inside that range, so that the FT of $h(t)$ will be computed incorrectly from the discretely sampled data. Any frequency component outside the range $(-f_c, f_c)$ is falsely translated ("aliased") into that range.

Consider $h(t)$ as shown, sampled at regular intervals $\Delta$, and suppose $h(t)$ is zero outside the range $0 < t < T$. This means that its FT $H(f)$ extends to $\pm\infty$ in frequency. The contribution to the true FT from outside the range $\left(-\frac{1}{2\Delta}, \frac{1}{2\Delta}\right)$ gets aliased into this range, appearing as a mirror image. Thus, at $f = \pm\frac{1}{2\Delta}$ our computed value of $H(f)$ is equal to twice the true value. [From Numerical Recipes, Chapter 12.1.]

How do we combat aliasing?

o Enforce some chosen $f_c$, e.g. by filtering $h(t)$ to remove the high frequency components (also known as anti-aliasing).
o Sample $h(t)$ at a high enough rate so that $\Delta \le \frac{1}{2 f_c}$, i.e. take at least two samples per cycle of the highest frequency present.

To check for / eliminate aliasing without pre-filtering (see the sketch below):

o Given a sampling interval $\Delta$, compute $f_{\rm lim} = \frac{1}{2\Delta}$.
o Check whether the discrete FT of $h(t)$ is approaching zero as $f \to f_{\rm lim}$.
o If not, then frequencies outside the range $(-f_{\rm lim}, f_{\rm lim})$ are probably being folded back into this range. Try increasing the sampling rate, and repeat.
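A minimal demonstration of aliasing (an illustrative example, not from the slides): a 7 Hz tone sampled at 10 Hz, so the Nyquist frequency is 5 Hz, shows a spurious spectral peak at 3 Hz.

```python
import numpy as np

delta = 0.1                       # sampling interval -> f_lim = 1/(2*delta) = 5 Hz
t = np.arange(0, 100, delta)
f_true = 7.0                      # above the Nyquist frequency
h = np.sin(2 * np.pi * f_true * t)

H = np.fft.rfft(h)                          # FT of the sampled data
freqs = np.fft.rfftfreq(len(h), d=delta)    # frequencies from 0 to 5 Hz
print(freqs[np.argmax(np.abs(H))])          # ~3 Hz: the 7 Hz tone is aliased
```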

And finally... Mock Data Challenge

1000 (x,y) data pairs, generated from an unknown model plus Gaussian noise. Data posted on my.supa. Four-stage challenge:

1. Fit a linear model to these data, using ordinary least squares.
2. Compute Bayesian credible regions for the model parameters, using a bivariate normal model for the likelihood function.
3. Write an MCMC code to sample from the posterior pdf of the model parameters, and compare their sample estimates with the LS fits.
4. Carry out a Bayesian model comparison, calculating the posterior odds ratio of a constant, linear and quadratic model.


More information

BAYESIAN DECISION THEORY

BAYESIAN DECISION THEORY Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will

More information

26. The Fourier Transform in optics

26. The Fourier Transform in optics 26. The Fourier Transform in optics What is the Fourier Transform? Anharmonic waves The spectrum of a light wave Fourier transform of an exponential The Dirac delta function The Fourier transform of e

More information

Other Optimization Techniques. Similar to steepest descent, but slightly different way of choosing direction of next step: x r +1 = x.

Other Optimization Techniques. Similar to steepest descent, but slightly different way of choosing direction of next step: x r +1 = x. Conjugate Gradient Other Optimization Techniques Similar to steepest descent, but slightly different way of choosing direction of next step: x r +1 = x r + r s r s 0 = g 0 s r +1 = g r +1 + r +1 s r {

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Cosmology & CMB. Set5: Data Analysis. Davide Maino

Cosmology & CMB. Set5: Data Analysis. Davide Maino Cosmology & CMB Set5: Data Analysis Davide Maino Gaussian Statistics Statistical isotropy states only two-point correlation function is needed and it is related to power spectrum Θ(ˆn) = lm Θ lm Y lm (ˆn)

More information

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software SAMSI Astrostatistics Tutorial More Markov chain Monte Carlo & Demo of Mathematica software Phil Gregory University of British Columbia 26 Bayesian Logical Data Analysis for the Physical Sciences Contents:

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems

Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems Bayesian Methods and Uncertainty Quantification for Nonlinear Inverse Problems John Bardsley, University of Montana Collaborators: H. Haario, J. Kaipio, M. Laine, Y. Marzouk, A. Seppänen, A. Solonen, Z.

More information

FOURIER TRANSFORMS. At, is sometimes taken as 0.5 or it may not have any specific value. Shifting at

FOURIER TRANSFORMS. At, is sometimes taken as 0.5 or it may not have any specific value. Shifting at Chapter 2 FOURIER TRANSFORMS 2.1 Introduction The Fourier series expresses any periodic function into a sum of sinusoids. The Fourier transform is the extension of this idea to non-periodic functions by

More information

Physics 6720 Introduction to Statistics April 4, 2017

Physics 6720 Introduction to Statistics April 4, 2017 Physics 6720 Introduction to Statistics April 4, 2017 1 Statistics of Counting Often an experiment yields a result that can be classified according to a set of discrete events, giving rise to an integer

More information