Dimensionality Reduction Techniques for Modelling Point Spread Functions in Astronomical Images


Dimensionality Reduction Techniques for Modelling Point Spread Functions in Astronomical Images

Aristos Aristodimou

Master of Science
School of Informatics
University of Edinburgh
2011

Abstract

Even though 96% of the Universe consists of dark matter and dark energy, their nature is unknown, since modern physics is not adequate to define their characteristics. One new approach that cosmologists are using tries to define the dark Universe by precisely measuring the shear effect on galaxy images due to gravitational lensing. Besides the shear effect on the galaxies, there is another factor that causes distortion in the images, called the Point Spread Function (PSF). The PSF is caused by atmospheric conditions, imperfections in the telescopes and the pixelisation of the images when they are digitally stored. This means that before trying to calculate the shear effect, the PSF must be accurately calculated. This dissertation is part of the GREAT10 star challenge, which is about predicting the PSF at non-star positions with high accuracy. This work focuses on calculating the PSF at star positions with high accuracy so that these values can later be used to interpolate the PSF at non-star positions. For the purposes of this dissertation, dimensionality reduction techniques are used to reduce the noise levels in the star images and to accurately capture their PSF. The techniques used are Principal Component Analysis (PCA), Independent Component Analysis (ICA) and kernel PCA. Their reconstructed stars are further processed with the Laplacian of Gaussian edge detection for capturing the boundary of the stars and removing any noise that is outside this boundary. The combination of these techniques had promising results on the specific task and outperformed the baseline approaches that use quadrupole moments.

Acknowledgements

I would like to thank my supervisor Dr Amos Storkey for his guidance and for providing this opportunity to work on this interesting project. I would also like to thank Jonathan Millin for all of his comments and help throughout this project.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Aristos Aristodimou)

Contents

1 Introduction
   1.1 Motivation
   1.2 Specific Objectives
   1.3 Scope of Research
   1.4 Contribution
   1.5 Thesis Structure

2 Theoretical Background and Related Work
   2.1 Principal Component Analysis
       Related Work
       Conclusion
   2.2 Independent Component Analysis
       Related Work
       Conclusion
   2.3 Kernel Principal Component Analysis
       Related Work
       Conclusion
   2.4 Laplacian of Gaussian Edge Detection
       Conclusion
   2.5 Quadrupole Moments
       Conclusion

3 Data

4 Methodology
   4.1 Global Evaluation Framework
   4.2 Local Evaluation Framework
   4.3 Baseline Approach
       4.3.1 Initial baseline approach
       4.3.2 Improved baseline approach
   4.4 LoG edge detection
   4.5 PCA
       Component Selection
       PCA on each set
       PCA on each model
       PCA on all of the data
   4.6 ICA
       Component Selection
       ICA on each set
       Selecting the contrast function
   4.7 Kernel PCA
       Component Selection
       Kernel PCA on each set
       Kernel Selection

5 Results
   5.1 RMSE of the noise
   5.2 Initial baseline approach
   5.3 Improved Baseline Approach
   5.4 PCA
       Component Selection
       PCA on each set
       PCA on each model
       PCA on all of the sets
   5.5 ICA
       Component Selection
       Contrast function selection
       ICA on each set
   5.6 Kernel PCA
       Component Selection
       Kernel PCA on each set with a RBF kernel
       Kernel PCA on each set with a Polynomial kernel
   5.7 Comparison of the methods

6 Conclusion
   Future work

Bibliography

List of Figures

1.1 The shear and PSF effect on a star and a galaxy
3.1 The convolution from telescopic and atmospheric effects
3.2 A star image from each set
4.1 An example of a Scree plot and the eigenvectors obtained from PCA on a set
4.2 An example of sorted independent components using negentropy
5.1 Box-plot of the RMSE on each set using the baseline approach
5.2 An example of a reconstructed star using the baseline approach
5.3 Box-plot of the RMSE on each set using the improved baseline approach
5.4 An example of a reconstructed star using the improved baseline approach
5.5 Box-plot of the RMSE on each set using PCA on each set
5.6 An example of a reconstructed star using PCA on each set and the LoG edge detection
5.7 The patterns of each model
5.8 Box-plot of the RMSE on each set using PCA on each model
5.9 An example of a reconstructed star using PCA on each model and the LoG edge detection
5.10 Box-plot of the RMSE on each set using PCA on all of the sets
5.11 An example of a reconstructed star using PCA on all of the sets and the LoG edge detection
5.12 Box-plot of the RMSE on the sets ICA was tested on, using different contrast functions
5.13 Box-plot of the RMSE on each set using ICA on each set
5.14 An example of a reconstructed star using ICA on each set and the LoG edge detection
5.15 Box-plot of the RMSE on each set using kernel PCA on each set with the RBF kernel
5.16 An example of a reconstructed star using PCA on each set and the LoG edge detection
5.17 Box-plot of the RMSE on each set using kernel PCA on each set with the polynomial kernel
5.18 An example of a reconstructed star using kernel PCA on each set with the RBF kernel and the LoG edge detection

List of Tables

5.1 The mean RMSE of the noise in each set
5.2 The mean RMSE on each set using the baseline approach
5.3 The mean RMSE on each set using the improved baseline approach
5.4 The range of components tested on each PCA method and the selected number of components
5.5 The mean RMSE on each set using PCA on each set
5.6 The mean RMSE on each set using PCA on each model
5.7 The mean RMSE on each set using PCA on all of the sets
5.8 The number of components used with each ICA contrast function
5.9 The mean RMSE on the sets ICA was tested with each contrast function
5.10 The mean RMSE on each set using ICA on each set
5.11 The range of components tested on each kernel PCA method and the selected number of components
5.12 The mean RMSE on each set using kernel PCA on each set with the RBF kernel
5.13 The mean RMSE on each set using kernel PCA on each set with the polynomial kernel

Chapter 1

Introduction

1.1 Motivation

The Universe consists of physical matter and energy, the planets, stars, galaxies, and the contents of intergalactic space. The biggest part of the Universe is dark matter and dark energy, whose nature has not yet been fully defined. Because modern physics is not capable of defining their characteristics, new methods had to be developed. One promising approach uses the shape distortion of galaxies which is caused by gravitational lensing [24]. Gravitational lensing is the effect whereby light rays are deflected by gravity. Because there is mass between the galaxies and the observer, images of galaxies get distorted. This can cause a shear on a galaxy, which is an additional small ellipticity in its shape [20]. Cosmologists, by making assumptions about the original shape of the galaxy, can infer information about the dark matter and dark energy that lies between the galaxies and the observer [4].

Apart from the shear effect, the images are also distorted by a convolution kernel. This convolution kernel, or Point Spread Function (PSF), is caused by a combination of factors. The first is the refraction of the photons when they travel through our atmosphere. Then, due to slight movements of the telescope, or because the mirrors and lenses of the telescope are imperfect and the weight of the mirror warps it differently at different angles, the image can get further distortions. Finally, because the images are digitally stored, there is also a pixelisation which removes some of the detail of the stars and galaxies and also adds noise to the final image [20].

Unlike galaxies, stars do not have the shear effect because they are point-like objects. Since stars are distorted only by the PSF, computing the local PSF at a star is easier than computing it on galaxies. What is needed is to infer the spatially varying

PSF at non-star positions using the PSF estimations we have at star positions. In this way we can infer the PSF at a galaxy using the PSF of the stars in its image. Then the galaxy can be deconvolved using the PSF so that the gravitational lensing can be better estimated. An example of the distortion of the stars and galaxies is shown in figure 1.1. In our data we also expect atmospheric distortions, which arise when the telescopes are not in space, as in figure 1.1, but ground-based.

Figure 1.1: The shear effect and PSF effect on stars and galaxies. In the top pictures a star is seen from a telescope; due to telescope effects the star is blurred, and then due to the detector a pixelated image is created. For the galaxies there is an additional shear effect caused by the mass between the galaxy and the observer, followed by the PSF effect. [20]

1.2 Specific Objectives

This dissertation is part of the GREAT10 star challenge, which is about predicting the PSF at non-star positions with high accuracy. The interpolation of the PSFs is part of the dissertations of other students who are participating in the project. This work focuses on capturing the PSF at star positions with high accuracy. Specifically, it focuses on clearing the star images from the noise so that the star with its PSF can be captured. This is done using different dimensionality reduction techniques, which help reduce the noise and also make things easier for the interpolation part, because lower dimensional data need to be interpolated rather than the whole star images. This means that the optimal number of lower dimensions needs to be defined and then the data need to be reconstructed so that as much of the noise as possible is removed while the PSF of the star is unaffected. Due to prediction errors, and because the initial components that are used

will contain some noise, further noise removal is needed, hence an additional noise removal technique needs to be used.

1.3 Scope of Research

There are two main ways of finding the PSF: direct modelling and indirect modelling. Direct modelling uses model fitting, whereas indirect modelling uses dimensionality reduction techniques. For the needs of this dissertation, dimensionality reduction techniques will be used. The assumption is that this approach results in more general solutions that can be better applied to different data sets. This means that the results obtained from the artificial data will be close to the results that we would get using real data. Having more general solutions also means that no further adjustments to the technique will be needed when it is used on real data.

Specifically, the dimensionality reduction techniques that will be used are PCA, ICA and kernel PCA. They will be used on the stars directly as in [18] and then the image reconstructed from the lower components will have further noise removed using the Laplacian of Gaussian edge detection. PCA has already been used in different ways in this area with good results, so it should have good results on these data as well. ICA assumes that the components we are looking for are statistically independent in our non-Gaussian source [15]. What we have in the star images is a PSF that is caused by several factors, but in the pictures we see all of those factors as a single distortion on the star. If these factors are statistically independent, this makes ICA suitable for the problem. Moreover, the fact that the components are independent might help the interpolation techniques, since they will not have to model the dependencies between the components. Both of these methods assume that the data lie on a linear subspace, which means that they will not perform well if this is not the case. Kernel PCA is a non-linear dimensionality reduction technique which will be able to capture any non-linearities in our data. It has also been proposed in [20] for this task and has been used for image de-noising in [25]. Another non-linear approach that could be used is Gaussian Process Latent Variable Models (GP-LVM), but due to time constraints and the size of the data we have, it was not used. Other techniques like ISOMAP and LLE were considered as well, but there was no clear way of reconstructing the image from the lower dimensions.

1.4 Contribution

As stated earlier, for cosmologists to have good results it is important for them to have an accurate PSF at the galaxies they are analysing, which is the aim of this project. By defining the dark Universe with high precision, cosmologists will be able to measure the expansion rate of the universe, and distinguish between modified gravity and dark energy explanations for the acceleration of the universe [12]. It will also mark a revolution in physics, impacting particle physics and cosmology, and will require new physics beyond the standard model of particle physics, general relativity or both. Moreover, this thesis tests the quality of dimensionality reduction techniques that have not been used for this task, like ICA and kernel PCA. Hence the quality of these new approaches is also presented. The use of edge detection is also tested for further noise removal on the stars reconstructed from their lower dimensions. This technique can be used as a post-processing step of any other denoising technique for this task.

1.5 Thesis Structure

Chapter 2 provides the theoretical background of the techniques used and any related work, whereas Chapter 3 describes the data used. Chapter 4 is the methodology chapter and explains the way each technique is used to obtain the results on our data. Chapter 5 reports and discusses the results of each method. Finally, Chapter 6 provides the conclusion of this dissertation together with plans for future work.

Chapter 2

Theoretical Background and Related Work

This chapter provides the theoretical background that is needed for this dissertation. For each technique that is used, its theory and any related work on identifying the true PSF of stars in astronomical images is provided. A conclusion for each technique is also given, presenting the reasons for selecting it for this task.

2.1 Principal Component Analysis

PCA is a popular technique for multivariate statistical analysis and finds application in many scientific fields [1]. The main goal of PCA is to reduce the dimensionality of the data from N dimensions to M. This is achieved by transforming the interrelated variables of the data set to uncorrelated variables, the principal components, in a way that retains as much of the variation of the data set as possible. The principal components are ordered so that the variation explained by each of them is in descending order [19]. This means that the first principal components explain most of the variation in the data, and by using those we can project the data into lower dimensions. It has been shown that the principal components can be computed by finding the eigenvalues and eigenvectors of the covariance matrix of the data [19]. Once these are calculated, the eigenvectors are sorted in descending order based on their eigenvalues and the first M eigenvectors can be used to project the data into lower dimensions. It is also possible to calculate only the first M eigenvalue-eigenvector pairs, using techniques such as the power method [9]. The original data can be reconstructed from the lower dimensional projections, and by using all of the principal components the reconstructed data

will be the same as the initial data set. Algorithmically, PCA on an N-dimensional data set X is as follows:

1. Compute the mean of X

   m = \frac{1}{N} \sum_{i=1}^{N} x_i   (2.1)

2. Compute the covariance matrix of X

   S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T   (2.2)

3. Compute the M eigenvectors e_1, ..., e_M with the largest eigenvalues of S and create the matrix E = [e_1, ..., e_M].

4. Project each data point x_i to its lower dimensional representation

   y_i = E^T (x_i - m)   (2.3)

5. If the reconstruction of the original data point x_i is needed, use

   x_i \approx m + E y_i   (2.4)

Related Work

PCA has been used on this task, either directly on the stars [18] or on the estimated PSF of the stars [30, 17, 26]. In [17] a polynomial fit is first done on the stars and then PCA is used on those fits. The components of the PCA are later used for interpolation. The disadvantage is that this technique depends on the polynomial fit which, as mentioned in [20], has reduced stability at field edges and corners as the fits become poorly constrained. As with [17], techniques that try to capture the PSF and then use PCA to get the components for the interpolation are dependent on the PSF fit, which is usually affected by the noise. The authors of [18] use PCA directly on the stars, and by using a lower number of components they reconstruct the image with lower noise. Then a Lanczos3 drizzling kernel is used to correct geometric distortions. This approach had better results than wavelet and shapelet techniques, but the Lanczos3 kernel in some cases produced cosmetic artifacts.

Conclusion

PCA has the advantage of being a powerful and easy to implement technique. Furthermore it has a unique solution and the principal components are ordered, which makes

the selection of the components easier. The main disadvantage is that it makes the assumption that the data lie on a linear subspace, which is not always the case. PCA has already been used on the specific task that this thesis is about and had good results, which makes it an appropriate technique to use. What looks promising from the previous work is [18], because it can be seen as a framework that can be used with different dimensionality reduction techniques and different methods for removing the remaining noise.

2.2 Independent Component Analysis

ICA is a non-Gaussian latent variable model that can be used for blind source separation. We can see the observed data as being a mixture of independent components, which leads to the task of finding a way of separating the mixed signals. The ICA model can be written as

   x = As   (2.5)

where x is the observed data, A is the mixing matrix and s are the original sources (independent components). In this model only x is observed, whereas A is unknown and s are the latent variables. To estimate A and s from x, we make the assumption that the components we are looking for are statistically independent and non-Gaussian [15]. Once the mixing matrix is estimated, its inverse W is computed so that the independent components can be calculated using

   s = Wx   (2.6)

In [13] a fast and robust method of calculating the independent components is proposed. This method is known as fast-ICA and is based on maximizing the negentropy of the independent components. By normalizing the differential entropy H of a random vector y = (y_1, ..., y_n) that has a density f(.), they obtain the negentropy J, which can be used as a non-Gaussianity measure [8]:

   H(y) = -\int f(y) \log f(y) \, dy   (2.7)

   J(y) = H(y_{gauss}) - H(y)   (2.8)

where y_{gauss} is a random Gaussian vector that has the same covariance as y.

Mutual information is used for measuring the dependence between random variables and can be expressed using negentropy [8] as:

   I(y_1, ..., y_n) = J(y) - \sum_i J(y_i)   (2.9)

From (2.9) it is easy to see that by maximizing the negentropy the components become as independent as possible, so the task is now to maximize this value. To approximate the negentropy the following equation is used:

   J(y_i) \approx c \, [E\{G(y_i)\} - E\{G(v)\}]^2   (2.10)

where G is a non-quadratic function (contrast function), v is a standardized Gaussian variable and c is a constant. The contrast functions proposed (given here through their derivatives g) are:

   g_1(u) = \tanh(a_1 u)   (2.11)

   g_2(u) = u \exp(-a_2 u^2 / 2)   (2.12)

   g_3(u) = u^3   (2.13)

The advantage of this method is that it works with any of these contrast functions regardless of what the distribution of the independent components is. Moreover, by using a fixed point algorithm the method converges fast and no step size parameters are needed [13].

Related Work

There is no related work on PSF identification using ICA, but it has been previously used in image analysis. Specifically, in [35] it was used in hyperspectral analysis for endmember extraction. In that paper ICA was compared to PCA on the task of endmember extraction and had better results. Also, [14] uses ICA to model images that are noise-free but have the same distribution as the sources of the noisy images. The noisy image is then denoised using a maximum likelihood estimation of an ICA model with noise. The disadvantage is that noiseless images are needed as training sets. ICA has also been used in signal processing for clearing a signal from noise. An example is [22], where ICA is used to remove artifacts from the observed electroencephalographic signal.

Conclusion

Even though it has not been widely used in image processing, ICA performs really well in source separation tasks. One of its disadvantages is that the components obtained are

not ordered, which makes the component selection a bit harder. Another disadvantage is that, as with PCA, it is a linear dimensionality reduction technique. The reason this method will be used is that our stars are convolved by various factors. If these factors are considered statistically independent and non-Gaussian, then ICA will be able to separate them, and by removing the components that represent the noise in the image we can get a noiseless image.

2.3 Kernel Principal Component Analysis

This is a non-linear version of PCA which is achieved with the use of kernel functions. The data are first mapped into a feature space F using a non-linear function \Phi and then PCA is performed on the mapped data [29]. If our data are mapped into the feature space, giving \Phi(x_1), ..., \Phi(x_N), then PCA will be performed on the covariance matrix

   C = \frac{1}{N} \sum_{j=1}^{N} \Phi(x_j) \Phi(x_j)^T   (2.14)

This means that the eigenvalue problem \lambda V = CV is now transformed to

   \lambda (\Phi(x_k) \cdot V) = \Phi(x_k) \cdot CV   (2.15)

with

   V = \sum_{k=1}^{N} a_k \Phi(x_k)   (2.16)

If an N x N matrix K is defined as

   K_{i,j} = \Phi(x_i) \cdot \Phi(x_j)   (2.17)

then (2.14) and (2.16) can be substituted into (2.15), giving

   N \lambda K a = K^2 a   (2.18)

which means that the solutions can be obtained by solving the eigenvalue problem

   N \lambda a = K a   (2.19)

The solutions a^k are normalized and the components are extracted by calculating the projections of \Phi(x) onto the eigenvectors V^k in feature space F using

   (V^k \cdot \Phi(x)) = \sum_{i=1}^{N} a_i^k (\Phi(x_i) \cdot \Phi(x))   (2.20)

Because \Phi(x_i) in (2.17) and (2.20) is not required in any explicit form but only in dot products, the dot products can be calculated using kernel functions, without ever computing the mapping \Phi [2, 3]. Some kernels that can be used with kernel PCA [29] are the polynomial and the radial basis function kernels:

   k(x, y) = (x \cdot y)^d   (2.21)

   k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)   (2.22)

Related Work

As with ICA, there is no related work on the PSF in astronomical images, but kernel PCA has been proposed in [20] for this task because of its non-linearity. In [25] and [21], the problem of reconstructing an image from the components is addressed, and the reconstruction of the image from the lower components is used for noise removal. In their experiments, these techniques outperform PCA on that task. Specifically, [21] has better results than [25] and also has the advantage of being non-iterative and not suffering from local minima. A hybrid approach was later proposed in [31], which uses [21] to get a starting point for [25] and has even better results.

Conclusion

The main advantage of kernel PCA is that it overcomes the linearity assumption that PCA and ICA make. Moreover, it has been successfully used in image denoising and was shown to have better results than PCA. These facts, together with the fact that it has been proposed in [20], are the main reasons for using it. The disadvantage is that the reconstruction from the lower dimensions back to the initial dimensions is harder than for PCA, but there are techniques for accomplishing that.

2.4 Laplacian of Gaussian Edge Detection

Edges are important changes in an image, since they usually occur on the boundary of different objects in the image [16]. For example, in the star images that we have, the edge might be the star's boundary against the black sky. Edge detection is done by the use of first derivative or second derivative operators. First derivative operators like Prewitt's operator [27] and Sobel's operator [10] compute the first derivative and use a threshold to choose the edges in the image. This threshold may vary in different images

and noise levels. Second order derivative techniques select only points that are local maxima, by finding the zero crossings of the second derivative of the image [16]. The disadvantage of second derivative operators is that they are even more susceptible to noise than first derivative operators. The Laplacian of Gaussian (LoG) edge detection technique proposed by Marr and Hildreth [23] first uses a Gaussian smoothing filter to reduce the noise and then uses a second derivative operator for the edge detection. The operator used for calculating the second derivative of the filtered image is the Laplacian operator, which for a function f(x, y) is

   \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}   (2.23)

The LoG operator's output is estimated using the following convolution operation

   h(x, y) = \nabla^2 [g(x, y) * f(x, y)]   (2.24)

and by the derivative rule for convolution we have

   h(x, y) = [\nabla^2 g(x, y)] * f(x, y)   (2.25)

where

   \nabla^2 g(x, y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}   (2.26)

The LoG edge detection has some good properties [11]. It can be applied at different scales, so we do not need to know in advance the scale of the interesting features. It is also separable and rotation invariant. On the other hand, it might detect phantom edges, but this is a general problem with edge detection, and post-processing techniques have been introduced to fix these types of problems [7, 6].

Conclusion

There is no PSF-specific related work using the LoG technique to be mentioned. There is a paper on cosmic ray rejection that uses some properties of the LoG, but there it is used as a classifier of cosmic rays [34]. The fact that LoG is applicable at different scales is an important feature, since convolved stars may vary in size and shape. Also, the fact that it does some image smoothing with a Gaussian filter before actually detecting the edges might be helpful, since our images are noisy. This technique is worth using after reconstructing a star from its lower components, for identifying the star and removing any remaining noise that is not part of the star.
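To make the zero-crossing idea concrete, a minimal illustrative sketch in Python/SciPy is given below. The thesis itself uses MATLAB's built-in edge function (see Section 4.4), so this is only a stand-in; the value of sigma and the synthetic test star are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_edges(image, sigma=1.5):
    """Detect edges as zero crossings of the Laplacian-of-Gaussian response."""
    # Convolve with the LoG operator (Gaussian smoothing + Laplacian), eqs. (2.25)-(2.26).
    response = gaussian_laplace(image.astype(float), sigma=sigma)

    # A pixel is marked as an edge if the LoG response changes sign towards a neighbour.
    signs = response > 0
    edges = np.zeros(image.shape, dtype=bool)
    edges[:, :-1] |= signs[:, :-1] != signs[:, 1:]   # sign change to the right neighbour
    edges[:-1, :] |= signs[:-1, :] != signs[1:, :]   # sign change to the lower neighbour
    return edges

# Example on a synthetic 30x30 "star": a blurred bright blob on a dark background.
if __name__ == "__main__":
    yy, xx = np.mgrid[0:30, 0:30]
    star = np.exp(-((xx - 15) ** 2 + (yy - 15) ** 2) / (2 * 3.0 ** 2))
    noisy = star + 0.05 * np.random.randn(30, 30)
    print(log_edges(noisy, sigma=2.0).sum(), "edge pixels found")
```

Because the zero crossings are kept with a threshold of zero, the detected edges form closed contours, which is what later allows the star boundary to be isolated and used as a mask.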

2.5 Quadrupole Moments

The ellipticity of a star can be measured using the quadrupole moments, but this method works well only in the absence of pixelisation, convolution and noise [5]. Initially the first moments are used to define the centre of the image brightness:

   \bar{x} = \frac{\int I(x, y) \, x \, dx \, dy}{\int I(x, y) \, dx \, dy}   (2.27)

   \bar{y} = \frac{\int I(x, y) \, y \, dx \, dy}{\int I(x, y) \, dx \, dy}   (2.28)

where I(x, y) is the intensity of the pixel at coordinates x, y. Then the quadrupole moments can be calculated:

   Q_{xx} = \frac{\int I(x, y)(x - \bar{x})(x - \bar{x}) \, dx \, dy}{\int I(x, y) \, dx \, dy}   (2.29)

   Q_{xy} = \frac{\int I(x, y)(x - \bar{x})(y - \bar{y}) \, dx \, dy}{\int I(x, y) \, dx \, dy}   (2.30)

   Q_{yy} = \frac{\int I(x, y)(y - \bar{y})(y - \bar{y}) \, dx \, dy}{\int I(x, y) \, dx \, dy}   (2.31)

and the overall ellipticity of a star can be defined as

   \varepsilon \equiv \varepsilon_1 + i\varepsilon_2 = \frac{Q_{xx} - Q_{yy} + 2iQ_{xy}}{Q_{xx} + Q_{yy} + 2(Q_{xx}Q_{yy} - Q_{xy}^2)^{1/2}}   (2.32)

If we have an elliptical star with major axis a and minor axis b, and the angle between the positive x axis and the major axis is \theta, then [5]

   \varepsilon_1 = \frac{a - b}{a + b} \cos(2\theta)   (2.33)

   \varepsilon_2 = \frac{a - b}{a + b} \sin(2\theta)   (2.34)

Conclusion

The quadrupole moments do not take into consideration the noise and pixelisation, so they will not work well on the initial images. Once the noise is removed from an image, they can be used as the covariance matrix of a Gaussian centred on the location of the star in the image. In this way a star can be recreated using the quadrupole moments. The covariance matrix S will be

   S = \begin{pmatrix} Q_{xx} & Q_{xy} \\ Q_{xy} & Q_{yy} \end{pmatrix}   (2.35)
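A minimal NumPy sketch of the discretised moments (2.27)-(2.31) and of the Gaussian reconstruction described above is given below. It is illustrative only; the 30x30 patch size and the normalisation of the recreated Gaussian are assumptions.

```python
import numpy as np

def quadrupole_moments(img):
    """First and second brightness moments of a star patch, eqs. (2.27)-(2.31)."""
    yy, xx = np.indices(img.shape)          # row (y) and column (x) coordinates
    total = img.sum()
    xbar = (img * xx).sum() / total
    ybar = (img * yy).sum() / total
    qxx = (img * (xx - xbar) ** 2).sum() / total
    qxy = (img * (xx - xbar) * (yy - ybar)).sum() / total
    qyy = (img * (yy - ybar) ** 2).sum() / total
    return xbar, ybar, np.array([[qxx, qxy], [qxy, qyy]])

def gaussian_from_moments(xbar, ybar, S, shape=(30, 30)):
    """Recreate a star as a Gaussian with covariance S (eq. 2.35) centred at (xbar, ybar)."""
    yy, xx = np.indices(shape)
    d = np.stack([xx - xbar, yy - ybar], axis=-1)        # pixel offsets from the centre
    Sinv = np.linalg.inv(S)                              # fails if S is not positive definite
    expo = np.einsum('...i,ij,...j->...', d, Sinv, d)    # squared Mahalanobis distance
    g = np.exp(-0.5 * expo)
    return g / g.sum()
```

As noted in Chapter 4, the inversion of S fails when the noisy moments do not form a valid covariance matrix, which is why some stars cannot be reconstructed by the baseline approach.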

Chapter 3

Data

The data used are from the GREAT10 star challenge [20]. The data are artificially generated to reproduce the real PSF effects on star images. The PSF is caused by atmospheric and telescopic effects, and further noise and pixelisation is added to the images by the detectors. An example of the atmospheric and telescopic effects can be seen in figure 3.1.

Figure 3.1: The upper panel shows the real point-like stars and the resulting observed stars due to the atmospheric and telescopic effects. The lower panel shows the atmospheric convolution on the left and the telescopic convolution on the right. The atmospheric convolution has random, coherent patterns, whereas the telescopic convolution has specific functional behaviour due to optical effects.

The data set is approximately 50GB and contains 26 sets with 50 images in each

set. Each image has 500 to 2000 stars, depending on the set it belongs to. There are approximately 1.3 million stars to be analysed, and each star is in a 48x48 pixel patch in the image. To reduce the size of the files further, the stars were extracted from the images and the patch size was reduced to 30x30 pixels. This reduced the size of the data set to approximately 9GB, so that each set can be processed by a typical 64-bit personal computer (a small illustrative sketch of this extraction step is given at the end of this chapter).

To artificially create the star data, a PSF convolution is applied to point-like stars. In each set, the images were created with the same underlying PSF, but they illustrate different gravitational lensing effects by using different random components. Furthermore, the PSF varies spatially across an image, so that stars in the same image have different convolutions. After the PSF convolution, the pixelisation effect is created by summing the star intensities in square pixels. Finally, noise is added to the images. The noise added is uncorrelated Gaussian noise, and the image simulation process also adds Poisson noise.

An example of a star from each set is shown in figure 3.2. Specifically, this is the first star encountered in the first image of each set. As can be seen, the PSF varies, giving a different convolution to the stars in each set. This affects the elliptical shape and size of the observed stars. For example, in sets 6, 14 and 26 the observed stars are much smaller, whereas in sets 7 and 15 the observed stars are larger. Because of this variation in shapes and sizes, the techniques used must be able to take them into account to have good results.

Figure 3.2: The first star encountered from the top left corner in the first image of each set.
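The following is a small illustrative sketch of the patch extraction and vectorisation steps described above (the vectorised 900 x K representation is the one used throughout Chapter 4). It assumes the images are already loaded as NumPy arrays and that the star centres are known, which is a simplification of the actual extraction.

```python
import numpy as np

def extract_patches(image, centres, size=30):
    """Cut a size x size patch around each (assumed known) star centre.

    The original stars sit in 48x48 regions; keeping only the central 30x30
    pixels is what reduces the ~50GB data set to roughly 9GB.
    """
    half = size // 2
    patches = []
    for (row, col) in centres:
        patch = image[row - half:row + half, col - half:col + half]
        if patch.shape == (size, size):      # skip stars too close to the image border
            patches.append(patch)
    return np.stack(patches)

def vectorise(patches):
    """The "vectorisation" of Chapter 4: each 30x30 patch becomes a 900-dimensional vector."""
    return patches.reshape(len(patches), -1).T   # 900 x K matrix, one column per star
```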

Chapter 4

Methodology

This chapter presents the way each method is used and the way the experiments are performed. First, a global evaluation framework is provided for evaluating the different dimensionality reduction techniques using the competition's evaluation method. A local evaluation framework is also described, which can be used for locally optimizing the techniques before using the global evaluation framework. Then a baseline approach and its evaluation are presented. Finally, the algorithms for optimizing and running the different techniques are provided.

4.1 Global Evaluation Framework

The main purpose of this dissertation is to use different dimensionality reduction techniques in order to capture the PSF of the stars. This can be done by projecting the data onto lower components that explain the structure of the PSF and not the noise. This means that the stars reconstructed from their lower components will have less noise. These lower dimensional data can be used for interpolating the PSF at non-star positions in the images. This makes the interpolation easier, since it is not necessary to interpolate the whole star image. Once the predicted components are estimated, the stars with their PSF can be reconstructed. Due to prediction errors, and because the initial components used for training will contain some noise, further noise removal is needed. For this task, LoG edge detection will be used. Edge detection will capture the boundary of a star, and therefore any noise that is outside the boundary of a star can be removed. The final star images can be evaluated by uploading them to the GREAT10 star challenge website, which provides a quality factor for the submitted data. All of the above provide the following global framework for

evaluating the results of different dimensionality reduction techniques.

Global Evaluation Framework
1. Use a dimensionality reduction technique to project the training data to lower dimensions
2. Provide the lower dimensional data for interpolation at the requested non-star positions
3. Reconstruct the stars from the predicted values
4. Use LoG edge detection to remove any noise outside the stars' boundaries
5. Submit the final stars to the competition's website for evaluation

This framework will be used once the values at the non-star positions are predicted using the data provided by this dissertation. Because this means that work from different dissertations needs to be combined, it will be used at a future stage.

4.2 Local Evaluation Framework

Because the data to be submitted will be gigabytes in size and only one submission per day is allowed, the global evaluation framework is to be used once all of the local optimizations of the methods are done. Since the global evaluation framework is not suitable for optimizing the methods used, a local evaluation framework is needed, which uses only the noisy star images. First, the data are projected to their lower dimensions using a dimensionality reduction technique. Then the stars are reconstructed from the lower dimensions and any remaining noise is removed using LoG edge detection. To evaluate the final star images, the root mean square error (RMSE) is calculated between the final star images and the original noisy star images. The RMSE for two vectors x_1 and x_2 of length n is

   RMSE = \sqrt{\frac{\sum_{i=1}^{n} (x_{1,i} - x_{2,i})^2}{n}}   (4.1)

Because the star pixels usually have larger intensities than the pixels that contain only noise, this evaluation can tell us how good the noise removal was. If the noise removal is perfect, then the RMSE will account only for the noisy pixels. Since the noise intensities are small, the RMSE value will be small. If we were not able to reconstruct

the star with its true PSF, then the error will be larger. For example, if the star was elliptical but a spherical star was created instead, then noise pixels will be tested against star pixels and vice versa, hence the RMSE will be larger. If the RMSE is 0, it means that no change to the noisy image was made, and therefore the technique failed. The local evaluation framework is the following.

Local Evaluation Framework
1. Use a dimensionality reduction technique to project the training data to lower dimensions (different parameters can be used to optimize the technique)
2. Reconstruct the stars from their lower dimensions
3. Use LoG edge detection to remove any noise outside the stars' boundaries
4. Calculate the error between the noisy and final star images using (4.1)

This framework will be used for testing different numbers of components with the different dimensionality reduction techniques, using different contrast functions or kernels. Because this requires a decent amount of time, the evaluation is done on a subset of the data (approximately 10%). Specifically, it uses 100 randomly chosen stars from each image of each set, which were chosen once and used for all of the optimizations of the dimensionality reduction techniques. When the optimizations are done for a technique, the local evaluation is run on all of the data so that a comparison between different techniques can be made.

4.3 Baseline Approach

4.3.1 Initial baseline approach

As a baseline, the ellipticity of the stars is calculated using the quadrupole moments (2.29)-(2.31). The quadrupole moments are calculated using the noisy star images. Then each star is recreated using a Gaussian with a covariance matrix as in (2.35), centred at the star's location. The algorithm of the baseline approach is the following:

Algorithm 4.1 Baseline approach
1. Calculate the centre of the image brightness using (2.27) and (2.28)
2. Calculate the quadrupole moments of the star using (2.29)-(2.31)
3. Recreate the star using a Gaussian with a covariance matrix as in (2.35), centred at the star location
4. Calculate the RMSE between the noisy star and the recreated star using (4.1)

4.3.2 Improved baseline approach

The initial baseline approach can be further improved by trying to remove the noise with dimensionality reduction techniques. Specifically, a preprocessing of the data using PCA is done to see how much the baseline approach improves when noise is removed. These results are then compared with the initial baseline approach and with the local evaluation framework that uses edge detection instead of quadrupole moments. The problem encountered with the quadrupole moments approach is that there are cases where their values cannot be used as the covariance of a Gaussian. This means that there are cases where stars cannot be reconstructed. In these cases, in the improved baseline approach the star preprocessed with PCA was used as the reconstructed star instead. In the initial baseline approach those stars had to be ignored.

4.4 LoG edge detection

The LoG edge detection is provided as a built-in function in MATLAB and is used for the purposes of this dissertation. The function is used with a threshold equal to zero, so that all zero crossings are marked as edges, which results in returning edges that form closed contours. What is actually returned by the function is a matrix of the same size as the input matrix, which in this case is a 30x30 matrix. The returned matrix has all its values set to 0 except the pixels that denote the edges, which are set to 1. There are cases where some of the noise outside the boundary of the star is marked as an extra edge. Because these pixels are outside the boundary of the star, they can be ignored by keeping only the first contour encountered from the central pixel of the image. To remove any noise outside the boundary of the star, the pixels inside the boundary are set to 1, whereas the rest are set to 0. This matrix can be used as a mask for removing the remaining noise. If the star image is X and the mask is Y, then the cleaned star

image C is obtained using

   C_{i,j} = X_{i,j} Y_{i,j}   (4.2)

The algorithm for removing the noise outside the boundary of the star is the following:

Algorithm 4.2 Remove the remaining noise of the reconstructed star
1. Y = edge(X, 'log', 0)
2. From the centre pixel of Y move upwards until a pixel with the value 1 is found
3. Mark that pixel as visited
4. Check clockwise from that pixel for a neighbouring pixel with the value 1 that is not visited
5. Repeat from step 3 until all neighbouring pixels with value 1 are visited
6. Reset all the pixels of Y to zero except the ones that are marked as visited
7. Set all pixels inside this new boundary in Y to 1
8. Clear the remaining noise using (4.2)

4.5 PCA

This is the first method to be used and will provide data for the other students who are working on the interpolation task of this project. Because different approaches will be used for the interpolation, some students might use stars from the images of all the sets as their training sets, whereas others will use star images from one set at a time, or even stars from a single image. This means that each approach will need its training data to be in the same dimensionality. For example, if only stars from a single image are used, then each image might have a different number of principal components, but the stars in an image must all be reduced to the same dimensionality. On the other hand, if all of the images from all the sets are used as training data, then all of the stars must be reduced to the same dimensionality. A single representation could be obtained by using all of the data as the training set, so that all of the stars are reduced to the same dimensionality, but as will be seen in the results chapter, the number of principal components needed is then much greater. This would make the interpolation slower for approaches where all of this information is not needed. Taking the above into consideration, it was decided to create a representation on each set separately, a representation on all of the sets, and, if any patterns are noticed in the principal components of each set, a representation on sets with the same patterns.

Component Selection

The number of components used affects the quality of the recreated star images, and hence the quality of PCA. To get the best possible number of components, a range of different numbers of components is tested, because there is no way of selecting the number of components without any uncertainty. The lower bound of the range is the number of components that we get using the Scree test, and the upper bound is the number of components that visually have an apparent structure and are not overfitting the data. To check whether the components being tested are overfitting the data, the structural information that is known for each star patch is used: the corners of each star patch are just noise. All the noise terms are combined together for all the stars in the image and their variance is estimated using

   Var(X) = E[(X - \mu)^2]   (4.3)

Then the assumption is made that the noise variance is stationary across the image. Once the data are reconstructed from their lower dimensions, the reconstructed image is subtracted from the original. If the components are not overfitting, then the residual will contain at least the noise and maybe some star structure as well. In this case, the variance of the residual will be larger than or equal to the variance of the noise. If this is not the case, then the components contain noise and are overfitting. This is checked for a range of numbers of components, which has as a lower bound the number of components obtained from the Scree test and as an upper bound the number of components that visually have apparent structure.

In figure 4.1 an example of a Scree plot and the eigenvectors obtained with PCA on set 1 of the data is shown. From the Scree plot it is obvious that there is an elbow at component 6, since after that point the line levels off. From the eigenvector images, it can be seen that eigenvectors 1 to 14 have some structure, whereas after eigenvector 14 the structure is lost. We assume that in this case any eigenvector after the 14th captures the noise and not the stars. Hence, the components to be checked for overfitting are components 6 to 14, using the variance of the noise. Once the highest number of non-overfitting components is found, it is used as the upper bound of the number of components with which PCA will be tested using the local evaluation framework. The number of components with the lowest RMSE is selected as the number of components to be used for projecting that set to lower dimensions.

Figure 4.1: In a) the Scree plot has an elbow at 6 components and b) shows that after eigenvector 14 there is no apparent structure.

This provides us with the following algorithm for obtaining the range of components that will be tested with PCA (a small sketch of the overfitting check follows the algorithm):

Algorithm 4.3 Selecting the range of components to be tested
1. Run PCA to obtain the eigenvectors and eigenvalues
2. Plot the eigenvalues of each component and select as the lower bound the number of components at which an elbow is created
3. Visualise the eigenvectors and select as the initial upper bound the number of components that explain the structure of the stars
4. Compute the noise variance of the star images in a set using (4.3) on the four corners of each star patch
5. Use algorithm 4.4 for the range of components defined in steps 2 and 3 to project the data to lower dimensions
6. Reconstruct the stars using (2.4)
7. Calculate the residual between the noisy star images and the recreated stars
8. Calculate the variance of the residual using (4.3)
9. Select as the upper bound the highest number of components whose residual variance is larger than the noise variance
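The overfitting check of steps 4-9 can be sketched as follows. This is a minimal NumPy illustration, not the MATLAB code used for the thesis; the 4-pixel corner size, the patch ordering and the variable names are assumptions.

```python
import numpy as np

def pca_fit(X):
    """X is 900 x K (one column per vectorised 30x30 star); returns mean and sorted eigenvectors."""
    m = X.mean(axis=1, keepdims=True)
    S = np.cov(X)                                   # 900 x 900 covariance matrix, eq. (2.2)
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]                  # sort eigenvectors by descending eigenvalue
    return m, vecs[:, order]

def noise_variance(X, patch=30, corner=4):
    """Estimate the noise variance from the four corners of each star patch (eq. 4.3)."""
    P = X.T.reshape(-1, patch, patch)
    corners = np.concatenate([P[:, :corner, :corner].ravel(),
                              P[:, :corner, -corner:].ravel(),
                              P[:, -corner:, :corner].ravel(),
                              P[:, -corner:, -corner:].ravel()])
    return corners.var()

def max_non_overfitting(X, m, E, lower, upper):
    """Steps 5-9 of Algorithm 4.3: largest M whose residual variance still exceeds the noise variance."""
    sigma2 = noise_variance(X)
    best = lower
    for M in range(lower, upper + 1):
        Y = E[:, :M].T @ (X - m)                    # projection, eq. (2.3)
        Xrec = m + E[:, :M] @ Y                     # reconstruction, eq. (2.4)
        if (X - Xrec).var() >= sigma2:
            best = M
    return best
```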

PCA on each set

Since only a single set will be loaded into memory at a time for this task, no further changes need to be made to the standard PCA. As the stars are in 30x30 pixel patches, they have to be converted to vectors, so that the training data form a 900xK matrix, where K is the number of stars in the set. This will be referred to as the vectorisation of the data. To project the data to lower dimensions, the following algorithm is run on each set:

Algorithm 4.4 PCA on each set
1. Vectorise the set
2. Get the mean m using (2.1)
3. Calculate the covariance matrix S using (2.2)
4. Perform the eigenvector-eigenvalue decomposition
5. Sort the eigenvectors in descending order based on their eigenvalues
6. Select the number of components to be used
7. Project the data to lower dimensions using (2.3)

This algorithm is run for the different numbers of components obtained using algorithm 4.3 for optimization. For each lower dimensional representation obtained, the local evaluation framework is used on a subset of stars of that set to get the RMSE using that number of components. The reconstruction of the data from their lower dimensions is done using (2.4).

PCA on each model

To do PCA on sets with similar patterns, the eigenvectors obtained from doing PCA on each set first need to be examined by hand. Sets with similar patterns are combined together to be used as a single training set for PCA. This is because these sets are considered to have been created using a similar model, which might represent a certain type of telescopic or atmospheric effect. Because of this, the lower dimensional data obtained can be used by the students who take these effects into consideration when doing the interpolation. Only one set can be loaded into memory at a time, due to memory limitations of the computers available, hence a different approach needs to be used. Specifically, instead of using all of the sets of a model for calculating the covariance matrix, the average covariance matrix can be used. What is needed is to calculate

the covariance matrix and mean of each of these sets and then use their averages for the eigenvector-eigenvalue decomposition and for the reconstruction from the lower dimensions. This leads to the following algorithm:

Algorithm 4.5 PCA on each model
1.1 for each set i that belongs to a certain model
1.2   Vectorise the set
1.3   Get the mean m_i using (2.1)
1.4   Calculate the covariance matrix S_i using (2.2)
1.5 end
2. Get the average covariance matrix S and the average mean m from all S_i and m_i respectively
3. Perform the eigenvector-eigenvalue decomposition on S
4. Sort the eigenvectors in descending order based on their eigenvalues
5. Select the number of components to be used
6. Project the data to lower dimensions using (2.3) with m

For selecting the optimal number of components in this case, the steps for evaluating the different numbers of components are the same as in PCA on each set. The difference is that now the number of components selected must have the lowest mean RMSE over all of the sets of the model being tested. Consider the case where a set needs at least 20 components to recreate its stars without losing their original ellipticity. If another set that belongs to the same model has 15 components as the upper bound of its range, then 20 components cannot be selected for all of the sets of that model. This problem is solved by using the highest upper bound among all the sets as the upper bound for all the sets of that model. The disadvantage is that any set that had fewer components as its upper bound will be slightly overfitted, but this is preferable to having sets whose recreated stars are wrong.

Because pixel intensities vary from set to set, the error varies as well, so using the mean RMSE directly is not the best choice. What is done instead is to divide the RMSEs of each set by the maximum RMSE of that set. For example, if in set 1 the maximum RMSE was obtained with 10 components, then the RMSEs obtained using different numbers of components on set 1 are divided by the RMSE of the 10 components. This way, the number of components that gave the maximum RMSE will be equal to 1 and the rest will be less than 1, according to how much smaller their RMSE was.

Now the mean of these values can be used, and the number of components that has the smallest mean RMSE is selected as the optimal number of components for that model. For the reconstruction of the data from their lower dimensions (2.4) is used, where m is the average mean of the model obtained from algorithm 4.5.

PCA on all of the data

This task is the same as doing PCA on each model, but in this case all of the sets are treated as belonging to the same model. So the algorithm is:

Algorithm 4.6 PCA on all of the data
1.1 for each set i
1.2   Vectorise the set
1.3   Get the mean m_i using (2.1)
1.4   Calculate the covariance matrix S_i using (2.2)
1.5 end
2. Get the average covariance matrix S and the average mean m from all S_i and m_i respectively
3. Perform the eigenvector-eigenvalue decomposition on S
4. Sort the eigenvectors in descending order based on their eigenvalues
5. Select the number of components to be used
6. Project the data to lower dimensions using (2.3) with m

To select the optimal number of components the same procedure as for PCA on each model is used, with the difference that now the reconstruction uses the average mean of all the sets.

4.6 ICA

The denoising of the stars is done using the Denoising Source Separation (DSS) toolbox for MATLAB proposed in [28], which provides a framework for applying different denoising functions based on blind source separation. Specifically, fast-ICA with different contrast functions is used for this dissertation.

Component Selection

The implementation used for ICA sorts the returned components by their negentropy, giving higher significance to the first components, which makes the selection of the components to be used easier. Even though the components are sorted, there are cases where a component with less structure and more noise has a better ranking than a component with less noise. An example is shown in figure 4.2, where component 8 is ranked better than component 9 while being noisier.

Figure 4.2: The first 10 independent components of set 1 sorted by negentropy. Component 8 has a better ranking than component 9 while it is clear that this should not be the case.

Even though negentropy gives a better ranking to components with some structure compared to unstructured components, the ranking between structured components is not optimal. This means that the approach of selecting the number of components by testing a range of different numbers of components, as in PCA, is not appropriate in this case. A better approach would probably be the use of a genetic algorithm for selecting the best structured components, but due to time limitations this was not used. Instead, all of the components that represent some structure of the stars were selected by hand.

ICA on each set

ICA is tested only using each set as a training set, because the results will be available much sooner than running it on each model and on all of the sets. The main idea is that these results will be compared with the PCA results, and if the RMSE is better then ICA can be run on each model and on all of the sets at a future time. Also, to justify doing that, the

interpolation obtained by the methods using the data from all the sets or from each model needs to be better than the interpolation used on each set separately.

Because ICA can get stuck in local minima, it needs to be run more than once on each set. In the experiments performed, ICA was run 10 times on each set. For each run the RMSE is calculated using the local evaluation framework, and the independent components from the run with the lowest RMSE are used as the optimized independent components. The algorithm for performing ICA on a set is the following (an illustrative sketch of a single run is given at the end of this section):

Algorithm 4.7 ICA on each set
1. Vectorise the set
2.1 for i = 1 to N
2.2   Perform ICA using the DSS toolbox with the selected contrast function
2.3   Select the independent components to be used
2.4   Store the mixing matrix A_i and the unmixing matrix W_i of those components
2.5 end

Once the mixing and unmixing matrices of each run are computed, they can be used in the local evaluation framework to get the RMSE of each run. For projecting the data to lower dimensions (2.6) is used, and for reconstructing them from the lower dimensions (2.5). The selected mixing and unmixing matrices are the ones that produced the lowest RMSE.

Selecting the contrast function

As mentioned in [13], ICA can be optimized using different contrast functions. The contrast functions used for optimizing ICA, as proposed in [13], are the tanh (2.11), gauss (2.12) and kurtosis (2.13) functions shown in Chapter 2. To compare the contrast functions, algorithm 4.7 is used on different sets with each contrast function, to obtain the best mixing and unmixing matrices using the local evaluation framework. This was done on 7 of the 26 sets, which were chosen so that at least 2 sets from each model found with PCA are in the training sets. Moreover, they were selected so that there is at least one set with smaller stars and one set with larger stars. Once the RMSEs using each contrast function are obtained, their mean RMSEs on each set and the variation of the RMSEs are compared to decide which contrast function gives the best results.
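As an illustration only, a single denoising run in the spirit of Algorithm 4.7 can be sketched with scikit-learn's FastICA. This is a stand-in for the MATLAB DSS toolbox actually used in the thesis; the number of components, the kept component indices and the data orientation are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_denoise(X, n_components=20, contrast="logcosh", keep=None, seed=0):
    """One ICA run on a vectorised set X (stars as rows here, K x 900).

    `contrast` maps to the functions of Chapter 2: "logcosh" ~ tanh (2.11),
    "exp" ~ gauss (2.12), "cube" ~ kurtosis (2.13). `keep` lists the indices of
    the hand-selected structured components; by default all are kept.
    """
    ica = FastICA(n_components=n_components, fun=contrast, random_state=seed)
    S = ica.fit_transform(X)                 # estimated sources, cf. eq. (2.6)
    if keep is not None:
        mask = np.zeros(n_components, dtype=bool)
        mask[keep] = True
        S = S * mask                         # zero out the components treated as noise
    # Reconstruct the stars from the retained components, cf. eq. (2.5)
    return S @ ica.mixing_.T + ica.mean_

def rmse(a, b):
    """Local evaluation error, eq. (4.1)."""
    return np.sqrt(np.mean((a - b) ** 2))
```

Repeating the call with different random seeds and keeping the run with the lowest RMSE mirrors the 10-run procedure described above.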

4.7 Kernel PCA

The function kernelpca of the DR Toolbox [33] in MATLAB is used for the denoising of the stars. This implementation allows the use of a polynomial kernel (2.21) and a radial basis function (RBF) kernel (2.22). Both of these kernels are tested so that the optimal one can be found.

Component Selection

Kernel PCA first maps the data into a feature space F using a non-linear function \Phi and then PCA is performed on the mapped data. This means that the final results will be the eigenvectors and eigenvalues from PCA on the mapped data. Hence the eigenvectors will be sorted in descending order based on their eigenvalues. For these reasons, the process of selecting the components is the same as for PCA, with the exception that in steps 1 and 5 of algorithm 4.3 kernel PCA is used instead of PCA. Also, the reconstruction at step 6 is done using (4.4).

Kernel PCA on each set

As with ICA, kernel PCA is tested only using each set as a training set. In contrast to ICA, kernel PCA on a set has a unique solution [32], which means that it only needs to be run once on each set with a given kernel function. The algorithm for performing kernel PCA on a set is the following (an illustrative sketch is given at the end of this subsection):

Algorithm 4.8 Kernel PCA on each set
1. Vectorise the set
2. Perform kernel PCA using the DR toolbox with the selected kernel function
3. Select the principal components to be used
4. Project the data to lower dimensions using the eigenvectors E with the equation y = E^T x

This can be used as step 1 of the local evaluation framework so that the RMSE can be calculated. For the reconstruction of the data from their lower dimensions, there was no time to implement one of the algorithms proposed in [25] and [21], so a simple and naive solution is used. The inverse of the eigenvectors returned by kernel PCA is used, so the reconstruction of the data is done using

   x = (E^T)^{-1} y   (4.4)
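As an illustration only, the projection and reconstruction of Algorithm 4.8 can be sketched with scikit-learn's KernelPCA, a stand-in for the DR toolbox function used in the thesis. Note that its pre-image based inverse_transform differs from the naive inverse of (4.4), and the number of components and the kernel parameters below are assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def kpca_denoise(X, n_components=15, kernel="rbf", gamma=None, degree=4):
    """Project the stars (rows of X, K x 900) onto the leading kernel principal
    components and map them back to pixel space.

    kernel="rbf" corresponds to eq. (2.22); kernel="poly" with degree=4 to the
    fourth-order polynomial kernel mentioned in the Kernel Selection section.
    """
    kpca = KernelPCA(n_components=n_components, kernel=kernel, gamma=gamma,
                     degree=degree, fit_inverse_transform=True)
    Y = kpca.fit_transform(X)            # lower dimensional representation
    return kpca.inverse_transform(Y)     # denoised stars, back in 900 pixels
```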

Kernel Selection

Kernel PCA is tested using the polynomial and radial basis function (RBF) kernels introduced in Chapter 2. The polynomial kernel uses a fourth-order polynomial, because this was found to remove most of the noise when the star images were visualized. To compare the effects of each kernel, algorithm 4.8 is used on each set with each kernel function, after the optimal number of components has been calculated using the component selection algorithm. Once the RMSEs using each kernel are obtained, their mean RMSEs on each set and the variation of the RMSEs are compared to decide whether one kernel can be selected as the optimal.
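To close the methodology, the masking step of Algorithm 4.2 and the error of (4.1) can be sketched together as follows. This is illustrative only: region filling and labelling are used instead of the clockwise contour walk of the thesis, so it approximates rather than reproduces the exact procedure, and the edge map is assumed to come from a LoG zero-crossing detector such as the sketch in Section 2.4.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, label

def centre_mask(edges):
    """Keep only the closed contour that contains the centre pixel (cf. Algorithm 4.2).

    `edges` is a boolean edge map of the 30x30 patch.
    """
    filled = binary_fill_holes(edges)
    regions, _ = label(filled)
    centre = regions[edges.shape[0] // 2, edges.shape[1] // 2]
    if centre == 0:                       # no contour covers the centre: keep nothing
        return np.zeros_like(edges, dtype=bool)
    return regions == centre

def local_rmse(noisy_star, denoised_star, edges):
    """Steps 3-4 of the Local Evaluation Framework: mask with eq. (4.2), score with eq. (4.1)."""
    cleaned = denoised_star * centre_mask(edges)              # eq. (4.2)
    return np.sqrt(np.mean((noisy_star - cleaned) ** 2))      # eq. (4.1)
```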

Chapter 5

Results

In this chapter, the results of the experiments are presented and discussed. For each approach used, a box-plot of the RMSEs on each set is presented and the mean RMSE of each set using that approach is provided. Also, a visual example of the noise removal from star patches is shown for comparing the reconstructed stars obtained by each technique. Initially the results of the baseline approaches are shown, and then the results of the dimensionality reduction techniques. Finally, the techniques are compared to each other.

5.1 RMSE of the noise

In this section the mean RMSE of the noise is shown. The RMSE was calculated using the values of the corner pixels of each star patch against a matrix with zero values. Because the reconstructed stars should have zero values at the non-star pixels, the RMSEs obtained with this method are indicative of the RMSEs to be expected if the stars are properly denoised. Moreover, since the noise is similar in the images of each set, the RMSE scores should have a small variance.
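A minimal sketch of this computation is given below. It assumes the stars are stored as 30x30 patches; the 5x5 corner size is an assumption, since the exact number of corner pixels used is not restated here.

    import numpy as np

    def noise_rmse(patches, corner=5):
        # patches: (n_stars, 30, 30) array holding the star patches of one set.
        c = np.concatenate([patches[:, :corner, :corner],
                            patches[:, :corner, -corner:],
                            patches[:, -corner:, :corner],
                            patches[:, -corner:, -corner:]], axis=1)
        # RMSE of the corner pixels against an all-zero target, per star,
        # then averaged over the set to give a mean RMSE as in Table 5.1.
        return np.sqrt(np.mean(c ** 2, axis=(1, 2))).mean()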

Table 5.1: The mean RMSE of the noise in each set. Any method with good results should have values close to these in each set.

5.2 Initial baseline approach

This section presents the results obtained using the baseline approach proposed in Chapter 4. A box-plot of the RMSE on each set is illustrated in figure 5.1 and the mean RMSE of the baseline approach on each set is shown in table 5.2. In a box-plot, the upper edge of the box indicates the upper quartile of the RMSEs and the lower edge indicates the lower quartile. The line inside the box is the median RMSE score. The whiskers extend to 1.5 times the inter-quartile range, and any points beyond them are considered outliers. In this case the outliers indicate RMSEs that deviate from the main RMSE distribution.
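These box-plot conventions match matplotlib's defaults, so a figure in the style of figure 5.1 could be produced with a sketch like the following; the RMSE values here are random placeholders, not the thesis's results.

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder data: one array of per-star RMSEs for each of the 26 sets.
    rmse_per_set = [np.random.rand(100) * 0.01 for _ in range(26)]

    plt.boxplot(rmse_per_set)   # boxes at the quartiles, median line, whiskers at 1.5 IQR
    plt.xticks(range(1, 27), [f"Set {i}" for i in range(1, 27)], rotation=90)
    plt.ylabel("RMSE")
    plt.tight_layout()
    plt.show()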

Figure 5.1: Box-plot of the RMSEs on each set using the baseline approach. There are a lot of outliers in each set, with a higher variance than expected. The RMSEs are also higher than expected.

Table 5.2: The mean RMSE of the reconstructed stars using the baseline approach on each set. The mean RMSEs show that the reconstructed stars are not correct, since they are not close to the expected mean RMSEs.

From the results it is clear that this approach is not appropriate for this task. Taking into consideration the values in table 5.1 and the mean RMSEs obtained with this method, it is clear that the reconstructed stars are far from the true PSF representation. The variance of the RMSE shown in figure 5.1 also shows that this approach is not stable. An example of a reconstructed star using the baseline approach is illustrated in figure 5.2.

Figure 5.2: An example of a reconstructed star using the baseline approach. It is clear that the reconstruction is affected by the noise, resulting in a much larger PSF representation than the original.

The reconstructed star is much larger than the initial one, and this is caused by the noise in the initial star image. As noted in Chapter 2, quadrupole moments do not take the noise into account, but instead consider it as part of the initial star. Because of this, the reconstructed stars are much larger and have higher pixel values, resulting in bad star reconstructions. This also explains the mean RMSE values in table 5.2. Since the reconstructed stars are capturing almost all of the 30x30 star patches, the sets with smaller stars (sets 6, 14 and 26) have a higher mean RMSE, since most of the reconstructed star pixels are compared with pixels that only contain noise, and are thus more dissimilar. On the other hand, sets 7 and 15, which have bigger stars, have a lower mean RMSE, since more pixels of the reconstructed stars are compared to pixels with higher intensities.

5.3 Improved Baseline Approach

In this section the variation of the local evaluation framework is used to obtain the RMSE of the improved baseline approach. The difference is that the data are first preprocessed using PCA to remove as much noise as possible from the initial stars while retaining their shape, which is caused by the PSF. The RMSE on each set using the improved baseline approach is illustrated with a box-plot in figure 5.3 and the mean RMSE on each set is shown in table 5.3.

Figure 5.3: Box-plot of the RMSE on each set using the improved baseline approach. The RMSEs exceed the expected values, and the variances are higher than expected.

Table 5.3: The mean RMSE of the reconstructed stars using the improved baseline approach on each set. The mean RMSEs are higher than expected, hence the method does not capture the true PSF of the stars.

From these results, it is obvious that the improved baseline approach outperforms the initial baseline. The mean RMSE is much lower, and there are cases, like set 1, where the mean RMSE is approximately 270 times smaller. Moreover, as seen in figure 5.3, the variance of the RMSE is smaller in each set. Still, the reconstructed stars do not accurately capture the true stars' PSF, since the RMSE scores are higher than expected. In figure 5.4 there is an example of the preprocessing with PCA and the final reconstructed star with the quadrupole moments.
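As background for the discussion that follows, the quantity the baseline relies on is the quadrupole moment matrix of the star image, introduced in Chapter 2. The sketch below uses the standard unweighted definition and may differ in detail from the thesis's implementation; it makes concrete why noise and halo pixels far from the centroid inflate the recovered size.

    import numpy as np

    def quadrupole_moments(img):
        # img: (H, W) star patch. Returns the intensity-weighted centroid and the
        # 2x2 quadrupole matrix Q (standard unweighted moments, an assumption here).
        y, x = np.indices(img.shape)
        flux = img.sum()
        xc, yc = (img * x).sum() / flux, (img * y).sum() / flux
        dx, dy = x - xc, y - yc
        Q = np.array([[(img * dx * dx).sum(), (img * dx * dy).sum()],
                      [(img * dy * dx).sum(), (img * dy * dy).sum()]]) / flux
        return (xc, yc), Q

Pixels far from the centroid enter Q with large dx and dy factors, so even low-level noise or a residual halo increases the estimated size, which is the behaviour seen in figures 5.2 and 5.4.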

Figure 5.4: An example of a reconstructed star using the improved baseline approach. In the middle image, with the PCA-preprocessed star, most of the noise is removed, but there is a halo around the star. In the reconstructed image, it can be seen that the quadrupole moments counted the halo as part of the star, hence the reconstructed star is bigger than expected.

As can be seen in figure 5.4, the quadrupole moments are sensitive to noise. Even though the largest part of the star image is noise free, the quadrupole moments count the halo created around the preprocessed star as part of the original star. This causes the reconstructed star to be greater in size than the original one, so the results are not good enough for this task. This is the reason why LoG edge detection is used instead, so that any halo effects or noise outside the star's boundary can be removed.

5.4 PCA

In this section the results concerning PCA are presented. First the number of components used in each PCA approach is provided, then the results of each PCA approach using the local evaluation framework are illustrated and compared. Initially the results of PCA on each set are shown, then the results of PCA on each model, and finally the results of PCA on all of the sets.

Component Selection

The range of components tested and the final number of components selected for each set for each PCA approach are presented in table 5.4.

Table 5.4: The range of components tested on each PCA method and the selected number of components. LB is the lower boundary of the range of number of components tested. EUB is the upper boundary given by the visualisation of the eigenvectors. FUB is the final upper boundary given by the noise variance technique. S is the selected number of components. For PCA on all sets, EUB is the same as FUB.

The lower boundary provided by the Scree test is always the same in the PCA on each set approach, and it seems to be affected by the number of training sets. For example, in PCA on each set, where only one set is used each time, and in PCA on model 2, where six sets are used, the lower boundary is six. On the other hand, in the rest of the models, where 10 sets are used, and in PCA using all of the sets, the lower boundary increases to eight. A possible explanation is that with more training data the analysis is able to identify a larger number of important components, because more instances of a certain effect are visible.

In PCA on each set, the final upper boundary selected using the noise variance and the residual variance is always smaller than the upper boundary proposed by the visualisation of the eigenvectors. This means that those extra components are not noise free, and if they were used the reconstructed stars would be overfitting. On the other hand, PCA on each model and PCA on all sets will have sets in which the reconstructed stars overfit, because of the way the final upper boundary is selected in those cases, as explained in Chapter 4. As expected, the final number of components selected for each set is the upper boundary of the range of numbers of components tested. This is normal, since when more components are used, more information is available for the reconstruction of the stars, hence better results are obtained.

PCA on each set

In this section the results of PCA using each set as a different training set are presented. The RMSE on each set is calculated using the local evaluation framework using all of the stars. The RMSE on each set and its variation is shown with a box-plot in figure 5.5 and the mean RMSE on each set is provided in table 5.5.
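A minimal sketch of this per-set PCA denoising step is given below; scikit-learn's PCA stands in for the thesis's implementation, the number of components is only an example, and the random array is a placeholder for a vectorised set of 30x30 star patches.

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_denoise(X, n_components):
        # X: (n_stars, n_pixels) matrix of vectorised star patches of one set.
        pca = PCA(n_components=n_components)
        Y = pca.fit_transform(X)           # project the stars to lower dimensions
        return pca.inverse_transform(Y)    # reconstruct the (denoised) stars

    X = np.random.rand(500, 900)                  # placeholder for 500 vectorised 30x30 patches
    X_denoised = pca_denoise(X, n_components=8)   # example value, not the selected one

The reconstructed patches are then cleaned with the LoG step illustrated in figure 5.6 before the RMSE of the local evaluation framework is computed.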

Figure 5.5: Box-plot of the RMSE on each set using PCA on each set. The RMSEs are close to the expected values and the variances are small. The RMSEs have higher variances in sets with smaller stars, where there is more noise, and lower variances in sets with larger stars, where there is less noise. In set 15 there are a lot of outliers, which means that the method cannot capture the structure of all the stars in that set.

Table 5.5: The mean RMSE of the reconstructed stars using PCA on each set. The mean RMSEs are close to the noise mean RMSEs, which indicates that the method has good results.

From figure 5.5 it can be seen that the RMSEs obtained for the stars within a set are close to each other. The RMSE scores in the figure are of the order of 10^-3, which means that the variance between the RMSE scores is very small. Sets 6, 14 and 26, which have the smallest stars, have the largest RMSE variation.

This means that the approach is less stable in those cases, but since the variation is still small, the results are promising. Unlike the other sets, in set 15 there are more outliers than expected, with a high deviation from the main RMSE distribution. This means that for most of the stars PCA captures the PSF, but for some stars it is less accurate. A possible explanation is that this set contains some stars whose PSFs are less common than the others, and PCA was not able to capture them. This is a known limitation of PCA mentioned in [20]. From table 5.1 and table 5.5 it can be seen that the mean RMSEs are close to the expected values, which indicates that the method manages to capture the PSF of most of the stars. Overall, this approach outperforms the improved baseline approach and has promising results. An example of a reconstructed star is shown in figure 5.6.

Figure 5.6: An example of a reconstructed star using PCA on each set and the LoG edge detection. In the middle image, with the PCA-reconstructed star, most of the noise is removed, but there is a halo around the star. In the final image, it can be seen that the halo is removed.

As shown in figure 5.6, the noise removal with LoG edge detection is successful. The halo around the star is removed while the star's shape is kept intact. This is obtained by resetting to zero any pixels outside the star's boundary detected with this method (a short sketch of this cleanup step is given at the end of the next subsection). The disadvantage is that any noise that is inside the star's boundary and was not removed by PCA remains in the final image.

PCA on each model

In this section the results of PCA on each model are shown. This approach uses all of the sets that belong to a certain model as a single training set. First the models found are presented, and then the RMSE on each set is calculated using the local evaluation framework.

After comparing the eigenvectors obtained from each set using PCA on each set, 3 distinct patterns were found. Each set was then combined with the rest of the sets that had the same eigenvector patterns. These new combined sets represent the data for each of the three models. Table 5.4 indicates the model to which each set belongs. In figure 5.7 the three patterns used for categorising the sets into models are illustrated.

Figure 5.7: The patterns of each model. The eigenvectors of the sets that belong to model 1 have a ripple effect. In model 2 there are linear ripples, whereas in model 3 the eigenvectors have a starburst effect.

The eigenvectors of the sets that belong to model 1 have a ripple effect around their main structure, whereas in model 2 linear ripples are also introduced. In model 3 the main characteristic of the eigenvectors is the starburst effect and the absence of ripples. Each of these models might represent different atmospheric or telescope effects, so this method can also be used for identifying different effects on star images. The RMSE scores on each set using PCA on each model are provided in figure 5.8 and table 5.6.

Figure 5.8: Box-plot of the RMSE on each set using PCA on each model. The RMSEs are close to the expected values, but there are more outliers, which indicates that some PSFs are not accurately captured.

Table 5.6: The mean RMSE of the reconstructed stars using PCA on each model. The mean RMSEs are close to the expected values, which indicates that the method has good results. The scores are slightly better than the ones obtained with PCA on each set.

From table 5.5 and table 5.6 it can be seen that PCA on each model has slightly better results in most of the sets. The difference is approximately 1%, which is not substantial. Only set 15 gets a 5% improvement over the PCA on each set results. This difference is not big enough to suggest that this method should be preferred. Moreover, as mentioned in Chapter 4, this method will probably have sets that are overfitting due to the way the components are selected.

From figure 5.8 it is obvious that the RMSEs are not as stable as in PCA on each set. There are more sets where outliers appear, with a higher deviation from the main RMSE distribution than expected. The stars for which this happens are not many, but it mainly affects the sets of model 2 and model 3. Since more data are used as a training set, it is possible that some stars with less common PSFs were not taken into consideration by PCA, and so were not reconstructed with high accuracy. The reason this did not happen in PCA on each set is that the training data were smaller, so if a few of these uncommon PSFs existed, they were considered important because of the smaller amount of data and were taken into consideration. A reconstruction of a star is illustrated in figure 5.9, showing that the approach manages to remove the noise from the initial star.

Figure 5.9: An example of a reconstructed star using PCA on each model and the LoG edge detection. The PCA-reconstructed star has some noise and a halo around its boundary. In the final image, it can be seen that the halo and the noise are removed.

Compared to the reconstructed star using PCA on each set, the reconstructed star using this method looks like it has more noise around the star's boundary. This means that there is probably more noise inside the star's boundary as well. If this is the case, then the results using this method have lower RMSEs because the method is overfitting. Even though it is not clear whether this method is better or not, it will be tested using the global evaluation framework to obtain its true quality. It clearly has better results than the improved baseline approach and its RMSEs are close to the expected values, which means that it is worth using.
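The LoG cleanup applied in figures 5.6 and 5.9 can be sketched as follows. This is only a minimal approximation of the step described earlier: the sigma and the way the boundary is extracted are assumptions, the star is assumed to lie at the centre of the 30x30 patch, and the exact settings used in the thesis are not reproduced here.

    import numpy as np
    from scipy import ndimage

    def log_cleanup(img, sigma=2.0):
        # img: (H, W) reconstructed star patch; sigma is an assumed value.
        log = ndimage.gaussian_laplace(img, sigma=sigma)
        inside = ndimage.binary_fill_holes(log < 0)    # LoG is negative inside a bright blob
        labels, _ = ndimage.label(inside)
        centre_label = labels[img.shape[0] // 2, img.shape[1] // 2]
        mask = (labels == centre_label) if centre_label != 0 else inside
        return np.where(mask, img, 0.0)                # reset pixels outside the boundary to zero

The region where the LoG response is negative approximates the area enclosed by the zero-crossing boundary of the star, and only the connected region under the patch centre is kept, so the halo and any isolated noise blobs are set to zero.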

PCA on all of the sets

The final method using PCA is the one that uses all of the data as a single training set. The results of this method using the local evaluation framework are provided in figure 5.10 and in table 5.7.

Figure 5.10: Box-plot of the RMSE on each set using PCA on all of the sets. The RMSEs are close to the expected values, but there are more outliers, which indicates that some PSFs are not accurately captured.

Table 5.7: The mean RMSE of the reconstructed stars using PCA on all of the sets. The mean RMSEs are close to the expected values, which indicates that the method has good results. The scores are slightly better than the ones obtained with PCA on each set, except for set 7.

Comparing table 5.5 and table 5.7, the results of this method are slightly better than PCA on each set, except for set 7. The mean RMSE is approximately 1% better in all of the sets except set 7, where it is 1% worse, and set 15, where it is 9% better. As was the case with PCA on each model, the RMSE scores are less stable than the ones obtained with PCA on each set, for the same reasons. A reconstruction of a star is illustrated in figure 5.11.

Figure 5.11: An example of a reconstructed star using PCA on all of the sets and the LoG edge detection. The PCA-reconstructed star has some noise and a halo around its boundary. In the final image, it can be seen that the halo and the noise are removed.

The final reconstructed star in figure 5.11 has the same shape as with the other two PCA methods. If the reconstructed star in the middle image of the figure is compared with those of PCA on each model and PCA on each set, it is clear that there is more noise around the star. This means that there is probably more noise inside the star's boundary as well. As with the previous method, there is a chance that the RMSE scores are lower due to overfitting, hence it is not clear whether it is better than the other PCA approaches.

5.5 ICA

This section provides the results concerning ICA. First the number of components used on each set is provided. Then the results of the different contrast functions are presented. Finally, the results of ICA with the selected contrast function are illustrated.

Component Selection

In table 5.8 the components used on each set with each contrast function are shown. The components were selected by hand, by visualising them and selecting the ones that
