Improvement of Hubble Space Telescope subsystems through data mining

S. Skias, M. Darrah & M. Webb
Institute for Scientific Research Inc., USA

Abstract

The Hubble Space Telescope continues to increase mankind's knowledge and awareness of the universe. While in-orbit servicing has extended Hubble's lifetime, the expense of servicing missions is extremely high. If mission lifetime is to be extended beyond the servicing mission era, innovative mission extension concepts must be employed. One such concept is data mining of historical Hubble engineering data to increase the success rate of Hubble's science data collection. A critical component of successful science data collection is the Fixed Head Star Tracker (FHST) subsystem. This subsystem performs the first step, known as an update, in the telescope pointing process and has a failure rate of approximately 0.015. NASA's Hubble Project initiated a data mining effort on this subsystem, which was undertaken by the Institute for Scientific Research Inc. (ISR). Previous failure analyses indicated that many of the past failures could be attributed to the distribution of stars within the tracker's field of view. A dataset was constructed of FHST data from the following sources: Telemetry, Command, and Star data. This data was processed using a recursive-partitioning algorithm implemented in S-PLUS. The resulting decision tree indicates that pre-event FHST data supports failure prediction. Using this tree to select guide stars could reduce the tracker failure rate to 0.002. The decision tree has been validated using mission planning software. Data on the actual performance using this technique is being collected. This effort demonstrates how data mining can uncover operational deficiencies and provide cost-efficient solutions for improving subsystem performance.
850 Data Mining III

1 Introduction

The concept of a telescope orbiting the Earth was introduced in 1923, long before space flight was achieved. Almost fifty years later, the Hubble Space Telescope (HST) was built. The HST was designed in the 1970s and launched by NASA in 1990 in an effort to increase mankind's knowledge of the universe. Hubble orbits 600 kilometers (375 miles) above the Earth and uses pointing precision, powerful optics, and state-of-the-art cameras and spectrographs to provide stunning views of the universe. Hubble was the first scientific mission designed to be routinely serviced. Thanks to on-orbit calls by Space Shuttle astronauts, Hubble continues to produce data that helps to resolve fundamental questions in astronomy and cosmology. During its 11 years in orbit, Hubble has been serviced four times in order to maintain or improve its operation. With the high cost of service missions and with only one more servicing mission scheduled, the Hubble Project initiated an effort to explore alternative, lower-cost methods to maintain the telescope's optimum performance. Data mining of Hubble historical data is one of these methods.

ISR was tasked with exploring opportunities to improve Hubble performance through data mining of historical Hubble engineering data. The goal of the data mining initiative was to demonstrate that data mining can uncover hidden performance inadequacies and provide insight into resolving those issues. This paper describes how we approached this task, what data was selected and how it was analyzed, and what discoveries were made from the initial mining efforts of the Hubble Telescope data. Two factors that contributed to our successfully achieving the goal of this project were consulting from Patrick Aboyoun of Insightful Corporation and the capability of S-PLUS. The material is based upon work supported by NASA (Goddard Space Flight Center) under Cooperative Agreement Award No. NCC5-639.
2 Background

Hubble is expensive to operate and has a limited lifetime; consequently, the failure of a science data collection mission is extremely costly. A critical component of science data collection is accurately pointing the telescope. The precision required of Hubble's sophisticated pointing control system is similar to pointing a laser at a one-centimeter circle 300 kilometers away and holding it steady for hours or days. Pointing control is therefore a critical system, and it was the focus of our initial data mining efforts.

Pointing Hubble is a multi-stage process. During mission execution, the telescope is commanded to maneuver to point in the direction of the object to be studied. This is called a slew. Next, the accuracy of the orientation is verified through a procedure called an update. One or two updates may be planned for each slew. The update relies on the Fixed Head Star Tracker (FHST) subsystem, which consists of three star
trackers fixed to the body of the telescope. Either one or two trackers can be used for this initial pointing correction. During the planning for a slew, scientists select a specific guide star within the field of view of each tracker to be used for the update. During the update, each chosen tracker scans within a reduced field of view (RFOV), locks on, and reports the position of any star matching the guide star's magnitude. Hubble's main control program then computes the error between the star's reported and expected positions. If the error exceeds a given threshold, the control program commands the tracker to continue scanning for a new candidate star. If the assigned trackers lock on ('acquire') the guide star, then the FHST update is successful and the Fine Guidance System begins the final stage of guidance control, attaining the pointing accuracy necessary for science data collection. If any assigned tracker fails to acquire the guide star after three attempts, then the update fails, and, because the HST is not properly aligned, the planned science data collection may not be performed. Previous analysis of FHST failures indicated that factors such as the location of the guide star within the field of view and the brightness of nearby stars seemed to account for most failures, but no effort had been made to quantify or verify these factors.

3 Approach

Our research efforts to mine FHST data involved five steps: identifying relevant data sources, defining the analysis database, populating the data tables, reducing the data, and mining the data. The data mining of FHST updates used data from three different sources. These sources provide information about what was planned (Command data), what actually happened (Telemetry data), and what the FHST was looking at (Star data). Together, these three sources provided sufficient data to support the data mining effort.
Before mining could commence, the data from the three sources had to be formatted and merged into a structured database. The data was stored in an Oracle 8i database consisting of five tables: Command Slew Data, Command FHST Updates, Command FHST Data, FHST Telemetry Data, and Star Data for the FHST field of view. Hubble historical data is stored in two places, a data archive and a data warehouse. The majority of the required data resides in the data warehouse, which consists of a series of AMAS tapes. To reduce the time spent downloading and formatting data, only the most recent year, 2001, was selected for the data mining effort. Only telemetry and command data are available from the warehouse. Star data was obtained using a prototype FHST planning tool to extract data from the Hubble star catalog. The 7,000 updates and the 0.015 update failure rate for 2001 are consistent with the observed values during the past five years. Since failure prediction and classification were the primary goals of this effort, priority was placed on
obtaining data for the failed updates. Of the 104 failed updates in 2001, we were able to download data for 91. Thirteen of the failed updates were not usable due to insufficient data in the warehouse. Data identification and download is a time-consuming process. Within the allotted time, we were able to obtain the data for 33 randomly-selected successful updates. The total data available for analysis consisted of 75 slew rows, 122 update rows, 240 command update data rows, 209,111 telemetry rows, and 4,670 star rows. The data tables were imported to S-PLUS, a commercially-available statistical analysis tool [1].

Each update is assigned to either one or two trackers. An update fails if any assigned tracker fails to acquire its guide star. Acquisition failures by both trackers during an update are rare. In the data, there were 237 tracker acquisition attempts. Analysis of the tracker telemetry data yielded 77 acquisition failures, 136 successes, and 24 that could not be classified. The 77 acquisition failures correspond to 77 update failures. A byproduct of the tracker acquisition failure analysis was a reduction of the telemetry data to 36,615 rows. Telemetry data at this point was a good candidate for time-series analysis; however, previous research by the Hubble engineers indicated that telemetry alone would not be sufficient to predict acquisition failure. Rather than pursue time-series analysis, we reduced each numeric variable in the telemetry data to a set of statistics (min, max, median, mean, first quartile, and third quartile) for the 237 cases. This provided one row of telemetry for each acquisition and permitted the merging of telemetry, command, and slew data into one table.

In mining the data, we utilized the partitioning and tree-based modeling features of S-PLUS. S-PLUS provides a 10-fold cross-validation partitioning method called Recursive Partitioning (RPART). First, RPART was applied to the merged dataset.
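The telemetry reduction described above, collapsing each per-acquisition time series to six summary statistics, can be sketched as follows. The paper did this in S-PLUS; this is a minimal Python illustration with hypothetical variable names and made-up readings.

```python
import statistics

def summarize(series):
    """Reduce one telemetry time series to the six summary statistics
    used in the analysis: min, max, median, mean, and the first and
    third quartiles."""
    q1, _, q3 = statistics.quantiles(series, n=4)  # quartile cut points
    return {
        "min": min(series),
        "max": max(series),
        "median": statistics.median(series),
        "mean": statistics.mean(series),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical readings of one telemetry variable during one acquisition:
readings = [4.1, 3.9, 4.4, 4.0, 4.2, 5.0, 3.8]
row = summarize(readings)
```

Applying this to every numeric telemetry variable yields one flat row per acquisition, which is what allows the telemetry, command, and slew tables to be joined for mining.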
No conclusive results were produced, since telemetry data observes failures as they are occurring. While this did not support failure prediction, it did support failure resolution and provided the foundation for an automated approach to determining the cause of a failure in near real-time.

Next, star and command data were examined. This examination directly reflected our hypothesis that FHST update failures were caused by selection of guide stars that were likely to fail in the tracker acquisition process. The dataset used in this analysis consisted of the following:

- acquisition success or failure indicator
- guide star position within the FHST field of view
- reduced field of view positions
- a count of stars in the total field of view
- distance from the guide star to the nearest star
- the nearest star's magnitude
- the distance from the guide star to the galactic equator
- distances of the guide star to the reduced field of view edges

RPART was then applied to the star dataset and a predictive model was produced.
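Assembling the feature row for a candidate guide star can be sketched as below. The paper lists the features but not the schema, so all field names, the RFOV parameterization, and the sample values here are hypothetical; the galactic-equator distance is omitted since it depends on the catalog coordinates.

```python
import math

def star_features(guide, neighbors, rfov):
    """Build one feature row for a candidate guide star.
    Positions are in degrees within the tracker field of view;
    rfov is (x_center, y_center, half_width), a hypothetical
    square parameterization of the reduced field of view."""
    dists = [math.hypot(s["x"] - guide["x"], s["y"] - guide["y"])
             for s in neighbors]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    cx, cy, hw = rfov
    return {
        "guide_x": guide["x"],
        "guide_y": guide["y"],
        "n_stars_fov": 1 + len(neighbors),
        "nearest_dist": dists[nearest],
        "nearest_mag": neighbors[nearest]["mag"],
        # distances from the guide star to the four RFOV edges
        "edge_left":   guide["x"] - (cx - hw),
        "edge_right":  (cx + hw) - guide["x"],
        "edge_bottom": guide["y"] - (cy - hw),
        "edge_top":    (cy + hw) - guide["y"],
    }

# Made-up example: one guide star with two neighbors in the field of view.
guide = {"x": 2.0, "y": -1.0, "mag": 5.2}
neighbors = [{"x": 2.3, "y": -1.4, "mag": 6.0},
             {"x": 1.0, "y": 0.5, "mag": 4.8}]
row = star_features(guide, neighbors, rfov=(2.25, -1.25, 0.75))
```

One such row per acquisition attempt, labeled with the success/failure indicator, forms the star dataset fed to RPART.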
4 RPART results for star dataset

Figure 1 presents the classification tree produced by the RPART algorithm when applied to the star dataset. The tree classifies the data into ten separate cases identified by the leaves of the tree. In each case, the numbers in the box are presented in the following format: number of acquisition failures/number of acquisition successes.

[Figure 1: Star data recursive partitioning analysis. The tree's node labels (split conditions on the star-position and distance variables) are not legible in this copy.]

Using the RPART classification tree on the 213 guide stars in the data to predict success or failure for an acquisition yields the results shown in Table 1.

Table 1: Observed versus predicted results for the RPART classification tree.

                Predicted
  Observed   Failure   Success
  Failure       65        12
  Success       12       124
  Total         77       136
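The projected improvement can be read directly off Table 1: if planners reject every star the tree predicts will fail, the only failures that survive are the observed failures the tree mislabels as successes. A quick check of the implied rates (assuming, as the paper does, that rejected stars would be replaced during planning):

```python
# Confusion-matrix counts from Table 1 (observed vs. predicted)
obs_fail_pred_fail, obs_fail_pred_succ = 65, 12
obs_succ_pred_fail, obs_succ_pred_succ = 12, 124

observed_failures = obs_fail_pred_fail + obs_fail_pred_succ  # 77 in total

# Rejecting predicted-failure stars leaves only the mislabeled failures.
remaining_failures = obs_fail_pred_succ
reduction = 1 - remaining_failures / observed_failures  # fractional reduction

# Scale the observed update failure rate of 0.015 by the same factor.
new_rate = 0.015 * remaining_failures / observed_failures
```

This reproduces the roughly 84% reduction and the projected 0.0023 update failure rate quoted in the text.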
Applying the predictions of the RPART classification would have resulted in 12 failures, or an 84% reduction in failures. A comparable reduction in the observed update failure rate of 0.015 would yield an update failure rate of 0.0023. The next section describes the application of the classification tree to support guide star selection.

5 Application

As part of the data mining initiative, we sought applications that incorporate the data mining results into the current HST Ground Control System. Two applications that were identified as most useful are an anomaly predictor and an anomaly resolution tool. The anomaly predictor uses the star data analysis results to rate candidate guide stars within the RFOV. The rating is derived from the classification tree in Figure 1 as follows:

1. The star is classified using the tree.
2. The star is assigned a rating equal to the number of successes in its class divided by the total number in the class.
3. The rating is presented as the probability of successful star acquisition.

The star in the center of the small circle in the lower right of Figure 2 indicates a selected guide star for the current update. The RFOV is placed at an offset position of 2.25, -1.25 degrees from the center of the total field of view. When the predictor is applied to the star, a window is displayed with the probability of acquisition for the star. In this case, the selected guide star has a 7.69% chance of being acquired. Since a 7.69% probability is not acceptable, the RFOV is moved to a different position as shown in Figure 3. This is done because the results from our analysis indicate that failures are due to star positioning within the RFOV. Now the RFOV is at position 2.25, -0.75 degrees offset from the center. The predictor is applied to the star again and a different probability of acquisition is calculated.
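The rating procedure amounts to reading a success fraction off the leaf of the tree that the star falls into. A minimal sketch in Python: the leaf counts below are assumptions chosen to be consistent with the quoted percentages (7.69% corresponds to a leaf with 1 success out of 13 stars, 92.31% to 12 out of 13), not the actual leaves of Figure 1.

```python
def rate_guide_star(leaf_counts):
    """Steps 2-3 of the rating procedure: the probability of a
    successful acquisition is the number of successes in the star's
    leaf divided by the leaf's total membership."""
    failures, successes = leaf_counts
    return successes / (failures + successes)

# Hypothetical leaves, written (failures, successes) to match the
# "failures/successes" format of the Figure 1 boxes.
bad_leaf = (12, 1)    # leaf dominated by acquisition failures
good_leaf = (1, 12)   # leaf dominated by acquisition successes

p_bad = rate_guide_star(bad_leaf)    # the low-probability case
p_good = rate_guide_star(good_leaf)  # the high-probability case
```

Moving the RFOV changes the star's position-dependent features, so the star can land in a different leaf and receive a different rating, which is exactly the effect exploited in Figures 2 and 3.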
With a simple shift in RFOV position of half a degree, the probability of acquisition increased from 7.69% to 92.31%. This does not guarantee acquisition of the star, but it does increase the chances of acquiring it and decreases the potential for failure. An automated system is being designed that examines all stars within the total field of view and returns the stars and reduced field of view positions that will produce the highest probability of acquisition rating.

The second application resulted from the reduction of telemetry data. A prototype for an anomaly resolution tool was developed during analysis of the telemetry data. Scripts were generated in S-PLUS that developed graphical images of the analysis. A fully-developed anomaly resolution tool could operate in near real-time by monitoring telemetry. In the event that a failure occurs, the tool could then start a procedure of data collection directly from the telemetry stream and start evaluating the data using algorithms derived from the data
mining results. The algorithms could classify the type of failure and automatically log the update in the current reporting system.

Figure 2: Screen shot with original guide star rating.
Figure 3: Screen shot with new guide star rating.

6 Conclusions

This data mining effort demonstrated that the performance of the FHST subsystem can be improved without changing any physical parts of the telescope. Applying these same techniques to other subsystems for which data is available can help overcome system degradation and extend the operational life of the Hubble Space Telescope.

Reference

[1] Venables, W.N. & Ripley, B.D., Modern Applied Statistics with S-PLUS, Springer-Verlag: New York, Berlin, Heidelberg, 1999.