Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images


Yunxiang Mao and Zhaozheng Yin
Computer Science, Missouri University of Science and Technology, Rolla, USA

In: M. Descoteaux et al. (Eds.): MICCAI 2017, Part II, LNCS 10434. © Springer International Publishing AG 2017. Electronic supplementary material: the online version of this chapter contains supplementary material, which is available to authorized users.

Abstract. In this paper, we propose a Two-Stream Bidirectional Long Short-Term Memory (TS-BLSTM) for the task of mitosis event detection and stage localization in time-lapse phase-contrast microscopy image sequences. Our method consists of two steps. First, we extract candidate mitosis image sequences. Then, we solve the problems of mitosis event detection and stage localization jointly with the proposed TS-BLSTM, which utilizes both appearance and motion information from the candidate sequences. The proposed method outperforms state-of-the-art methods, achieving 98.4% precision and 97.0% recall for mitosis detection and an average error of 0.62 frames for mitosis stage localization on five challenging image sequences.

1 Introduction

Analyzing the proliferative behavior of stem cells in vitro without staining or altering them plays an important role in many biomedical applications, such as drug discovery, stem cell manufacturing, and tissue engineering. Phase-contrast microscopy, as a non-invasive imaging modality, allows cell behavior to be monitored continuously without altering the cells [1]. The key to monitoring the health and growth rate of cells is the accurate enumeration and localization of occurrences of mitosis, the process whereby the genetic material of a eukaryotic cell is equally divided, resulting in daughter cells.

The process of a mitosis event consists of four stages, as shown in Fig. 1: (1) interphase; (2) start of mitosis; (3) formation of daughter cells; and (4) separation of daughter cells. The stages are defined based on the visual transition of cell appearance. Over the four stages, a mitotic cell performs the following four sequential actions: (1) the cell appearance remains normal; (2) the cell shrinks, rounds up and increases in brightness; (3) two daughter cells become visible and resemble a figure 8; and (4) the two daughter cells physically separate and move away. Accurately localizing the time of each stage will facilitate the quantification of biological metrics, allowing biologists to assess different factors that affect the length of time a cell spends in each stage of mitosis.

Fig. 1. The process of a mitosis event.

In this paper, mitosis event detection is defined as classifying whether or not a patch sequence contains a mitosis process, and stage localization refers to localizing the times at which Stages 2-4 begin in one sequence.

1.1 Related Work

Several tracking-based mitosis detection methods have been proposed in the past decade for phase-contrast microscopy images [2, 10]. These papers solve the mitosis detection problem using volumetric image segmentation or object tracking algorithms, with the goal of tracking cell movements over time. However, these approaches depend heavily on long-term object tracking performance, which is itself very challenging.

To overcome this drawback of tracking-based mitosis detection, tracking-free approaches detect mitosis directly in an image sequence. Liu et al. [4] trained a Hidden Conditional Random Fields (HCRF) [3] model to classify candidate patch sequences. The drawback of their method is that only one label is assigned to a patch sequence, so accurate localization of each stage cannot be achieved. Huh et al. [5] proposed an Event-Detection CRF (EDCRF) in which each patch in a candidate sequence is assigned one label. Liu et al. [6] proposed a maximum-margin HCRF and a Max-Margin Semi-Markov Model, in which mitosis detection and stage localization are performed separately in two steps. As a result, if mitosis event detection fails, i.e. a mitosis sequence is classified as not containing mitosis, the localization step is never applied to it.
Unlike the above methods, which depend on handcrafted features, Convolutional Neural Networks (CNN) [13] learn the most representative features from data instead of relying on human-engineered ones. The latest CNN-based methods [9, 12] achieve good performance on mitosis event detection, but their networks accept only a fixed-size vector as input and produce a fixed-size vector as output, e.g. the probabilities of different classes, whereas the length of the extracted sequences varies. Furthermore, because these models take a fixed-size input and output a single label for the patch sequence, they cannot localize the different mitosis stages.

Long Short-Term Memory (LSTM) [14], which can handle variable-length input, is widely used in natural language processing. It can be adapted to many-to-many, many-to-one, and one-to-many models according to the task [15]. Hence, LSTM is suitable for our tasks, but careful design is needed to make the technique feasible for both mitosis event detection and stage localization.

1.2 Motivation and Contributions

To solve the problems of mitosis event detection and mitosis stage localization, our proposed method consists of two steps. First, we extract candidate sequences from the microscopy image sequences. Then, we label each candidate sequence and its images at each timestamp with a Two-Stream Bidirectional Long Short-Term Memory (TS-BLSTM) model, in which (1) we unify both appearance and motion information in consecutive images for rich feature description, (2) a bidirectional LSTM is applied instead of a unidirectional LSTM to unify forward and backward information for better localization of each stage, and (3) one objective function is formulated to unify the two problems of mitosis event detection (sequence classification) and stage localization (individual image labeling). To our knowledge, our work is the first to demonstrate the applicability of LSTM to mitosis event detection and stage localization in microscopy image sequences.

2 Methodology

Our proposed method takes a video sequence as input, extracts candidate sequences that possibly contain mitosis events, classifies each sequence as a mitosis sequence or not, and simultaneously localizes the times of the four stages.

2.1 Candidate Sequence Extraction

Candidate sequence extraction aims to extract all patch sequences that contain mitosis while extracting as few sequences without mitosis as possible. As a result, the subsequent TS-BLSTM step can be conducted more efficiently on the candidate pool rather than on the entire video volume. First, we run illumination normalization on the observed images. Secondly, we compute the average image of the video sequence and subtract it from every image.
Bright artifacts caused by stationary non-cell dirt in the culture dish are removed by this simple procedure. Thirdly, we apply a Gaussian filter to smooth the artifact-free image and threshold it into a binary mask. Finally, we track each connected component (blob) into candidate sequences by treating the tracking as an association problem [16]. Each image patch in the sequences is extracted at a size of … . In our experiments, the Gaussian filter has a standard deviation of 3 and the binarization threshold is set to 10. The parameters are set conservatively, using a typical cross-validation scheme, to ensure that the recall of mitosis events is 100% before the classification step. Samples of extracted sequences are shown in Fig. 2.
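The background-subtraction and thresholding steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, frames are assumed to be small lists of rows of gray values, illumination normalization and Gaussian smoothing are omitted for brevity, and blobs are assumed to be 4-connected. Only the threshold value of 10 comes from the text.

```python
from collections import deque


def mean_image(frames):
    """Pixel-wise average over all frames (each frame is a list of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def candidate_mask(frame, background, threshold=10):
    """Subtract the average image and keep pixels brighter than the threshold."""
    return [[(p - b) > threshold for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]


def connected_components(mask):
    """Return blobs as lists of (row, col) pixels (4-connectivity is an assumption)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs
```

In this sketch a pixel that is bright in every frame (stationary dirt) has the same value in the average image and vanishes after subtraction, whereas a cell that is bright in only a few frames survives the threshold. Linking blobs across frames into candidate sequences, which the paper handles as an association problem [16], is left out.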


Fig. 2. Samples of extracted candidate sequences.

2.2 Two-Stream Bidirectional Long Short-Term Memory

The overall architecture of our proposed TS-BLSTM is illustrated in Fig. 3. Suppose we have N appearance images $X_i$, $i \in [1, N]$, and their corresponding motion images $M_i$ in one sequence. The motion images are computed simply as frame differences. We design a CNN, shown in Fig. 4, and take the feature representation from its last fully-connected layer. Hence, appearance image $X_i$ and motion image $M_i$ have feature vectors $f_i^x$ and $f_i^m$, respectively. The features of the appearance images and motion images are then fed into BLSTMs, which generate the labels $l_i^x$ and $l_i^m$, respectively. For each image, the label $l_i^x$ predicted by the appearance BLSTM and the label $l_i^m$ predicted by the motion BLSTM are concatenated to make the final prediction $L_i$ for that image in the sequence, which solves the mitosis stage localization problem. To solve the mitosis detection problem, we add one more BLSTM on top of the per-image predictions to generate the sequence label $L_S$.

Fig. 3. The overview of our proposed TS-BLSTM.

Fig. 4. The architecture of the CNN we used to extract features from the input images. The numbers r × c × p under each layer are the numbers of rows, columns and channels.

The joint objective function of our TS-BLSTM is formulated as follows:

$$\min_{L_i, L_S} \left\{ -T_S \log(L_S) - (1 - T_S) \log(1 - L_S) - \sum_{i \in [1,N]} \sum_{j \in [1,C]} T_i^j \log L_i^j \right\} \qquad (1)$$

$T_S$ is the label for the sequence and $L_S$ is its prediction. $T_i^j$ and $L_i^j$ are the label and prediction for the j-th class of the i-th image in the sequence. C is the number of classes (i.e. C = 4 stages).

The two tasks we try to solve here are: (1) mitosis event detection, which is a many-to-one, binary classification problem that requires the model to take a sequence of images as input and output one label for the whole sequence; and (2) mitosis stage localization, in which each image of a sequence is labeled with the stage it belongs to, which can be considered a many-to-many problem. This demands that our model produce multiple types of outputs from its multiple inputs. We unify mitosis detection and stage localization in one architecture by combining the many-to-one and many-to-many LSTM models.

Furthermore, the key to precisely labeling each stage in the input sequence is locating the transition frame between two consecutive stages.
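As a rough illustration of the joint objective in Eq. (1), the sketch below evaluates the loss for a single sequence: a binary cross-entropy term on the sequence label plus a categorical cross-entropy term summed over all images and stage classes. The function name, argument layout (one-hot rows per image) and the small `eps` guard against log(0) are assumptions made for this example, not details from the paper.

```python
import math


def joint_loss(seq_target, seq_pred, frame_targets, frame_preds, eps=1e-12):
    """Value of the Eq. (1) objective for one sequence.

    seq_target    -- T_S, 1 for a mitosis sequence, 0 otherwise
    seq_pred      -- L_S, predicted probability that the sequence is mitotic
    frame_targets -- N rows of C one-hot stage labels (T_i^j)
    frame_preds   -- N rows of C predicted stage probabilities (L_i^j)
    eps           -- hypothetical numerical guard, not part of Eq. (1)
    """
    # Detection term: binary cross-entropy on the sequence label.
    detection = -(seq_target * math.log(seq_pred + eps)
                  + (1 - seq_target) * math.log(1 - seq_pred + eps))

    # Localization term: categorical cross-entropy over images and classes.
    localization = 0.0
    for target_row, pred_row in zip(frame_targets, frame_preds):
        for t, p in zip(target_row, pred_row):
            localization -= t * math.log(p + eps)

    return detection + localization
```

Because both terms are summed into one scalar, a gradient step on this value updates the sequence-level and image-level predictions together, which is the sense in which the two tasks are solved jointly.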
When we annotated the ground truth of the different stages, human experts needed to look back and forth to determine exactly which frame is the transition frame between two stages. This suggests that stage labeling should consider both temporal directions. In our architecture, the bidirectional LSTM unifies information from both directions to label each image in the sequence.

We utilize not only the appearance information but also the motion information over time, since the movement patterns of mitotic cells during the different stages differ from those of migrating cells. Unifying both appearance and motion cues provides rich features to describe the data and thus boosts the classification performance.

When training the CNN, only the starting frame of Stage 3 is considered positive, and all other frames are labeled as negative. Two individual CNNs are trained for the appearance input and the motion input, respectively. For training the CNN with MatConvNet, we set the patch size to 100 and the number of epochs to 20, with the learning rate gradually decreasing from $10^{-2}$ to … . The dropout rate is set to 0.5. When training the TS-BLSTM with Keras, we pad each training sequence to a length of 50. The number of epochs is set to 10, and the learning rate is $10^{-3}$ with a decay rate of … .

3 Experiments

3.1 Dataset

We evaluate our proposed method on five phase-contrast video sequences obtained from [6], containing 79, 94, 85, 120 and 41 mitotic cells, respectively. Each sequence consists of 1436 images (resolution: … pixels). The location and time of the different stages of the mitosis sequences in the video are provided as the ground truth. To train our CNN and TS-BLSTM, data expansion is performed to generate more positive training data and avoid overfitting. For each positive mitosis sequence, we rotate the images in steps of 45° (8 variations) and slightly translate the images horizontally and/or vertically (9 variations), which yields 72 times the original positive training data. Negative sequences are extracted by the proposed candidate sequence extraction method.

3.2 Evaluation Metric

We adopt a leave-one-out policy in the experiments, i.e., four sequences are used for training and the remaining one for testing. Since the competing methods classify a sequence based only on detecting the starting time of stage 3, a sequence is defined as a mitosis sequence only if it contains the starting frame of stage 3.
In this case, we define True Positive as a mitosis sequence that is classified as positive, False Positive as a non-mitosis sequence that is mistakenly labeled as positive, and False Negative as a mitosis sequence that is mistakenly classified as negative. Two evaluations are used in our experiments. First, we evaluate the performance of mitosis detection in terms of the mean and standard deviation of precision, recall and F score on the five leave-one-out tests. Second, we evaluate the performance of stage localization strictly in terms of the localization error of the starting frame of each stage. The localization error is defined as the frame difference between the detection result and the ground truth.
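The two evaluations can be made concrete with a short sketch. The helper names are hypothetical, stage labels are assumed to be the integers 1 to 4 assigned per frame, and the localization error is assumed to be the absolute frame difference, since the text does not state whether the difference is signed.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def stage_starts(frame_labels):
    """Index of the first frame labeled as stage 2, 3 and 4."""
    starts = {}
    for index, label in enumerate(frame_labels):
        if label in (2, 3, 4) and label not in starts:
            starts[label] = index
    return starts


def localization_errors(predicted_labels, true_labels):
    """Absolute frame difference of each stage start (assumed unsigned)."""
    predicted = stage_starts(predicted_labels)
    truth = stage_starts(true_labels)
    return {stage: abs(predicted[stage] - truth[stage])
            for stage in truth if stage in predicted}
```

For example, `localization_errors([1, 2, 2, 2, 3, 3, 4], [1, 1, 2, 2, 3, 4, 4])` compares a prediction that enters stage 2 one frame early and stage 4 one frame late. Averaging these per-stage errors over all test sequences, and the detection scores over the five leave-one-out folds, would give summary numbers of the kind reported in the tables.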

3.3 Validation on the Proposed Architecture

In this section, we show the effectiveness of each module in the proposed architecture. We compare the performance of (1) the proposed TS-BLSTM; (2) D-TS-BLSTM (detection-only TS-BLSTM, in which only the sequence label is predicted and the objective function does not take the classification error of each image into consideration); (3) A-BLSTM (TS-BLSTM without the motion BLSTM); (4) M-BLSTM (TS-BLSTM without the appearance BLSTM); and (5) TS-LSTM (TS-BLSTM with the BLSTMs replaced by LSTMs). As shown in Table 1, the proposed TS-BLSTM outperforms the other models, which shows that each module in the TS-BLSTM architecture (unifying mitosis event detection and stage classification, the motion feature, the appearance feature, and the bidirectional LSTM) is necessary and helps boost the performance.

Table 1. Mitosis event detection accuracy of different designs.

Model        Precision (%)   Recall (%)   F score (%)
TS-BLSTM     98.4 ± …        97.0 ± …     97.7 ± 1.2
D-TS-BLSTM   94.5 ± …        … ± …        … ± 2.1
A-BLSTM      90.4 ± …        … ± …        … ± 3.2
M-BLSTM      94.4 ± …        … ± …        … ± 2.6
TS-LSTM      90.2 ± …        … ± …        … ± …

3.4 Comparisons on the Mitosis Event Detection

We compare our method with seven state-of-the-art methods on mitosis event detection: HCNN [9], Max-Margin Hidden Conditional Random Fields + Max-Margin Semi-Markov Model (MM-HCRF + MM-SMM) [6], EDCRF [5], Max-Margin Hidden Conditional Random Fields (MM-HCRF) [6], HCRF [4], Hidden Markov Model (HMM) [7], and Support Vector Machine (SVM) [8]. As shown in Table 2, our TS-BLSTM achieves an average precision of 98.4%, recall of 97.0% and F score of 97.7%, which outperforms the existing models. HCNN [9] classifies a candidate sequence by considering only several frames near the starting frame of stage 3. In contrast, our model takes the whole sequence into consideration, so its performance does not rely heavily on the detection of stage 3.
MM-HCRF + MM-SMM [6] performs mitosis detection and stage localization in two separate steps, so its solution cannot be jointly optimal.

Table 2. Comparison of mitosis event detection.

Model               Precision (%)   Recall (%)   F score (%)
Our TS-BLSTM        98.4 ± …        97.0 ± …     97.7 ± 1.2
HCNN                96.6 ± …        … ± …        … ± 0.8
MM-HCRF + MM-SMM    95.8 ± …        … ± …        … ± 2.0
EDCRF               91.3 ± …        … ± …        … ± 0.7
MM-HCRF             82.8 ± …        … ± …        … ± 1.6
HCRF                90.5 ± …        … ± …        … ± 4.4
HMM                 83.4 ± …        … ± …        … ± 3.4
SVM                 68.0 ± …        … ± …        … ± 1.7

3.5 Comparisons on the Mitosis Stage Localization

To label one mitosis sequence with the four stages, we only need to localize the starting frames of stages 2, 3, and 4. In previous work, only MM-HCRF + MM-SMM [6] is able to localize different stages, while the others focus only on localizing the starting frame of stage 3. We summarize the comparison of mitosis stage localization accuracy in Table 3. The results demonstrate that our method not only localizes the different stages more accurately than [6], but also locates the starting frame of Stage 3, a critical point in analyzing mitosis events, more accurately than the other methods.

Table 3. Comparison of stage localization accuracy.

Model               Stage 2     Stage 3       Stage 4
Our TS-BLSTM        0.78 ± …    … ± …         … ± 0.06
MM-HCRF + MM-SMM    0.82 ± …    … ± …         … ± 1.72
HCNN                N/A         0.69 ± 0.91   N/A
EDCRF               N/A         0.83 ± 1.34   N/A

4 Conclusion

In this paper, we propose a Two-Stream Bidirectional Long Short-Term Memory (TS-BLSTM) to tackle the two problems of mitosis event detection and stage localization jointly in phase-contrast microscopy images. Both appearance and motion information are utilized to provide rich feature description. The bidirectional LSTM helps to utilize information in both directions. In the experiments, we validate the proposed architecture and show that our model outperforms other state-of-the-art methods on both tasks.

Acknowledgement. This project was supported by NSF CAREER award IIS … and NSF EPSCoR grant IIA … .

References

1. Li, K., et al.: Computer vision tracking of stemness. In: Proceedings of IEEE International Symposium on Biomedical Imaging, pp. … (2008)
2. Li, K., et al.: Cell population tracking and lineage construction with spatiotemporal context. Med. Image Anal. 12(5), … (2008)
3. Quattoni, A., et al.: Hidden conditional random fields. IEEE Trans. Pattern Anal. Mach. Intell. 29(10), … (2007)
4. Liu, A., et al.: Mitosis sequence detection using hidden conditional random fields. In: Proceedings of IEEE International Symposium on Biomedical Imaging (2010)
5. Huh, S., et al.: Automated mitosis detection of stem cell populations in phase-contrast microscopy images. IEEE Trans. Med. Imag. 30(3), … (2011)
6. Liu, A., et al.: A semi-Markov model for mitosis segmentation in time-lapse phase contrast microscopy image sequences of stem cell populations. IEEE Trans. Med. Imag. 31(2), … (2012)
7. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), … (1989)
8. Suykens, J., et al.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), … (1999)
9. Mao, Y., Yin, Z.: A hierarchical convolutional neural network for mitosis detection in phase-contrast microscopy images. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI … LNCS, vol. 9901, pp. … Springer, Cham (2016). doi: …
10. Liang, L., et al.: Mitosis cell identification with conditional random fields. In: Proceedings of Life Science Systems and Applications Workshop, p. 912, November …
11. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comp. Vis. 60(2), … (2004)
12. Nie, W., et al.: 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In: Proceedings of CVPRW, June …
13. Krizhevsky, A., et al.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. … (2012)
14. Hochreiter, S., et al.: Long short-term memory. Neural Comput. 9(8), … (1997)
15. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of CVPR (2015)
16. Mao, Y., et al.: Who missed the class? Unifying multi-face detection, tracking and recognition in videos. In: Proceedings of ICME (2014)
