MW-PCA-LDA Applied to Vis-NIR Discriminate Analysis of Transgenic Sugarcane

Similar documents
Moving-window bis-correlation coe±cients method for visible and near-infrared spectral discriminant analysis with applications

Application of Raman Spectroscopy for Detection of Aflatoxins and Fumonisins in Ground Maize Samples

Visible-near infrared spectroscopy to assess soil contaminated with cobalt

Design and experiment of NIR wheat quality quick. detection system

Study on the Concentration Inversion of NO & NO2 Gas from the Vehicle Exhaust Based on Weighted PLS

Journal of Chemical and Pharmaceutical Research, 2015, 7(4): Research Article

Putting Near-Infrared Spectroscopy (NIR) in the spotlight. 13. May 2006

Detection and Quantification of Adulteration in Honey through Near Infrared Spectroscopy

ADAPTIVE LINEAR NEURON IN VISIBLE AND NEAR INFRARED SPECTROSCOPIC ANALYSIS: PREDICTIVE MODEL AND VARIABLE SELECTION

Detection of Chlorophyll Content of Rice Leaves by Chlorophyll Fluorescence Spectrum Based on PCA-ANN Zhou Lina1,a Cheng Shuchao1,b Yu Haiye2,c 1

FTIR characterization of Mexican honey and its adulteration with sugar syrups by using chemometric methods

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

NIR Spectroscopy Identification of Persimmon Varieties Based on PCA-SVM

Application of color detection by near infrared spectroscopy in color detection of printing

Biological and Agricultural Engineering Department UC Davis One Shields Ave. Davis, CA (530)

IR Biotyper. Innovation with Integrity. Microbial typing for real-time epidemiology FT-IR

MULTIVARIATE STATISTICAL ANALYSIS OF SPECTROSCOPIC DATA. Haisheng Lin, Ognjen Marjanovic, Barry Lennox

Analysis of Cocoa Butter Using the SpectraStar 2400 NIR Spectrometer

Application of Raman Spectroscopy for Noninvasive Detection of Target Compounds. Kyung-Min Lee

Identification of terahertz fingerprint spectra extracted from Gas-Fat coal

PREDICTION OF SOIL ORGANIC CARBON CONTENT BY SPECTROSCOPY AT EUROPEAN SCALE USING A LOCAL PARTIAL LEAST SQUARE REGRESSION APPROACH

Positive and Nondestructive Identification of Acrylic-Based Coatings

Portable Raman Spectroscopy for the Study of Polymorphs and Monitoring Polymorphic Transitions

Application of near-infrared (NIR) spectroscopy to sensor based sorting of an epithermal Au-Ag ore

Mixture Analysis Made Easier: Trace Impurity Identification in Photoresist Developer Solutions Using ATR-IR Spectroscopy and SIMPLISMA

Detection of trace contamination on metal surfaces using the handheld Agilent 4100 ExoScan FTIR

Development of Near Infrared Spectroscopy for Rapid Quality Assessment of Red Ginseng

Method Transfer across Multiple MicroNIR Spectrometers for Raw Material Identification

Generalized Least Squares for Calibration Transfer. Barry M. Wise, Harald Martens and Martin Høy Eigenvector Research, Inc.

Supporting Information

Influence of different pre-processing methods in predicting sugarcane quality from near-infrared (NIR) spectral data

Validation of a leaf reflectance and transmittance model for three agricultural crop species

Recent Progress of Terahertz Spectroscopy on Medicine and Biology in China

Spectroscopy in Transmission

protein interaction analysis bulletin 6300

Highly doped and exposed Cu(I)-N active sites within graphene towards. efficient oxygen reduction for zinc-air battery

Howard Mark and Jerome Workman Jr.

NOTE: The color of the actual product may differ from the color pictured in this catalog due to printing limitation.

Reprinted from February Hydrocarbon

HYPERSPECTRAL IMAGING

Techniques useful in biodegradation tracking and biodegradable polymers characterization

Laser on-line Thickness Measurement Technology Based on Judgment and Wavelet De-noising

The Raman Spectroscopy of Graphene and the Determination of Layer Thickness

Myoelectrical signal classification based on S transform and two-directional 2DPCA

SPECTROPHOTOMETRY AND SPECTROMETRY - CONCEPT AND APPLICATIONS

Chemometrics. 1. Find an important subset of the original variables.

Principal component analysis

What do plants compete for? What do animals compete for? What is a gamete and what do they carry? What is a gene?

Plum-tasting using near infra-red (NIR) technology**

Corresponding Author: Jason Wight,

Highly Stretchable and Transparent Thermistor Based on Self-Healing Double. Network Hydrogel

Study of blood fat concentration based on serum ultraviolet absorption spectra and neural network

New Developments in Raman Spectroscopic Analysis

Multivariate calibration

Food authenticity and adulteration testing using trace elemental and inorganic mass spectrometric techniques

Molecular Spectroscopy In The Study of Wood & Biomass

Demonstrating the Value of Data Fusion

arxiv: v1 [astro-ph.im] 25 Apr 2014

Exploring Raman spectral data using chemometrics

Research Article Automatic and Rapid Discrimination of Cotton Genotypes by Near Infrared Spectroscopy and Chemometrics

Design and Development of a Smartphone Based Visible Spectrophotometer for Analytical Applications

For more information, please contact: or +1 (302)

PRECISION AGRICULTURE APPLICATIONS OF AN ON-THE-GO SOIL REFLECTANCE SENSOR ABSTRACT

The Application of Classical Least Square Algorithm in the Quantitative Analysis of Lime in Wheat Flour by ATR-MIR Spectroscopy *

Extraction Process Validation of Isatis Radix

NIR applications for emerging vegetal contaminants

S is referred to as sour. The sulfide loading lends an unpleasant odor and makes the water corrosive. Additionally, the presence of H 2

Supporting Information

Nanosized metal organic framework of Fe-MIL-88NH 2 as a novel peroxidase mimic and used for colorimetric detection of glucose

Growth of silver nanocrystals on graphene by simultaneous reduction of graphene oxide and silver ions with a rapid and efficient one-step approach

Explaining Correlations by Plotting Orthogonal Contrasts

Science Unit Learning Summary

BATCH PROCESS MONITORING THROUGH THE INTEGRATION OF SPECTRAL AND PROCESS DATA. Julian Morris, Elaine Martin and David Stewart

Rapid Compositional Analysis of Feedstocks Using Near-Infrared Spectroscopy and Chemometrics

Verification of Pharmaceutical Raw Materials Using the Spectrum Two N FT-NIR Spectrometer

The Role of a Center of Excellence in Developing PAT Applications

2.01 INFRARED ANALYZER

HISTORICAL PLASTER COMPOSITION DETECTION USING REFLECTANCE SPECTROSCOPY

Anti-Inflammatory Isoquinoline with Bis-seco-aporphine Skeleton from Dactylicapnos scandens

Multiplexing immunoassay with SERS

Sunlight loss for femtosecond microstructured silicon with two impurity bands

Keywords: Multimode process monitoring, Joint probability, Weighted probabilistic PCA, Coefficient of variation.

MULTIVARIATE PATTERN RECOGNITION FOR CHEMOMETRICS. Richard Brereton

STANDARD OPERATING PROCEDURES

Chaos suppression of uncertain gyros in a given finite time

ECE 661: Homework 10 Fall 2014

Prediction of beef chemical composition by NIR Hyperspectral imaging

Research Article Honey Discrimination Using Visible and Near-Infrared Spectroscopy

INFRARED SPECTROSCOPY

Vector Space Models. wine_spectral.r

Process Analytical Technology Diagnosis, Optimization and Monitoring of Chemical Processes

Advanced Spectroscopy Laboratory

Detection of Soil Total Nitrogen by Vis-SWNIR Spectroscopy

Compact Hydrogen Peroxide Sensor for Sterilization Cycle Monitoring

Study on Acoustically Transparent Test Section of Aeroacoustic Wind Tunnel

Use of Geographic Information Systems and Remote Sensing in Integrated pest Management Programs

BIOLIGHT STUDIO IN ROUTINE UV/VIS SPECTROSCOPY

Discrimination of Dyed Cotton Fibers Based on UVvisible Microspectrophotometry and Multivariate Statistical Analysis

The near-infrared spectra and distribution of excited states of electrodeless discharge rubidium vapour lamps

LIST OF ABBREVIATIONS:

Transcription:

International Journal of Bioinformatics and Biomedical Engineering Vol. 2, No. 2, 2016, pp. 40-45 http://www.aiscience.org/journal/ijbbe ISSN: 2381-7399 (Print); ISSN: 2381-7402 (Online) MW-PCA-LDA Applied to Vis-NIR Discriminate Analysis of Transgenic Sugarcane Kaisheng Ma 1, Lijun Yao 1, Jiemei Chen 2, Tao Pan 1, * 1 Department of Optoelectronic Engineering, Jinan University, Guangzhou, China 2 Department of Biological Engineering, Jinan University, Guangzhou, China Abstract The moving-window waveband screening is applied to principal component and linear discriminate analyses (PCA-LDA). An integrated method (MW-PCA-LDA) for spectral pattern recognition is proposed, which is successfully employed for the non-destructive recognition of transgenic sugarcane leaves using visible (Vis) and near-infrared (NIR) diffuse reflectance spectroscopy. A Kennard Stone-algorithm-based process of calibration, prediction and validation in consideration of uniformity and representative was performed to produce objective models. A total of 487 samples of sugarcane leaves in the elongation stage were collected from a planted field. These samples were composed of 306 transgenic samples containing both Bacillus thuringiensis and Bialaphos resistance genes, and 181 non-transgenic samples. Based on the MW-PCA-LDA method, the optimal waveband was 768 nm to 788 nm, and the corresponding validation recognition rates of transgenic and non-transgenic samples achieved 96.2% and 96.3%, respectively. The results show that Vis-NIR spectroscopy combined with the MW-PCA-LDA method can be used for accurate recognition of transgenic sugarcane leaves and provides a quick and convenient means of screening transgenic sugarcane breeding for large-scale agricultural production. Keywords Visible-Near Infrared Spectroscopy, Transgenic Sugarcane Leaves, Spectral Discriminate Analysis, Moving-Window Waveband Screening, Principal Component and Linear Discriminate Analyses Received: February 3, 2016 / Accepted: February 24, 2016 / Published online: March 9, 2016 @ 2016 The Authors. Published by American Institute of Science. This Open Access article is under the CC BY-NC license. http://creativecommons.org/licenses/by-nc/4.0/ 1. Introduction China's sugar production ranks third in the world. Sugarcane is the main sugar crops, except sugar refining, sugarcane is also used for the production of paper and fuel ethanol. With the development of agricultural biotechnology, transgenic sugarcane breeding is increasingly receiving attention. This increasesthe yield, resistance and added value of sugarcane by transferring insect-resistant and herbicide-tolerant genes into thesugarcane [1]. The microorganism Bacillus thuringiensis (Bt), a Gram-positive, spore-forming soil bacterium, produces a crystalline parasporal body during sporulation, which shows biocidal activity against some invertebrate orders such as lepidopteran, dipteran, and coleopteran insects at the larval stage, as well as against nematodes. And Bialaphos resistance (Bar) gene, which was cloned from Streptomyces hygroscopicus, can code phosphinothricin acetyltransferase to avoid the damage that L-phosphinothricin from herbicide could do to the plants [2]. In transgenic sugarcane breeding, it is necessary to determine whether the exogenous gene is successfully expressed in asugarcane plant. Molecular biology detection technologies, such as polymerase chain reaction and enzyme-linked immunosorbent assay (ELISA), are mainly used for genetic screening [3]. These methods are complicated, require expertise and cannot meet the needs of large-scale production. It is therefore of applied value to develop a simple and rapid * Corresponding author E-mail address: tpan@jnu.edu.cn (Tao Pan)

41 Kaisheng Ma et al.: MW-PCA-LDA Applied to Vis-NIR Discriminate Analysis of Transgenic Sugarcane method of transgenic sugarcane breeding screening. Near-infrared (NIR) primarily reflects the absorption of overtones and combinations of vibrations of X H functional groups (such as C H, O H and N H). NIR spectroscopy can rapid non-destructive determine samples without chemical reagents, which has been proven to be a powerful analytical tool for use in agriculture [4-6], food [7], environment [8] and biological medicine [9]. Furthermore, NIR spectra capture absorption information from the protein molecules related to genetic variations and this is applied in the fields of genetic disorders [10]. Principal component analysis (PCA) is a maximum variance projection method, it follows that a variable with a large variance is more likely to be expressed in the modeling than a low-variance variable. The most important use of PCA is indeed to represent a multivariate data table as a low-dimensional plan by employing the first few principal components with maximum contribution values. The processed data are then the input to a linear discriminate analysis (LDA) model for spectral pattern recognition. The PCA-LDA method is successfully used in many applications of spectral discriminant analysis [11-13]. Differences in protein molecule structures between transgenic and non-transgenic sugarcane contain large numbers of X H functional groups that have significant NIR absorption. In our previous study [14], the PCA-LDA was applied to the non-destructive recognition of transgenic sugarcane based on the entire scanning region (400 2498 nm). The result indicated that non-destructive NIR spectroscopy had substantial potential as a pattern recognition tool for transgenic sugarcane breeding screening. For a complex analyte with multiple components, such as sugarcane leaves, the raw absorption spectra data matrix usually contains considerable interference noise data. To improve the horizontal and vertical qualities of spectral data, wavelength selection is necessary to eliminate noise and to extract information, respectively. In the quantitative analysis of the NIR spectrum, numerous experimental results [4-8] [10] [15-17] indicate that moving-window waveband screening mode combined with PLS method can effectively extract information, eliminate noise disturbances, and significantly improve spectral predictive capability. However, combining waveband selection to conduct spectral pattern recognition is still rare because of the complexity of the algorithm. In the present study, through integrated the moving-window waveband screening into PCA-LDA model, a novel discriminate analysis method was proposed, namely MW-PCA-LDA. The MW-PCA-LDA algorithm platform was established and applied to non-destructive Vis-NIR spectroscopic recognition of transgenic sugarcane leaves. 2. Materials and Methods 2.1. Experimental Materials, Instruments and Measurement Methods Transgenic sugarcane material: Transgenic sugarcane strains contained both Bacillus thuringiensis (Bt) and Bialaphos resistance (Bar) genes genetically modified from three types of sugarcane receptors, ROC 20 th, ROC 22 nd, and Yuetang no. 00-236, for a total of 306 samples. Non-transgenic sugarcane material: Non-transgenic sugarcane strains were of seven types, namely, ROC 1 st, ROC 2 nd, ROC 3 rd, ROC 4 th, ROC 20 th, ROC 22 nd, and Yuetang no. 00-236, for a total of 181 samples. Enzyme-linked immunosorbent assay (ELISA) was used to check the integrity of the copies of the genes introduced during the breeding phase, and the expression of the exogenous gene was guaranteed. The equipment used was an ELISA kit BT-Cry1Ab/1Ac (AGDIA, Inc., USA) and a micro plate reader imark (Bio-rad, Inc., USA). All sugarcane plants were field grown to the elongation stage. A total of 487 samples of sugarcane leaves were collected, which comprise the 306 transgenic samples (positive) with Bt and Bar genes and 181 non-transgenic samples (negative). At least one leaf was collected from each plant. The collected samples were cleaned and stored at room temperature for 2 h to equilibrate to the experimental environment before the collection of the Vis NIR diffuse reflectance spectra. The spectra were collected using an XDS Rapid Content grating spectrometer (FOSS, Denmark) equipped with a diffuse reflection accessory and a round sample cell. The scanning range spanned from 400 to 2498 nm with a 2 nm wavelength gap. Wavebands of 400-1100 and 1100 2498 nm were selected for silicon and lead sulphide detection, respectively. The samples were directly placed in the diffuse reflection accessory. Each sample was measured in triplicate and the mean of three measurements was used for modeling. The spectra were recorded at 25±1 C and 46%±1% relative humidity. 2.2. Sample Division, Calibration, Prediction and Validation Processes A framework of calibration, prediction, and validation based on uniformity and representativeness was performed to produce objective models [11]. Several samples were randomly selected as validation samples and were not subjected to the modeling optimization process [4-5] [10]. The remaining samples were used as modeling samples and were divided into calibration and prediction sets using the

International Journal of Bioinformatics and Biomedical Engineering Vol. 2, No. 2, 2016, pp. 40-45 42 Kennard Stone (K S) algorithm [18]. All models were established for calibration and prediction sets, and model parameters were optimized depending on the recognition rate of the prediction set. Finally, the selected optimal models were revalidated against the validation samples. The K S algorithm is a well-performed method for sample division in experiment planning. The goal of the K S algorithm is to select a maximally diverse subset from a large set of candidate samples. Thus, the subset can uniformly and sufficiently represent the entire sample space. The algorithm assumes that one can define a distance between two samples, which is low when the two samples are similar and high when the samples are dissimilar. In this study, the calibration samples were selected from all the modeling samples using the K S algorithm, and the remaining samples were used as prediction samples. For the spectral data variables (wavelength), (i) A (absorbance vector) of n samples with K A = ( A, A,, A ), i = 1,2,, n (1) ( ) ( ) ( ) ( ) 1 2 K The Euclidean distance is employed as the distance measure between any two samples i and j, which is defined as follows: D i, j K = ( A - A ) (2) µ = 1 ( i) ( j) 2 µ µ The K S algorithm works as follows: The modeling set is the candidate set. The first two selected samples are the two candidates with the maximal pair distance. All subsequent samples are iteratively selected until the number of selected samples reaches the desired number of calibration samples. In one iteration, the minimum distances of every candidate to the entire distance previously selected are calculated; the one with the largest minimum distance is selected. To ensure modeling representativeness and integrity, all calibration, prediction and validation sets mustcontain negative and positive samples. Thus, the negative and positive samples should be divided into these three sets. The specific procedure is as follows. First, 81 samples from 181 negative samples were randomly selected for validation. The remaining 100 samples were used as modeling samples. Fifty samples were further selected from modeling samples using the K S algorithm as calibration samples and the remaining as prediction samples. Second, 106 samples from 306 positive samples were randomly selected for validation. The remaining 200 samples were used as modeling samples. One hundred samples were further selected from the modeling samples using the K S algorithm as calibration samples and the remaining as prediction samples. Finally, the positive and negative samples used for validation were merged into one validation set (187 samples). Similarly, the positive and negative samples used for calibration and prediction were merged into calibration (150 samples) and prediction (150 samples) sets, respectively. Figure 1 shows the type and number of samples in the calibration, prediction and validation sets. Figure 1. Type and number of samples in the calibration, prediction, and validation sets. 2.3. Waveband Screening with a Moving Window Mode Consecutive spectral data for N adjacent wavelengths were designated as a window. The window was moved or the size of window was varied in a predetermined search region of the spectrum, and then each analytical waveband that corresponds to a window was determined for modeling. The position and length of the wavebands were considered and the search parameters were set as follows: (1) initial wavelength (I) and (2) number of wavelengths (N). The search range covered the entire scan region from 400 2498 nm with a 2 nm wavelength gap using 1050 wavelengths. I was set to I { 400,402,,2498}. To reduce the workload and ensure representativeness, N wassetto N {3,4,,100} {100,110,,200} {220, 240,,860} {1050}. 2.4. PCA LDA Method Typically, the three-dimensional (3D) PCA LDA model was well-performed [11-13].Thus, the 3D model was employed in the present study.the PCA LDA calibration and prediction modeling process is as follows: (1) Principal component analysis was performed based on a matrix of the absorbance spectra of the calibration set, and the loading and score matrices were obtained. (2) Based on the spectra of the calibration set, the first, second and third principal components (PC1, PC2 and PC3) with maximum contribution values were derived and normalized. A 3D principal component space (PC1, PC2 and PC3) was used to draw the results, which enabled the structure of the investigated dataset to be visualized. (3) Linear discriminant analysis was performed on 3D model based on the spectra of the calibration set. A linear cut-off planar was determined, which optimally classified transgenic and non-transgenic

43 Kaisheng Ma et al.: MW-PCA-LDA Applied to Vis-NIR Discriminate Analysis of Transgenic Sugarcane sugarcane leaf samples. (4) The score matrix of the prediction set was calculated on the basis of absorbance matrices of the prediction set and the acquired loading matrices of the calibration set. The first, second, and third components of prediction samples were derived and normalized. According to the cut-off planar, the genotypes of the prediction samples were determined. (5) The known genotypes of the prediction samples were used to calculate the prediction recognition rate. 2.5. MW PCA LDA Method PCA LDA method was combined with MW waveband selection mode to denote an integrated optimization method for spectral discriminant analysis as MW PCA LDA. PCA-LDA models were established on the waveband that corresponds to all combinations of parameters I and N, and then the corresponding P_RECs were calculated, as defined by the following equation: Where M V and M V+ are the numbers of negative and positive samples in the validation set, respectively, and Mɶ V and Mɶ V+ are the numbers of correctly recognized negative and positive samples in the validation set, respectively. 3. Results and Discussion Vis NIR diffuse reflection spectra of 306 transgenic (positive) and 181 non-transgenic (negative) samples of all 487 sugarcane leaf samples in the entire scanning region (400 2498 nm) are shown in Figures 2(a) and 2(b). The spectral features of Figures 2(a) and 2(b) were compared, and the spectra of negative and positive samples were over-lapping, thereby resulting in no obvious spectral differences for direct discriminant analysis. 1.6 1.4 is the number of correctly recognized prediction samples. The optimal wavebands for PCA LDA models were preferred according to the maximum P_REC to obtain the optimal models of MW PCA LDA method. Furthermore, with the use of the method mentioned in Section 2.4, the optimal model of MW PCA LDA method was validated using the validation samples excluded from the modeling optimization process. Based on the genuine genotypes of all validation samples, the validation recognition rate was calculated and denoted as V_REC, as defined in the following equation: Where M V (4) Absorbance /- ~ Where M P is the number of prediction samples, and M P ɶ M V_REC = V 100 % MV 1.2 (3) 1.0 0.8 0.6 0.4 0.2 (a) 0.0 400 700 1000 1300 1600 1900 is the number of correctly recognized validation samples. Moreover, the corresponding validation recognition rates of negative and positive samples were calculated and denoted as V_REC and V_REC+, respectively, as defined in the following equations: V_ REC = Mɶ V 100 % MV (5) V_ REC + = Mɶ V+ 100 % MV+ (6) 2500 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2 is the number of validation samples, and Mɶ V 2200 Wavelength /nm Absorbance /- Mɶ P_REC = P 100% MP (b) 0.0 400 700 1000 1300 1600 1900 2200 2500 Wavelength /nm Figure 2. Vis NIR diffuse reflection spectra of sugarcane leaves for (a) 306 transgenic, (b) 181 non-transgenic. 3.1. Full PCA LDA Model For comparison, the full PCA LDA model that used the entire scanning region was first established using the method mentioned in Sections 2.4, and the corresponding modeling parameters and effects were summarized in Table 1. The results in Table 1 show that the effect of the full PCA LDA model was only 88.0%. The entire scan region (400 2498 nm)

International Journal of Bioinformatics and Biomedical Engineering Vol. 2, No. 2, 2016, pp. 40-45 44 contained numerous wavelengths (i.e., N=1050), which led to high noise disturbance and high modeling complexity. To better extract information, reduce model complexity, and improve spectral recognition, further waveband optimizations were performed by using MW PCA LDA method. Table 1. Modeling parameters and effects of full PCA LDA and MW PCA LDA models. Methods Waveband(nm) N PCC P_REC full PCA LDA 400-2498 1050 1, 2, 3 88.0% MW PCA LDA 768-788 11 1, 2, 3 97.3% Note: PCC: principal component combination. 3.2. Waveband Selection by MW PCA LDA The MW PCA LDA method mentioned in Section 2.5 was performed for waveband optimization. For the 3D model with PC1, PC2 and PC3, the selected I and N values were 768 nm and 11, respectively. The corresponding waveband was 768-788 nm, which covered part of the Vis NIR combined region. The optimal P_REC was 97.3%, which was evidently better than that obtained from the entire scanning region (Table 1). In addition, a small number of wavelengths were adopted in the selected model, thereby significantly reducing model complexity. The P_REC values of the local optimal model that corresponds to a single parameter I or Nwere shown in Figure 3. The results of Figure 3 revealed that the global optimal P_REC was achieved when I=768 nm and N=11. 3.3. Model Validation The randomly selected validation samples excluded in the modeling optimization process were used to validate the optimal MW PCA LDA model (I=768 nm, N=11). The validation recognition rates V_REC + and V_REC achieved 96.2% and 96.3%, respectively. As shown in Figure 4, the validation samples plot on the on the 3D principal component space (PC1, PC2 and PC3) were clearly classified into two groups. The result showed that the MW waveband selection applied to PCA LDA could effectively extract spectral information wavebands, eliminate noise disturbances and significantly improve spectral pattern recognition capability. Moreover, the number of adopted wavelengths and the model complexity could be largely reduced. The integrated MW PCA LDA method can be used for non-destructive discriminant analysis of transgenic sugarcane leaves with NIR spectroscopy. The obtained high recognition accuracy shows that NIR spectroscopy could capture related information of transgenic sugarcane leaves. The methodological framework may also provide valuable reference for non-destructive and rapid detection of other crops. Figure 4. Validation recognition of the optimal MW PCA LDAmodel in 3D principal componentspace. 4. Conclusion Figure 3. P_REC of local optimal model for (a) initial wavelength, (b) number of wavelengths. The moving-window waveband screening was applied to a PCA LDA model, which was successfully used for the non-destructive recognition of transgenic sugarcane leaves with Vis NIR spectroscopy. The K S algorithm-based process of calibration, predictionand validation in consideration of uniformity and representativeness was performed to produce objective models. The experimental results indicated that the recognition effects of the optimal PCA LDA model were evidentlyhigher than those obtained from the full PCA LDA model. More importantly, when the MW waveband screening was combined with PCA LDA method, a small number of wavelengths were adopted in the optimal model, thereby

45 Kaisheng Ma et al.: MW-PCA-LDA Applied to Vis-NIR Discriminate Analysis of Transgenic Sugarcane significantly eliminating noise disturbances and reducing model complexity. MW PCA LDA is an efficient information wavelength selection approach that can provide valuable references for designing a small dedicated spectrometer with a high signal-to-noise ratio. Vis NIR spectroscopy combined with the MW PCA LDA method provided a potential and promising tool for screening transgenic sugarcane breeding for large-scale agricultural production. We believe that MW PCA LDA has such applicability and can be applied to other fields of discriminant analysis. Acknowledgements This work was supported by the Science and Technology Project of Guangdong Province of China (No. 2014A020213016, No. 2014A020212445) and the Pearl River Nova Program of Guangzhou of China (No.2014J2200073). References [1] Zhang, X. Y., Yang, B. P., Zhang, S. Z. (2007) Research Progress on Transgenic Sugarcane. Molecular Plant Breedin, vol 5, pp. 155-159. [2] Arencibia A., Vazquez R. I., Prieto D., Te11ez P., Carmona E. R., Coego A., Hemandez L., Riva G. A., Selman-Housein G. (1997) Transgenic sugarcane plants resistantto stem borerattack. Molecular Breeding, vol 3, pp. 247-255. [3] Wang, G. Y., Fan, W. X., Chen, B. H., Zhag, J. W., Han, S. D. (2008) Application and Development of Detection Technology of Genetically Modified Foods (GMFs) I. Main Detection Technologies of GMFs and Their Characteristics. Food Science, vol 29, pp. 698-705. [4] Chen, H. Z., Pan, T., Chen, J. M., Lu, Q. P. (2011) Waveband selection for NIR spectroscopy analysis of soil organic matter based on SG smoothing and MWPLS methods. Chemometrics and Inteligentl Laboratory Systems, vol 107, pp. 139-146. [5] Pan, T., Li, M. M., Chen, J. M. (2014) Selection Method of Quasi-Continuous Wavelength Combination with Applications to the Near-Infrared Spectroscopic Analysis of Soil Organic Matter. Applied Spectroscopy, vol 68, pp. 263-271. [6] Pan, T., Wu, Z. T., Chen, H. Z. (2012) Waveband Optimization for Near-Infrared Spectroscopic Analysis of Total Nitrogen in Soil. Chinese Journal of Analytical Chemistry, vol 40, pp. 920-924. [7] Liu, Z. Y., Liu, B., Pan, T., Yang, J. D. (2013) Determination of amino acid nitrogen in tuber mustard using near-infrared spectroscopy with waveband selection stability. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol 102, pp.269-274. [8] Pan, T., Chen, Z. H., Chen, J. M., Liu, Z. Y. (2012) Near-Infrared Spectroscopy with Waveband Selection Stability for the Determination of COD in Sugar Refinery Wastewater. Analytical Methods, vol 4, pp.1046-1052. [9] Xie, J., Pan, T., Chen, J. M., Chen, Z. H. (2010) Joint Optimization of Savitzky-Golay Smoothing Models and Partial Least Squares Factors for Near-infrared Spectroscopic Analysis of Serum Glucose. Chinese Journal of Analytical Chemistry, vol 38, pp.342-346. [10] Pan, T., Liu, J. M., Chen, J. M., Zhang, G. P., Zhao, Y. (2013) Rapid determination of preliminary thalassaemia screening indicators based on near-infrared spectroscopy with wavelength selection stability. Analytical Methods, vol 5, pp. 4355-4362. [11] Eriksson, L., Johansson, E., Kettaneh, W. N., Trygg, J., Wikström, C., Wold, S. (2006) Multi- and Megavariate Data Analysis Part I: Basic Principles and Applications. Umetrics Academy, Umea Sweden. [12] Phil, W., Karl, N. (2001) Near-Infrared Technology in the Agricultural and Food Industries. American Association of Cereal Chemists, USA. [13] Lu, W. Z. (2007) Modern Near-infrared Spectroscopy Analytical Technology. China Petrochemical Press, Beijing. [14] Liu, G. S., Guo, H. S., Pan, T. (2014) Vis-NIR Spectroscopic Pattern Recognition Combined with SG Smoothing Applied to Breed Screening of Transgenic Sugarcane. Spectroscopy Spectral Analysis, vol 34, pp.2701-2706. [15] Long, X. L., Liu, G. S., Pan, T., Chen, J. M. (2014) Waveband selection of reagent-free determination for thalassemia screening indicators using Fourier transform infrared spectroscopy with attenuated total reflection. Journal of Biomedical Optics, vol 19, pp.087004-1-087004-11. [16] Chen, J. M., Pan, T., Liu, G. S., Han, Y., Chen, D. X. (2014) Selection of Stable Equivalent Wavebands for Near-Infrared Spectroscopic Analysis of Total Nitrogen in Soil. Journal of Innovative Optical Health Sciences, vol 7, pp.1-9. [17] Chen, J. M., Xiao, Q. Q., Pan, T., Yan, X., Wang, D. W., Yao, L. J. (2014) NIR Spectroscopy Combined with Stability and Equivalence MW-PLS Method Applied to Analysis of Hyperlipidemia Indexes. Spectroscopy Spectral Analysis, vol 34, pp. 2827-2832. [18] Kennard, R. W., Stone, L. A. (1969) Computer aided design of experiments. Technometrics, vol 11, pp. 137 149.