Jing Ye, Wei Chen* and Dianchuan Jin Pages 684 - 689 ( 6 )
Background: Heat shock proteins (HSPs) are ubiquitously expressed in both prokaryotes and eukaryotes. According to their molecular mass and function, HSPs are classified into families that differ in structure and play distinct roles in biological processes. Although some efforts have been made to identify the types of HSPs, no method is available for identifying the types of HSPs in plants.
Method: The amino acid distributions in the different types of HSPs were analyzed. HSPs were encoded using the reduced amino acid alphabet (RAAA). By comparing the predictive capability of models based on the composition of RAAAs of different sizes, the optimal feature vector was obtained. A support vector machine-based model was developed to identify the types of HSPs using the optimal feature vector.
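As a minimal sketch of the encoding step, the snippet below maps a protein sequence onto a reduced alphabet and computes its n-peptide composition vector. The six-group clustering in `GROUPS` and the function names `reduce_sequence` and `composition` are illustrative assumptions, not the clustering scheme or code used in the study.

```python
from collections import Counter
from itertools import product

# Hypothetical 6-letter reduced alphabet (rough physicochemical grouping;
# an assumption for illustration, NOT the clustering used in the paper).
GROUPS = {
    "a": "AVLIMC",  # aliphatic / sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Translate a sequence into the reduced alphabet, skipping unknown residues."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


def composition(seq, n=1):
    """Return the frequencies of all n-peptides over the reduced alphabet.

    The vector has len(GROUPS) ** n entries, ordered lexicographically.
    """
    reduced = reduce_sequence(seq)
    letters = sorted(GROUPS)
    keys = ["".join(p) for p in product(letters, repeat=n)]
    total = max(len(reduced) - n + 1, 0)  # number of overlapping n-peptides
    counts = Counter(reduced[i:i + n] for i in range(total))
    return [counts[k] / total if total else 0.0 for k in keys]
```

Varying the number of groups and the value of `n` changes the length of the feature vector, which is the kind of comparison the method describes when selecting the optimal feature vector.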
Results: The amino acid distributions differ among the families of HSPs. In the rigorous jackknife test, the proposed method obtained an accuracy of 93.65% for identifying the five families of HSPs in plants.
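The jackknife (leave-one-out) test holds out each sequence in turn, trains on the rest, and reports the fraction of held-out sequences that are classified correctly. The sketch below illustrates that procedure only; to stay dependency-free it uses a simple nearest-centroid classifier as a stand-in for the support vector machine, and the function names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean vector."""
    sums, counts = {}, {}
    for vec, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - xi) ** 2 for s, xi in zip(sums[label], x))

    return min(sums, key=squared_distance)


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave each sample out once, train on the others, and score the prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Here `X` would be a list of composition vectors and `y` the corresponding HSP family labels; any classifier with the same `predict(train_X, train_y, x)` shape could be passed in place of the stand-in.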
Conclusions: We hope the proposed method will become a useful tool to identify the types of HSPs in plants.
Biological processes, heat shock protein, reduced amino acid, support vector machine, n-peptide, jackknife test.
School of Sciences, North China University of Science and Technology, Tangshan 063000, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, School of Sciences, North China University of Science and Technology, Tangshan 063000