Ningyi Zhang, Ying Zhang, Tianyi Zhao, Jun Ren, Yangmei Cheng and Yang Hu* Pages 690 - 695 ( 6 )
Background: MicroRNAs (miRNAs) are short (approximately 21 nt), non-coding RNAs that play important regulatory roles in cellular biological processes. Identifying and discovering pre-miRNAs helps in understanding the regulatory process, the functions of miRNAs and other genes, and biological evolution.
Methods: Machine learning has proven to be a powerful approach for distinguishing real pre-miRNAs from other hairpin-like sequences (pseudo pre-miRNAs). However, most commonly used classifiers do not perform well on independent testing data sets. To overcome this, we proposed a novel algorithm, BRAda, which integrates a BP neural network and a random forest classifier trained on two balanced training sets. By assigning weights to these classifiers and the proposed 98-dimensional features, we obtained a strong classifier with high accuracy and stability. Two independent testing sets (undated human and non-human pre-miRNAs) were then used to evaluate the prediction performance of the new classifier.
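The weighting step described above can be illustrated with a minimal sketch. The abstract does not give the weighting rule, so this example assumes an AdaBoost-style weight, 0.5 * ln((1 - err) / err), and assumes four base classifiers (a BP network and a random forest, each trained on one of the two balanced sets). The names `classifier_weight`, `weighted_vote`, and the toy predictions are hypothetical and are not taken from the paper.

```python
import math


def error_rate(predictions, labels):
    """Fraction of samples on which a classifier disagrees with the labels."""
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)


def classifier_weight(err, eps=1e-10):
    """AdaBoost-style weight (an assumed rule): lower error gives a larger weight."""
    err = min(max(err, eps), 1 - eps)  # clamp to avoid division by zero or log(0)
    return 0.5 * math.log((1 - err) / err)


def weighted_vote(prediction_lists, weights):
    """Combine +1/-1 predictions from several classifiers by a weighted sum."""
    combined = []
    for votes in zip(*prediction_lists):
        score = sum(w * v for w, v in zip(weights, votes))
        combined.append(1 if score >= 0 else -1)
    return combined


# Toy labels: +1 = real pre-miRNA, -1 = pseudo pre-miRNA (hairpin-like sequence).
labels = [1, 1, 1, -1, -1, -1, 1, -1]

# Hypothetical outputs of four base classifiers on the toy samples.
base_predictions = {
    "bp_set1": [1, 1, -1, -1, -1, -1, 1, -1],
    "rf_set1": [1, 1, 1, -1, 1, -1, 1, -1],
    "bp_set2": [1, -1, 1, -1, -1, 1, 1, -1],
    "rf_set2": [1, 1, 1, -1, -1, -1, -1, -1],
}

# Weight each classifier by its error rate, then take the weighted vote.
weights = {
    name: classifier_weight(error_rate(preds, labels))
    for name, preds in base_predictions.items()
}
ensemble = weighted_vote(list(base_predictions.values()), list(weights.values()))
```

In this toy case each base classifier makes at least one mistake, yet the weighted vote recovers every label because the errors fall on different samples. In practice the weights would be estimated on training or validation data rather than on the samples being predicted.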
Results: The BRAda algorithm significantly outperformed the other methods in identifying both human and non-human pre-miRNAs.
Conclusion: The novel algorithm integrates a BP neural network and a random forest classifier trained on two balanced training sets. Compared with other state-of-the-art machine-learning methods, BRAda achieved very high performance in validation (ACC over 99%). Moreover, although the algorithm was trained on human gene sets, its prediction performance on non-human testing sets was also excellent (average ACC over 97%), which indicates that the method is both stable and robust. The experiments and validation show that the method is an effective tool for pre-miRNA identification.
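For reference, ACC is conventionally the fraction of test sequences classified correctly, (TP + TN) / (TP + TN + FP + FN). The short sketch below computes it for toy data; the function name `accuracy` and the sample values are illustrative assumptions, not results from the paper.

```python
def accuracy(predictions, labels):
    """ACC = correctly classified samples / all samples."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Toy example: +1 = real pre-miRNA, -1 = pseudo pre-miRNA.
toy_labels = [1, -1, -1, 1]
toy_predictions = [1, -1, 1, 1]  # one false positive
toy_acc = accuracy(toy_predictions, toy_labels)
```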
Keywords: Biological process, BRAda, BP neural network, genes, pre-miRNA identification, random forest.
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088; Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001; Ultrasound Department, The First People's Hospital of Anqing, 42 Xiaosu Road, Anqing City, Anhui Province; School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001