Zhao-Chun Xu*, Xuan Xiao*, Wang-Ren Qiu, Peng Wang and Xin-Zhu Fang Pages 1 - 9 ( 9 )
Background: As an important post-transcriptional modification, adenosine-to-inosine RNA editing generally occurs in both coding and noncoding RNA transcripts in which adenosines are converted to inosines. Accordingly, the diversification of the transcriptome can be resulted in by this modification. It is significant to accurately identify adenosine-to-inosine editing sites for further understanding their biological functions. Currently, the adenosine-to-inosine editing sites would be determined by experimental methods, unfortunately, it may be costly and time consuming. Furthermore, there are only a few existing computational prediction models in this field. Therefore, the work in this study is starting to develop other computational methods to address these problems. Methods: Given an uncharacterized RNA sequence that contains many adenosine resides, can we identify which one of them can be converted to inosine, and which one cannot? To deal with this problem, a novel predictor called iAI-DSAE is proposed in the current study. In fact, there are two key issues to address: one is ‘what feature extraction methods should be adopted to formulate the given sample sequence?’ The other is ‘what classification algorithms should be used to construct the classification model?’ For the former, a 540-dimensional feature vector is extracted to formulate the sample sequence by dinucleotide-based auto-cross covariance, pseudo dinucleotide composition, and nucleotide density methods. For the latter, we use the present more popular method i.e. deep spare auto-encoder to construct the classification model. Results: Generally, ACC and MCC are considered as the two of the most important performance indicators of a predictor. In this study, in comparison with those of predictor PAI, they are up 2.46% and 4.14%, respectively. The two other indicators, Sn and Sp, rise at certain degree also. This indicates that our predictor can be as an important complementary tool to identify adenosine-to-inosine RNA editing sites. For the convenience of most experimental scientists, an easy-to-use web-server for identifying adenosine-to-inosine editing sites has been established at: http://www.jci-bioinfo.cn/iAI-DSAE, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. Conclusion: It is important to identify adenosine-to-inosine editing sites in RNA sequences for the intensive study on RNA function and the development of new medicine. In current study, a novel predictor, called iAI-DSAE, was proposed by using three feature extraction methods including dinucleotidebased auto-cross covariance, pseudo dinucleotide composition and nucleotide density. The jackknife test results of the iAI-DSAE predictor based on deep spare auto-encoder model show that our predictor is more stable and reliable. It has not escaped our notice that the methods proposed in the current paper can be used to solve many other problems in genome analysis.
Adenosine-to-inosine RNA editing, Dinucleotide-based auto-cross covariance, Pseudo dinucleotide composition, Nucleotide density, Deep spare auto-encoder
Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403