Enhanced HACBLalign Method using Transitional Pattern Search and Pre-Trained Classification Model for Protein Remote Homology Detection and Fold Recognition

DOI: 10.21522/TIJPH.2013.12.04.Art054

Authors: Gopinath Krishnaraj, Rajendran Gurusamy

Abstract:

Protein remote homology detection and fold recognition are among the most important tasks in protein structure prediction. For this purpose, the authors previously proposed HACBLalign, a Hierarchical Attention-based Convolutional Neural Network with Bidirectional Long Short-Term Memory, which performs Multiple Sequence Alignments (MSAs), extracts features, and recognizes protein homologies. However, as the number of Protein Sequences (PSs) grows, so does the number of times the decision-making system must run. To address this issue, this article proposes an Enhanced HACBLalign (EHACBLalign) method that uses Transitional Pattern Search (TPS) and a pre-trained classification model for protein remote homology detection and fold recognition. During the alignment stage, TPS identifies intermediate sequences such as Hit Regions (HRs). The HRs are then extended in the middle layers and used as queries in every TPS iteration. The HACBLalign algorithm is applied in all intermediate layers to generate pairwise alignments, and the pairwise alignments between intermediate sequences are merged to obtain the final alignment. Various features are then extracted from the chosen alignment and learned by a pre-trained Convolutional Neural Network (CNN) with a softmax function to recognize protein remote homologies precisely. This improves the performance of the decision-making system on large-scale PS databases. Finally, the test results show that EHACBLalign achieves accuracies of 94.6%, 94.1%, and 93.4% on the SCOP 1.53, SCOP 1.67, and superfamily corpora, respectively, in protein remote homology detection and fold recognition.
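
The merging of pairwise alignments through an intermediate sequence can be illustrated with a minimal sketch. This is not the EHACBLalign implementation, which the abstract does not specify. It only shows the generic idea of composing a query-to-intermediate alignment with an intermediate-to-target alignment to obtain a query-to-target alignment. The function names `aligned_pairs` and `compose`, the gap character, and the toy sequences are all assumptions made for illustration.

```python
from typing import List, Tuple

# An alignment is modelled as matched residue positions (0-based) in two sequences.
# This representation is an assumption for illustration, not taken from the article.
Alignment = List[Tuple[int, int]]


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Alignment:
    """Convert two gapped alignment rows into a list of matched index pairs."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Alignment = []
    i = j = 0  # current residue index in each ungapped sequence
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            pairs.append((i, j))
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def compose(ab: Alignment, bc: Alignment) -> Alignment:
    """Merge A-to-B and B-to-C alignments into an A-to-C alignment.

    A residue of A is aligned to a residue of C only if both are aligned
    to the same residue of the intermediate sequence B.
    """
    b_to_c = dict(bc)
    return [(i, b_to_c[j]) for i, j in ab if j in b_to_c]


# Hypothetical toy data: query ACDEFG, intermediate ACEFG, target CEFGH.
query_to_mid = aligned_pairs("ACDEFG", "AC-EFG")
mid_to_target = aligned_pairs("ACEFG-", "-CEFGH")
query_to_target = compose(query_to_mid, mid_to_target)
print(query_to_target)  # [(1, 0), (3, 1), (4, 2), (5, 3)]
```

In this toy case the query residues C, E, F, and G are mapped onto the same residues of the target even though the two were never aligned directly. An iterative search could chain several such compositions, one per intermediate sequence.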
