Improving the quality of twilight-zone alignments

LUKASZ JAROSZEWSKI; LESZEK RYCHLEWSKI; ADAM GODZIK

doi:10.1110/ps.9.8.1487

Abstract

Several recent publications have illustrated the advantages of using sequence profiles to recognize distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm, but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study alignment quality for several supersensitive protein comparison algorithms whose recognition sensitivity was compared previously (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several methods, including sequence–sequence, sequence–profile, and profile–profile alignment methods. We show that incorporating the evolutionary information encoded in sequence profiles into alignment calculations significantly increases alignment accuracy, bringing the alignments closer to those obtained from structure comparison.
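As an illustration of how a sequence-derived alignment can be rated against a structure-based reference, the following Python sketch computes the fraction of reference residue pairs that the test alignment reproduces. The representation (alignments as lists of (query, template) residue-index pairs), the function name `alignment_accuracy`, and the optional `shift` tolerance are assumptions made for illustration; this is not necessarily the accuracy measure used in the study.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]  # (query residue index, template residue index)


def alignment_accuracy(test: Iterable[Pair], reference: Iterable[Pair],
                       shift: int = 0) -> float:
    """Fraction of reference (structure-based) pairs reproduced by the test alignment.

    A reference pair (i, j) counts as correct if the test alignment maps query
    residue i to a template residue within `shift` positions of j. With the
    default shift=0 only exact matches count.
    """
    mapping = dict(test)  # query index -> template index; gapped residues are absent
    ref = list(reference)
    if not ref:
        raise ValueError("reference alignment is empty")
    correct = sum(
        1 for i, j in ref if i in mapping and abs(mapping[i] - j) <= shift
    )
    return correct / len(ref)
```

For example, if the reference aligns query residues 0 to 3 with template residues 0 to 3 and the test alignment shifts the last two pairs by one position, the exact accuracy is 0.5, while `shift=1` gives 1.0. A small tolerance window is one common way to credit alignments that are almost right, but whether to use one is a design choice.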

In general, alignment quality correlates with correct recognition and with the significance of the alignment score. For every alignment method, statistically significant scores are associated both with correct structural templates and with good-quality alignments. At the same time, average alignment lengths differ among methods, which makes comparing them difficult. For instance, the alignments obtained with FFAS, the profile–profile alignment algorithm developed in our group, are always longer than those obtained with PSI-BLAST. To address this problem, we develop methods to truncate or extend alignments so that they cover a specified percentage of the protein length. In most cases, the elongation of alignments by profile–profile methods is reasonable, adding fragments of similar structure. Examples of erroneous alignments are examined, and it is shown that they can be identified on the basis of model quality.
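A minimal sketch of coverage-based truncation, under stated assumptions: the alignment is an ordered list of aligned residue pairs (gapped positions omitted), each carrying a hypothetical per-position score, and coverage is the number of aligned query residues divided by the query length. The names `AlignedPair` and `truncate_to_coverage` are hypothetical, and the greedy rule of trimming the lower-scoring end first is one simple choice, not the procedure described by the authors. Extension is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedPair:
    query_pos: int      # residue index in the query
    template_pos: int   # residue index in the template
    score: float        # hypothetical per-position alignment score


def truncate_to_coverage(pairs: List[AlignedPair], query_length: int,
                         target_fraction: float) -> List[AlignedPair]:
    """Trim an alignment until it covers at most target_fraction of the query.

    Coverage is len(pairs) / query_length. At each step the terminal pair with
    the lower score is removed, so trimming eats into the weaker end first.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    max_pairs = int(target_fraction * query_length)
    start, end = 0, len(pairs)  # half-open window [start, end)
    while end - start > max_pairs:
        if pairs[start].score <= pairs[end - 1].score:
            start += 1  # leading end is weaker (ties trim this end)
        else:
            end -= 1    # trailing end is weaker
    return pairs[start:end]
```

For example, with pairs scored [0.1, 2.0, 3.0, 2.5, -1.0] at query positions 0 to 4, a 6-residue query and a 50% target, the function first drops the trailing pair and then the leading one, keeping query positions 1 to 3.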
