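With the continued accumulation of protein structure data and the ongoing development of structural genomics, a growing number of proteins with known structure but unknown function have been deposited in the Protein Data Bank (PDB). The functions and functional sites of these proteins need to be annotated. Meanwhile, as experimental biology advances, the functions and functional sites of some proteins whose functions were previously known may need to be re-annotated. Developing an accurate, fast algorithm suitable for large-scale functional annotation is therefore an important research topic in structural bioinformatics. Although many algorithms already exist for the functional annotation of protein structures or sequences, their accuracy and sensitivity still need further improvement.
Recently, Gong-Hua Li, a doctoral student in Jing-Fei Huang's research group, developed under his supervisor's guidance a new algorithm (CMASA) for predicting protein functional sites. Compared with other known algorithms, it offers higher accuracy and sensitivity and is computationally fast. Using CMASA, members of the Huang group annotated the catalytic sites of enzymes in the PDB and identified 166 new, previously unannotated enzyme catalytic sites.
Recommended original source: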
BMC Bioinformatics 2010, 11:439. doi:10.1186/1471-2105-11-439
CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation
Gong-Hua Li and Jing-Fei Huang
Background
The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB); the functional prediction of these proteins has thus become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but their accuracy, sensitivity, and computational speed need further improvement. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown protein functions based on local protein structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods.
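The abstract does not describe how CMASA builds or compares its contact matrices, so the following is only an illustrative sketch of the general idea behind contact-matrix-based local comparison, not the authors' implementation. It assumes each residue of a site is reduced to a single 3D point, and it uses a brute-force search over residue orderings; the function names `contact_matrix`, `matrix_rmsd`, and `best_match` are hypothetical. The point it illustrates is that a matrix of pairwise distances does not change when a structure is rotated or translated, so two sites can be compared without superimposing them.

```python
import math
from itertools import permutations


def contact_matrix(coords):
    """Pairwise Euclidean distance matrix for a list of (x, y, z) points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def matrix_rmsd(a, b):
    """Root-mean-square difference over the upper triangles of two equally sized matrices."""
    n = len(a)
    diffs = [(a[i][j] - b[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))


def best_match(template, candidate):
    """Try every ordering of the candidate points (brute force, an assumption of
    this sketch) and return (score, order) for the ordering with the lowest score."""
    t = contact_matrix(template)
    best = (float("inf"), None)
    for order in permutations(range(len(candidate))):
        c = contact_matrix([candidate[k] for k in order])
        score = matrix_rmsd(t, c)
        if score < best[0]:
            best = (score, order)
    return best


# Hypothetical three-residue template site (a 3-4-5 triangle).
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# The same triangle, translated and with its points listed in a different order.
candidate = [(10.0, 14.0, 10.0), (10.0, 10.0, 10.0), (13.0, 10.0, 10.0)]

score, order = best_match(template, candidate)
print(score, order)
```

Brute-force enumeration is only practical for the handful of residues that make up a typical catalytic site; a large-scale method would need a faster search, which this sketch does not attempt.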
Results
The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough for large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can find not only remote homologous proteins but also active site convergence. Compared with other local structure comparison-based methods, CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that cannot be found in the Catalytic Site Atlas (CSA).
Conclusions
CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. It can be used in large-scale enzyme active site annotation. CMASA is available through a mail-based server.