Protein Structure Project:TreeThreader | CAS@home
Research Area: Protein Structure
Institute: Institute of Computing Technology, CAS
Application: TreeThreader

Introduction

Understanding the structures and interactions of proteins is needed to understand their mechanisms, and is therefore essential for a complete understanding of life processes at the molecular level. Currently, over seven million protein sequences are deposited in the UniProtKB/TrEMBL database, but only 50,000 of them have experimentally solved structures. The community's high demand for protein structures has placed computer-based protein structure prediction, the only means of alleviating the problem, in an unprecedentedly crucial position.

However, protein structure prediction requires tremendous computing time. For instance, threading, one of the leading methods for protein structure prediction, is exceedingly time-consuming because the query sequence must be aligned to every template in the database. Volunteer computing is therefore a great opportunity for protein structure prediction.

Our goal is to develop a new, practical threading program that takes pairwise interactions into account. The general case of the problem, in which all pairwise contacts are considered, has been proved to be NP-hard. We therefore use nested graphs to describe subsets of the template's contacts (much like the covariance model for RNA secondary structure analysis), for which inference is computationally efficient.

Given a template T and a query sequence S, the framework of the program is as follows:

  • Represent the template by several nested graphs.
  • We solve this with an iterative algorithm: in each round, dynamic programming builds the optimal nested graph, and all the contacts it contains are then removed from the original contact graph.

  • Align each nested graph to the query sequence.
  • We model this problem with a CRF (conditional random field). A CRF is a probabilistic model to which features can easily be added.

  • Merge the alignments together.
  • Each nested graph yields an alignment between the template and the query. We can either combine them into a single alignment at the level of the posterior probability matrix, using a probabilistic consistency technique, or build a final model from each alignment independently with MODELLER and then select the best one.
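
The first step above (peeling nested graphs off the contact graph) can be illustrated with a minimal Python sketch. It is not the TreeThreader implementation: it assumes that "nested" means no two contacts cross, that residues may share contacts, and that the "optimal" nested graph is simply the one with the most contacts (the real program may weight contacts differently). The names crosses, max_nested_subset and decompose are hypothetical.

```python
def crosses(a, b):
    """True if contacts a=(i, j) and b=(k, l) interleave (i < k < j < l or k < i < l < j)."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def max_nested_subset(n, contacts):
    """Largest set of mutually non-crossing contacts among residues 0..n-1.

    Assumption: every contact counts 1. Interval dynamic programming, O(n^3).
    """
    contact_set = set(contacts)
    # best[i][j]: max number of non-crossing contacts with both ends in [i, j]
    best = [[0] * n for _ in range(n)]
    # split[i][j]: position k where the interval is cut into [i, k] and [k, j]
    split = [[None] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            inner, arg = 0, None
            for k in range(i + 1, j):
                score = best[i][k] + best[k][j]
                if score > inner:
                    inner, arg = score, k
            # The contact (i, j) itself never crosses anything inside [i, j].
            best[i][j] = inner + ((i, j) in contact_set)
            split[i][j] = arg
    # Trace back which contacts were counted.
    chosen, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if (i, j) in contact_set:
            chosen.append((i, j))
        k = split[i][j]
        if k is not None:
            stack.append((i, k))
            stack.append((k, j))
    return chosen


def decompose(n, contacts):
    """Repeatedly extract a maximum nested graph and remove its contacts."""
    remaining = {(min(a, b), max(a, b)) for a, b in contacts if a != b}
    layers = []
    while remaining:
        layer = max_nested_subset(n, remaining)
        layers.append(sorted(layer))
        remaining -= set(layer)
    return layers


if __name__ == "__main__":
    # Toy contact graph on 6 residues; each printed line is one nested graph.
    for layer in decompose(6, [(0, 3), (1, 2), (1, 4), (2, 5)]):
        print(layer)
```

Each returned layer would then be aligned to the query sequence on its own, as described in the second step.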

    Progress

    Thank you all! We have completed the first version of TreeThreader. Although long-range contact information is not yet considered, this version performs comparably to state-of-the-art methods such as HHpred. The results on 200 medium cases are as follows.

                HHpred    TreeThreader
    TMscore     0.29      0.37

    We will explore long-range contacts further and take this information into account in the next version of TreeThreader. In addition, TreeThreader has participated in CASP10 (Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction), a structure prediction contest in which most state-of-the-art methods are tested. We will report our CASP10 performance as soon as the official results are published.