The classification of protein structures based on the sequential and structural similarity,
and the database of representative protein chains (PDB-REPRDB)

Tamotsu Noguchi, Yutaka Akiyama, Kentaro Onizuka and Makoto Ando

The Protein Data Bank (PDB) is a rich library of atomic-coordinate data for biological macromolecules. The number of PDB entries has been increasing rapidly owing to improvements in X-ray crystallography and NMR experimental techniques, and currently exceeds 7,500 (3.4 Gbytes), though not all entries are suitable for computational protein structure analysis.
Many entries have insufficiently refined coordinate data, or are similar to one or more other entries in terms of structural or sequential similarity. Thus the need for a classification procedure for protein structures has become quite obvious.
We have proposed a representative chain database, PDB-REPRDB, whose selection strategy is based on sequential and structural similarity.
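
As a rough illustration of what similarity-based selection of representatives can look like, the following is a minimal sketch of a generic greedy procedure. It is not the PDB-REPRDB algorithm itself. The function names, the toy identity measure, the 0.8 threshold, the use of resolution as the quality criterion, and the example chains are all assumptions made for illustration, and the structural-similarity criterion is left out for brevity.

```python
from typing import Callable, List, Tuple

# Hypothetical record for illustration: (chain id, resolution in angstroms, sequence).
Chain = Tuple[str, float, str]


def toy_identity(a: str, b: str) -> float:
    """Fraction of matching positions; a toy stand-in for a real alignment score.

    Sequences of different length are treated as dissimilar (identity 0.0).
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def select_representatives(
    chains: List[Chain],
    similar: Callable[[Chain, Chain], bool],
) -> List[Chain]:
    """Greedy selection: visit chains from best to worst quality and keep a
    chain only if it is not similar to any chain already kept."""
    representatives: List[Chain] = []
    # Assumption: a lower resolution value means better-refined data.
    for chain in sorted(chains, key=lambda c: c[1]):
        if not any(similar(chain, kept) for kept in representatives):
            representatives.append(chain)
    return representatives


def similar_by_sequence(a: Chain, b: Chain, threshold: float = 0.8) -> bool:
    """Assumed criterion: two chains are redundant at >= 80% toy identity."""
    return toy_identity(a[2], b[2]) >= threshold


# Made-up example chains, not real PDB entries.
example = [
    ("1abc_A", 2.5, "MKTAYIAK"),
    ("2xyz_A", 1.8, "MKTAYIAR"),
    ("3def_B", 2.0, "GGSSPLVW"),
]
chosen = select_representatives(example, similar_by_sequence)
print([c[0] for c in chosen])
```

In this toy run the chain `1abc_A` is dropped because it is highly similar to the better-resolved `2xyz_A`. A real system would replace `toy_identity` with an alignment-based score and add a structural comparison.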

In this paper, we describe the development of PDB-REPRDB and report the MPI parallelization of our automatic construction system for it. %Performance evaluation on three parallel computers is also reported.
A representative set can now be calculated within 1.5 hours rather than 1 week, a 110-fold speed-up achieved in this study. We have opened a WWW service for PDB-REPRDB, which has been accessed more than 2,100 times.


Real World Computing Partnership