Parallelization of the automatic determination system
for representative protein chains of the Protein Data Bank (PDB)

Tamotsu Noguchi(1), Yutaka Akiyama(1), Kentaro Onizuka(1),
Minoru Saito(1), Makoto Ando(1), Yoshihisa Shizawa(2)

The Protein Data Bank (PDB) is a rich library of atomic-coordinate data of biological macromolecules.
The number of PDB entries has been increasing rapidly owing to improvements in X-ray crystallography and NMR experimental techniques, and the database currently holds more than 5,800 entries (2.4 Gbytes). Not all entries, however, are suitable for computational protein structure analysis: many have insufficiently refined coordinate data.

We have therefore developed a database of representative chains, PDB-REPRDB, and in this paper we report the MPI parallelization of our automatic construction system for PDB-REPRDB.
A performance evaluation on three parallel computers is also reported. With the 10-fold speed-up achieved in this study, a representative set can now be calculated within 2 days rather than 2 weeks.


(1) Real World Computing Partnership (2) Information and Mathematical Science Laboratory, Inc.