
3.22 cif2pdb - convert mmCIF to PDB if a given tool takes only PDBs (smart way)

@mmagnus mmagnus released this 26 Sep 02:57
· 23 commits to master since this release

There are many issues when working with PDB files these days, largely due to the increasing complexity of biological structures. One of the first challenges is the number of chains in a structure, which can be very large.

To accommodate this, double-letter chain identifiers were introduced at some point in the mmCIF format. However, the PDB format allows only a single-character chain name.
(On top of that, mmCIF keeps the author-assigned (auth) chain names next to the PDB-assigned (label) ones, just to make it even more complicated. For the code here I use the auth values, e.g. A5.)

The second problem is that the number of atoms in one structure can be so large that it no longer fits the fixed-width fields of the PDB format (the atom serial number field is five characters wide, so it tops out at 99,999). If I put all the chains into one file, even after renaming them to single-letter codes, the number of atoms breaks the format. Some parsers might be OK with it, but you can also see that the XYZ coordinates are off, etc.
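To illustrate why coordinates can end up "off", here is a minimal sketch of one plausible mechanism: when the atom serial number needs six digits, everything after it shifts one column to the right, and a parser that reads fixed columns picks up the wrong characters. The helper names `atom_line` and `parse_x` are hypothetical and not part of rna-tools; the column layout follows the standard PDB ATOM record.

```python
def atom_line(serial, x, y, z):
    """Build a simplified PDB ATOM record (hypothetical helper, P atom of G 1, chain A)."""
    prefix = (
        "ATOM  "          # cols 1-6: record name
        + f"{serial:5d}"  # cols 7-11: atom serial (grows past 5 chars above 99999)
        + " "             # col 12
        + " P  "          # cols 13-16: atom name
        + " "             # col 17: alternate location
        + "  G"           # cols 18-20: residue name
        + " "             # col 21
        + "A"             # col 22: chain identifier (one character only)
        + f"{1:4d}"       # cols 23-26: residue number
        + "    "          # cols 27-30
    )
    return prefix + f"{x:8.3f}{y:8.3f}{z:8.3f}"  # cols 31-54: x, y, z


def parse_x(line):
    """Read x the way a strict fixed-column parser would (cols 31-38)."""
    return float(line[30:38])


ok = atom_line(99999, 12.345, -6.789, 0.5)
bad = atom_line(100000, 12.345, -6.789, 0.5)

print(len(ok), parse_x(ok))    # serial fits, x is read from the right columns
print(len(bad), parse_x(bad))  # line is one character longer, x is truncated
```

In this sketch the x value read from the overflowing line loses its last digit, and the following fields are misaligned in the same way.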

[screenshot, 2024-09-25 22:32:00]

MY SOLUTION

Install rna-tools

$ pip install --upgrade rna-tools

For now, the solution in rna-tools is to parse the CIF file, save each RNA chain into a separate file, and set the chain name to a capital letter.

$ rna_pdb_tools.py --cif2pdb input/4v6x.cif # or a separate tool `rna_cif2pdb.py 4v6x.cif`

Warning: some of the chains in this mmCIF file have chain names with more chars than 1, e.g. AB, and the PDB format needs single-letter code, e.g. A.
rna chain B2 -> A # of atoms: 38377 4v6x_B2_nA_fCIF.pdb
rna chain BC -> B # of atoms: 1604 4v6x_BC_nB_fCIF.pdb
rna chain A5 -> C # of atoms: 84946 4v6x_A5_nC_fCIF.pdb
rna chain A7 -> D # of atoms: 2578 4v6x_A7_nD_fCIF.pdb
rna chain A8 -> E # of atoms: 3334 4v6x_A8_nE_fCIF.pdb

For each RNA chain, a new file is created in the PDB format:

4v6x_B2_nA_fCIF.pdb # the auth chain B2 is renamed to A (the new chain) and saved into this file.
[Actually, maybe it would be more convenient if this chain was always simply 'A'. I can change that easily, so the file would be 4v6x_B2_fCIF.pdb, and you would know that the chain inside is simply A.]
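The renaming and file naming seen in the output above can be sketched as follows. This is only an illustration that reproduces the pattern from the log (chains get capital letters in the order they appear, files are named `<id>_<auth chain>_n<new chain>_fCIF.pdb`); the function names `assign_chain_letters` and `output_name` are hypothetical and this is not the actual rna-tools code.

```python
import string


def assign_chain_letters(auth_ids):
    """Map each auth chain name to a single capital letter, in order of appearance."""
    letters = string.ascii_uppercase
    if len(auth_ids) > len(letters):
        # Assumption of this sketch: only A-Z are used as new chain names.
        raise ValueError("more chains than capital letters available")
    return {auth: letters[i] for i, auth in enumerate(auth_ids)}


def output_name(pdb_id, auth_chain, new_chain):
    """Build the per-chain output file name in the pattern shown in the log."""
    return f"{pdb_id}_{auth_chain}_n{new_chain}_fCIF.pdb"


# RNA chains of 4v6x in the order reported by the tool
mapping = assign_chain_letters(["B2", "BC", "A5", "A7", "A8"])
for auth, new in mapping.items():
    print(f"rna chain {auth} -> {new}", output_name("4v6x", auth, new))
```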

No single chain in the ribosome exceeds the atom limit, so we should be fine. Ninh, let me know if the tool crashes on any of the structures.
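A quick check of this claim, using the atom counts printed by the tool for 4v6x. The limit of 99,999 is an assumption based on the five-character atom serial field of the PDB format (TER records are ignored here), and `fits` is a hypothetical helper.

```python
MAX_SERIAL = 99_999  # largest number that fits a 5-character serial field

# Atom counts per RNA chain, copied from the tool output for 4v6x
rna_chain_atoms = {"B2": 38377, "BC": 1604, "A5": 84946, "A7": 2578, "A8": 3334}


def fits(n_atoms, limit=MAX_SERIAL):
    """Return True if n_atoms can be numbered within the serial field."""
    return n_atoms <= limit


total = sum(rna_chain_atoms.values())
print("each chain fits on its own:", all(fits(n) for n in rna_chain_atoms.values()))
print("all RNA chains in one file:", total, "atoms, fits:", fits(total))
```

Even the RNA chains alone do not fit into one PDB file together, while each of them fits separately, which is why the per-chain split works.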

In PyMOL you can load all the files at once to see them as if they were one file.
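One way to do that without clicking through every file is to generate a small PyMOL script with one `load` command per file. This is a sketch only; `pymol_load_script` is a hypothetical helper, and the file names are the ones from the example output above.

```python
def pymol_load_script(filenames):
    """Return the text of a PyMOL script that loads every given file."""
    return "\n".join(f"load {name}" for name in sorted(filenames)) + "\n"


files = [
    "4v6x_B2_nA_fCIF.pdb",
    "4v6x_BC_nB_fCIF.pdb",
    "4v6x_A5_nC_fCIF.pdb",
    "4v6x_A7_nD_fCIF.pdb",
    "4v6x_A8_nE_fCIF.pdb",
]
script = pymol_load_script(files)
print(script)
```

Saving the printed text as, say, `load_all.pml` and opening it with PyMOL should load each chain as a separate object.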

[screenshot, 2024-09-25 22:05:48]