Residue numbering start specification #74

lucajovine · 2024-11-19T07:56:24Z

This is not a bug but rather a feature request that I am sure everybody will support: please add a way to specify, in the input JSON file, the starting residue number of each chain.

As users, we almost always have to do this "by hand" a posteriori, since AlphaFold numbering defaults to starting at 1 and, in the majority of cases, does not match the numbering we actually work with. This is a small change that could make things much easier for everybody, so I hope you'll take the suggestion into consideration. Thank you!

Augustin-Zidek · 2024-11-25T16:22:15Z

I am wondering whether this should be something that AlphaFold does -- I feel this goes against the UNIX philosophy of doing one thing and doing it well. This is clearly a post-processing step that should be done on the produced mmCIF file. If we were to introduce it in AlphaFold, it would require modifying the input format (to specify the starting residue ID for each chain), which seems too invasive.

That being said, I think I will add a utility method in the Structure module to make it easy to write a post-processing script to do this.

I will leave the issue open until I implement it, then I will comment here with a Python snippet to achieve what you are asking for.
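In the meantime, a post-processing step of this kind could look roughly like the sketch below. It is only an illustration and not the utility mentioned above: it does not use AlphaFold's Structure module, and the names `renumber_atom_site`, `_shift_row` and `chain_starts` are made up for this example. It assumes the output mmCIF has an `_atom_site` loop with an `auth_seq_id` column, one atom per line, no quoted values containing spaces, and numbering that starts at 1 for every chain. Only `auth_seq_id` is shifted; `label_seq_id` and any other categories are left untouched.

```python
def _shift_row(line, columns, chain_starts):
    """Shift auth_seq_id in one _atom_site row; return the row unchanged if unsure."""
    fields = line.split()
    if "auth_seq_id" not in columns or len(fields) != len(columns):
        return line
    # Prefer the author chain ID, fall back to the label chain ID.
    chain_col = "auth_asym_id" if "auth_asym_id" in columns else "label_asym_id"
    if chain_col not in columns:
        return line
    chain = fields[columns.index(chain_col)]
    seq_idx = columns.index("auth_seq_id")
    if chain in chain_starts and fields[seq_idx].isdigit():
        # Assumes the original numbering starts at 1, so the offset is start - 1.
        fields[seq_idx] = str(int(fields[seq_idx]) + chain_starts[chain] - 1)
        return " ".join(fields)
    return line


def renumber_atom_site(cif_text, chain_starts):
    """Return cif_text with auth_seq_id shifted so each listed chain starts at chain_starts[chain]."""
    out, columns, state = [], [], "outside"
    for line in cif_text.splitlines():
        s = line.strip()
        if s.startswith("_atom_site."):
            # Collect the column names of the _atom_site loop, in order.
            if state != "header":
                columns, state = [], "header"
            columns.append(s.split(".", 1)[1].split()[0])
        elif state in ("header", "rows"):
            if not s or s.startswith(("#", "_", "loop_", "data_")):
                state = "outside"  # the _atom_site loop has ended
            else:
                state = "rows"
                line = _shift_row(line, columns, chain_starts)
        out.append(line)
    return "\n".join(out) + ("\n" if cif_text.endswith("\n") else "")


# Tiny hand-written example (not real AlphaFold output).
example = """data_test
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.auth_seq_id
_atom_site.auth_asym_id
ATOM 1 CA MET A 1 1 A
ATOM 2 CA LYS A 2 2 A
ATOM 3 CA GLY B 1 1 B
#
"""

# Make chain A start at residue 23 (e.g. after a 22-residue signal peptide); chain B is untouched.
result = renumber_atom_site(example, {"A": 23})
print(result)
```

Rewriting only the author numbering keeps the sequential `label_seq_id` intact, which is generally how mmCIF files separate "as deposited" numbering from internal indexing. A real script would more likely go through a proper mmCIF parser than through line splitting.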

jkosinski · 2024-11-25T17:05:42Z

@lucajovine residue numbering is handled in our AlphaPulldown interface to AlphaFold2 (https://github.com/KosinskiLab/AlphaPulldown). You calculate input features for full-length proteins and then you run predictions for any subsets of residues preserving original full-length residue numbers. When/if we add AlphaFold3 backend, the same functionality will be supported.

lucajovine · 2024-11-25T18:46:28Z

@jkosinski thank you for mentioning this, but frankly this was more of a general comment than something specifically aimed at my lab (where we already have our own post-processing scripts for doing this).

@Augustin-Zidek may I respectfully disagree? I get the UNIX philosophy standpoint, but I do not see why using the correct numbering would go against that — in fact, rather the opposite. Just consider all the secreted proteins that have an N-terminal signal peptide: for any biologically meaningful prediction, we normally do not include those residues (which, in real life, are essentially never "seen" by the rest of the protein), so all resulting mature protein predictions end up being misnumbered (compared to the numbering that one finds in UniProt, for example). And the same of course happens if one wants to predict the structure of an engineered construct, which may have tags or the like. In all these cases, enforcing numbering from 1 is biologically meaningless. Moreover, I do not quite see how the introduction of the option that I was envisaging would be so intrusive, considering that it would just add one (optional) line per sequence in the input JSON.

One may of course argue that if one is able to install and run AlphaFold on their own machine, surely they can easily work out a way to renumber residues if needed. And this is indeed the case (as I mentioned above). But for a lot of the biologists who make predictions using the server this is simply not trivial (I often get asked about this issue), so my main reason for bringing this up was just to try making everyone's life easier...

Augustin-Zidek added the "enhancement" (New feature or request) label · Nov 19, 2024
