Allow template fetching #204
This idea and implementation look great! I think it belongs with the chorizo. Thanks a lot!
Awesome, thanks so much @diogomart!
Here's a summary of the recent changes:
The linking residues' templates could also be generated if we allow guessing certain linker types. @diogomart, do you want linker guessing to be part of chorizo creation? If so, I can add the functions. If not, we will leave it to users. Template preparation is not too difficult with a standalone script: users can do some basic editing and assign linkers by patterns or atom names, and when needed, they can define new linking reactions. This is not urgent to merge. Thanks again for looking at this! Edit:
`cc.link_labels` vs `atom_idx`
assigns an ambiguous `template_key`
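Assigning linkers by patterns or atom names in a standalone script could look roughly like the minimal sketch below. `assign_link_labels` is a hypothetical helper, not part of this PR or of the chorizo; it assumes link labels are stored per atom index and that rules are `(regex, label)` pairs, and the label strings are made up for illustration.

```python
import re


def assign_link_labels(atom_names, rules):
    """Map atom indices to link labels using (regex, label) rules.

    Hypothetical helper: the first rule whose regex fully matches an atom
    name wins; atoms that match no rule get no label.
    """
    labels = {}
    for idx, name in enumerate(atom_names):
        for pattern, label in rules:
            if re.fullmatch(pattern, name):
                labels[idx] = label
                break
    return labels


# Example rules for a peptide-like residue (labels are illustrative only)
rules = [("N", "N-link"), ("C", "C-link"), ("SG", "disulfide")]
print(assign_link_labels(["N", "CA", "C", "O", "CB", "SG"], rules))
# {0: 'N-link', 2: 'C-link', 5: 'disulfide'}
```

Using `re.fullmatch` rather than `re.match` keeps a rule such as `C` from also labeling `CA` or `CB`.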
Yes! That would enable non-standard amino acids, which is important. But no rush; we can release it in v0.7 to have as much time as needed to implement it. I started testing this PR and will post here soon. Thanks for all this work!
Hi @diogomart
(Just commenting for the record.) There are two types of structures that guess linker can't handle:
Both might require pre-registration or a guess for a new padder. This feels a bit at odds with the current mapping and padding process, where we explicitly restrict the type and position of padding in the templates. I'm not very sure what to do. If we want to enable guessing padders, I can rework Padder and template matching in a new branch. The default padding will contain the 1-4 covalent atoms (or as many as are available) in the adjacent residue. But this could be too forgiving, as it trusts the input structure unconditionally and reads whatever bonded component comes from the adjacent residue. There will be no more adjacent-residue check, and the guess will supersede the fallback reaction logic. Again, no rush on this at all; if there are more important things to do at the moment, this can wait (indefinitely, until someone needs it).
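One reading of "the 1-4 covalent atoms in the adjacent residue" is every atom of the neighboring residue within three bonds of the linking atom (its 1-2, 1-3 and 1-4 partners). The breadth-first sketch below implements that interpretation only, assuming the structure is available as an adjacency dict plus an atom-to-residue map. `collect_padding_atoms` is hypothetical and is not the existing Padder; like the proposal, it trusts whatever bonds the input contains.

```python
from collections import deque


def collect_padding_atoms(adjacency, residue_of, link_atom, max_bonds=3):
    """Return atoms of adjacent residues within max_bonds bonds of link_atom.

    adjacency:  dict atom -> list of bonded atoms (whole input structure)
    residue_of: dict atom -> residue id
    link_atom:  atom of the residue being templated that bonds outward
    Each search stays inside the residue it first enters, so a chain that
    continues into a third residue is not followed.
    """
    home = residue_of[link_atom]
    padding = set()
    queue = deque()
    # Depth 1: atoms in other residues bonded directly to the linking atom
    for nbr in adjacency[link_atom]:
        if residue_of[nbr] != home and nbr not in padding:
            padding.add(nbr)
            queue.append((nbr, 1, residue_of[nbr]))
    # Breadth-first expansion, so each atom is recorded at its shortest depth
    while queue:
        atom, depth, res = queue.popleft()
        if depth == max_bonds:
            continue
        for nbr in adjacency[atom]:
            if residue_of[nbr] == res and nbr not in padding:
                padding.add(nbr)
                queue.append((nbr, depth + 1, res))
    return padding
```

Because nothing here checks what the adjacent residue actually is, this has exactly the weakness described above: any bonded fragment is accepted as padding.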
@rwxayheee yes, please create a PR for
I noticed that heme (HEM) does not build as an RDKit molecule, because RDKit complains about a nitrogen having 4 bonds without a +1 formal charge. This happened with
Yes, anything containing dative bonds to metals may not work. It happens with the Pt-nucleotide complexes too, but they were gated at an earlier point because the element isn't supported.
In the HEM definition file, the charge column is all 0. This is incorrect, because before deprotonation the charge of the molecule should be +2. Like a few other ligands, it fails when the definition is incorrect. Maybe we can put HEM in the default templates so that fetching never needs to happen.
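Both failure modes in this thread, a neutral nitrogen carrying four bonds and a charge column that does not add up to the expected net charge, could in principle be caught before the component is handed to RDKit. Below is a minimal, dependency-free sketch. It assumes the component has already been read into per-atom element and charge lists plus `(i, j, order)` bond tuples with explicit hydrogens; the function names and the one-entry valence table are illustrative, not RDKit's full valence model.

```python
from collections import defaultdict

# Assumed maximum total bond order for a neutral atom. RDKit treats neutral N
# as trivalent and only accepts four bonds on N with a +1 formal charge.
NEUTRAL_MAX_VALENCE = {"N": 3}


def find_overvalent_neutral_atoms(elements, charges, bonds):
    """Return indices of neutral atoms whose summed bond orders exceed the
    assumed neutral valence (e.g. a 4-bonded N with formal charge 0)."""
    valence = defaultdict(int)
    for i, j, order in bonds:
        valence[i] += order
        valence[j] += order
    flagged = []
    for idx, (element, charge) in enumerate(zip(elements, charges)):
        limit = NEUTRAL_MAX_VALENCE.get(element)
        if limit is not None and charge == 0 and valence[idx] > limit:
            flagged.append(idx)
    return flagged


def charges_sum_to(charges, expected_net_charge):
    """True if the per-atom formal charges add up to the expected net charge."""
    return sum(charges) == expected_net_charge


# An ammonium-like N with four single bonds
elements = ["N", "H", "H", "H", "H"]
bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)]
print(find_overvalent_neutral_atoms(elements, [0, 0, 0, 0, 0], bonds))  # [0]
print(find_overvalent_neutral_atoms(elements, [1, 0, 0, 0, 0], bonds))  # []
```

A check like this only reports the inconsistency; deciding which atom should carry the charge (or whether the metal bonds should be treated as dative) still needs chemistry-specific rules.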
That makes a lot of sense.
Closing as it is superseded by #206 |
This PR includes the utilities for residue template generation. **Currently it works for noncovalent cofactors only.**
Here's a summary of the changes:
Insert a check for unknown residues in the `__init__` of `LinkedRDKitChorizo`. If `raw_input_mols` or `set_templates` contain residues that are not in the currently loaded residue templates, a new residue template will be made.
Fetch from RCSB
The `input_resname` or specified resname will be used to fetch a definition CIF file from RCSB (https://files.rcsb.org/ligands/download/). The CIF files will be downloaded as temporary files.
Build process
The chemical component will be processed into the canonical protonation state (after deprotonating a default set of acidic protons). No further embedding, extension, capping, or linker guessing will be attempted. The logging level of the build process is controlled by `ChemicalComponent_LoggingControler`. Other associated functions are imported from `chemtempgen.py`.
Added dependencies
gemmi, copy, urllib.request (and network connectivity), time, tempfile
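The check-then-fetch flow described in this summary can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `find_unknown_resnames` and `fetch_component_cif` are hypothetical helpers, templates are assumed to be a dict keyed by residue name, and the download URL is assumed to be the RCSB base URL above followed by `<RESNAME>.cif`.

```python
import tempfile
import urllib.request
from pathlib import Path

RCSB_LIGAND_URL = "https://files.rcsb.org/ligands/download/"


def find_unknown_resnames(resnames, templates):
    """Return residue names missing from templates, in input order, no repeats."""
    seen = set()
    unknown = []
    for name in resnames:
        if name not in templates and name not in seen:
            seen.add(name)
            unknown.append(name)
    return unknown


def fetch_component_cif(resname, base_url=RCSB_LIGAND_URL, timeout=30):
    """Download <base_url><RESNAME>.cif into a temporary file and return its path.

    The file is created with delete=False, so the caller must remove it.
    """
    url = f"{base_url}{resname.upper()}.cif"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    with tempfile.NamedTemporaryFile(suffix=".cif", delete=False) as handle:
        handle.write(data)
    return Path(handle.name)
```

The `base_url` parameter lets the same function read from a mirror or a local `file://` directory, which is convenient for testing without network access.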
Example output
@diogomart Could you please take a quick look and let me know if you would like template generation to be part of chorizo creation? If so, I have some more improvements to make and will continue working on this. If not, I will open another PR with just the check. I can write some examples of how to make and edit chemical templates with the functions.
Thank you in advance for your time and kind advice.