Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NAStruct feature request: output axes info for bases and basepairs #1036

Open
tcbishop opened this issue Jul 8, 2023 · 8 comments
Open

NAStruct feature request: output axes info for bases and basepairs #1036

tcbishop opened this issue Jul 8, 2023 · 8 comments
Assignees
Labels
enhancement New keywords New keywords for existing commands.

Comments

@tcbishop
Copy link

tcbishop commented Jul 8, 2023

Maybe cpptraj can already do this... ?

I would like to have NAStruct local axis information for both bases and base pairs made available alongside the other pairing and step parameter information using cpptraj keywords.

The goal is to write out vector data in a pseudo-trajectory format that can be overlaid on the all atom trajectory for easy visualization.

In debug mode NAStruct can output axis information for bases (baseaxes.pdb line 316 ) and basepairs (basepairaxes.pdb line 1087) so this appears to be mostly an I/O issue.

I'm proposing something like the following
(using text adopted from El Hassan JMB 1995 Figure 1 for "base pair reference frame" and "base reference frame" and the
existing descriptors in section "35.11.54. nastruct" of Amber 22 manual)

Base pair Reference Frames: 1 frame for each base pair
[BpOrg]: x,y,z coordinates of the base pair axis origin .
[BpX-axis]: unit vector for base pair x-axis relative to BpOrg
[BpY-axis]: unit vector for base pair y-axis relative to BpOrg
[BpZ-axis]: unit vector for base pair y-axis relative to BPorg

Base Reference Frames: 1 frame for each base
[BOrg]: x,y,z coordinates of the base axis origin
[BX-axis]: unit vector for base x-axis relative to BOrg
[BY-axis]: unit vector for base y-axis relative to BOrg
[BZ-axis]: unit vector for base y-axis relative to BOrg

For comparisons

In 3DNA Base Pair axes information is reported in the "ref_frames.dat" file as follows
... 1 T-A ...
109.6665 103.8816 167.4595 # origin
0.7276 -0.6703 -0.1460 # x-axis
0.6559 0.6175 0.4341 # y-axis
-0.2008 -0.4117 0.8889 # z-axis
... 2 A-T ...
108.6866 101.8792 169.7685 # origin
0.9787 -0.2047 -0.0159 # x-axis
0.1891 0.8686 0.4581 # y-axis
-0.0799 -0.4513 0.8888 # z-axis

@drroe drroe self-assigned this Jul 10, 2023
@drroe drroe added enhancement New keywords New keywords for existing commands. labels Jul 10, 2023
@drroe
Copy link
Contributor

drroe commented Jul 10, 2023

The goal is to write out vector data in a pseudo-trajectory format that can be overlaid on the all atom trajectory for easy visualization.

This is a good idea, and as you point out the functionality is mostly there, just hidden behind #ifdefs. I envision it as having a similar format to the writedata vectraj command, which lets you write vector DataSets in a pseudo trajectory format (see help Formats writedata vectraj). So maybe keywords like:

nastruct NA axesout MyAxes.pdb axesfmt pdb \
  bpaxesout MyBpAxes.nc bpaxesfmt netcdf bpaxesparmout MyBpAxes.parm7 ...

How does that seem? I'll start working on this and try to get this implemented as soon as I can. Thanks for the suggestion.

@tcbishop
Copy link
Author

Thanks! I think you get the idea. see a private email w/ the "bigger picture" that is beyond the scope of this request.

I"m no expert w/r/to cpptray/pytraj... so I trust you know best way to implement and what you propose seems to do the trick. I presume the vectors when loaded in say VMD will be s.t. they align w/ the all atom structure.
If the parm7 file has bonds between the director frame "Origin Atom" and the individual X, Y, Z axes then that makes for proper connectivity.

FYI: when I generate PDBs with basepair director frames I use "CA" for the base pair "Center Atom" and "H1", "H2", "H3" for the "helical axis" atoms... This way VMD automatically connects the axis (unit vectors 1A from the CA) and does not connect the CA atoms b/c they are 3.4A apart. Of course other names work (e.g. CA, CC, CG, CT would preserve sequence info ) but a heavy atom and hydrogens seem to do the trick for making quick visuals. IDK what will happen for default viz if apply same idea to the bases rather than basepairs... CA1, CC1, CG1 CT1 for watson strand and CT2, CG2, CC2, CA2 for crick strands?

@drroe
Copy link
Contributor

drroe commented Jul 11, 2023

Should be addressed once #1037 is merged in.

I presume the vectors when loaded in say VMD will be s.t. they align w/ the all atom structure.

Yes, they will align on however the coordinates are when they get to nastruct (so e.g. if you've RMS fit the coordinates beforehand, the axes will align on the RMS-fit coordinates).

If the parm7 file has bonds between the director frame "Origin Atom" and the individual X, Y, Z axes then that makes for proper connectivity.

Yes - I usually like the mol2 format for this (since everything is in one file), but for large trajectories it's best to write the axes coordinates to a NetCDF or DCD trajectory and write the separate topology.

FYI: when I generate PDBs with basepair director frames I use "CA" for the base pair "Center Atom" and "H1", "H2", "H3" for the "helical axis" atoms... This way VMD automatically connects the axis (unit vectors 1A from the CA) and does not connect the CA atoms b/c they are 3.4A apart.

Very clever! I've kept the names used in the debug output by default (Orig, X, Y, Z) but have added some keywords that let you change this (e.g. axisnameo CA axisnamex H1 axisnamey H2 axisnamez H3).

@drroe
Copy link
Contributor

drroe commented Jul 11, 2023

This has been merged and the new functionality is in version 6.19.6.

@drroe
Copy link
Contributor

drroe commented Jul 19, 2023

@tcbishop I just wanted to follow up and see if this functionality is working for you.

@tcbishop
Copy link
Author

Finally getting back to this.

For pytraj: how does one enter the list of base pairs?

For cpptraj: There are two odd things to report and a usage issue when seeking to analyze a system that happens to have 358bp and using a list of specified pairs.

In terms of usage the "specified pairs" line is very long. Is there a shorthand that can be used and possibly coupled with the now deprecated byptype ( para & anti ) key word?

  1. w/r/to the following axes options... 
     [axesout <file> [axesoutarg <arg> ...] [axesparmout <file>]]
     [bpaxesout <file> [bpaxesoutarg <arg> ...] [bpaxesparmout <file>]]
     [stepaxesout <file> [stepaxesoutarg <arg> ...] [stepaxesparmout <file>]]
     [axisnameo <name>] [axisnamex <name>] [axisnamey <name>] [axisnamez <name>]
    

All seems to work as advertised FOR ONE FRAME.
The ORIGin for the base "axes" seems to be much closer to the ORIGin for the "bpaxes" than I expected.
SEE ATTACHED image... shouldn't the red and blue licorice be farther separated in the image on right?
This may just be my ignorance.

  1. It seems that the base pair step axes pairing calc is not fully compliant with the "specified pairs" option.
    The steps are not determined from the list of basepairs provided but from some other method.
    Is this a Feature or bug?

thus when a trajectory rather than just single frame is analyzed it seems the number of STEPS changes.
Specifically the command line

stepaxesout stepAxes.pdb stepaxesoutarg pdb stepaxesparmout stepAxes.parm7

causes the steps to be evaluated or re-evaluated s.t. the number of base pair steps changes and cpptraj dies.

cyt 207% cpptraj sys.parm7 < ../bin/dcd2rod.cpptraj > some.log2
Error: Number of base pair steps has changed from 354 to 358.
Error: Base pair step axes pseudo-topology is already set up for 354 bases.

w/out the stepaxesout calc line the analysis all seems to work
but even for one this on frame the number of steps is not compatible w/ the number of base pairs.

Careful inspection of the image (did it actually get attached?) reveals atom counts of
StepAxes = 1416 -> 1416/4 = 354 bp
BPAxes = 1432 -> 1432/4 = 358 bp
Axes = 2864 -> 2864/4 = 716 bases -> 358bp

The error when you run cpptraj w/ a trajectory is

cyt 190% cpptraj sys.parm7 < ../bin/dcd2rod.cpptraj > some.log2
Error: Number of base pair steps has changed from 354 to 358.
Error: Base pair step axes pseudo-topology is already set up for 354 bases.

@hainm
Copy link
Contributor

hainm commented Jul 25, 2023

For pytraj: how does one enter the list of base pairs?

I don't think pytraj supports the new feature yet.
But one can hack by adding more cpptraj option to pucker_method option here: https://github.com/Amber-MD/pytraj/blob/f3cbb051f71e15d3f2b4b0d3422d4a4bfcab68e0/pytraj/analysis/nucleic_acid_analysis.py#L31

pucker_method='altona [and some cpptraj options here]',. But I am not so sure about return type (if getting any error or not).

@drroe
Copy link
Contributor

drroe commented Jul 25, 2023

First, sorry that you're still encountering issues! I'll try to resolve them as soon as I can.

In terms of usage the "specified pairs" line is very long. Is there a shorthand that can be used and possibly coupled with the now deprecated byptype ( para & anti ) key word?

Maybe something where you can specify strands via ranges. So instead of:

pairs 1-16,2-15,3-14,4-13

You could do something like:

pairs [1-4][16-13]

Do you think that would be useful? If you have other suggestions please let me know.

The ORIGin for the base "axes" seems to be much closer to the ORIGin for the "bpaxes" than I expected. SEE ATTACHED image.

There was no attached image, but what you describe is not unexpected and is just a consequence of how the reference axes origins are placed on each individual base (see Olson et al. J. Mol. Biol. (2001) 313, 229-237).

It seems that the base pair step axes pairing calc is not fully compliant with the "specified pairs" option.
The steps are not determined from the list of basepairs provided but from some other method.
Is this a Feature or bug?

The steps should be determined by connectivity within strands. So if strand 1 is A-B and strand 2 is C-D and A is bonded to C and B is bonded to D or A is bonded to D and B is bonded to C then that will count as a step (AC/BD or AD BC respectively). I would need more details of your system (actual topology/coordinates and a description of which step is expected but missing) in order to debug your specific case.

causes the steps to be evaluated or re-evaluated s.t. the number of base pair steps changes and cpptraj dies.

Unfortunately, I'm not surprised you're hitting issues - the code is still fairly new and relatively untested. Best route here is for you to provide me with a small (10-20 frames tops) test trajectory (and corresponding topology) so I can reproduce the issue and try to address it. You can extract the part of the trajectory where the calculation dies using trajin <start> <stop>. Let me know if you need more details or if any of this seems unclear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New keywords New keywords for existing commands.
Projects
None yet
Development

No branches or pull requests

3 participants