Mayer and Wiberg-Lowdin bond-indices missing from some subsets #76

Closed
IgnacioJPickering opened this issue Jul 20, 2023 · 10 comments · Fixed by #77

Comments

@IgnacioJPickering

IgnacioJPickering commented Jul 20, 2023

After parsing the dataset I found that some or all of the Wiberg-Lowdin and Mayer indices are missing for some subsets, specifically for:

  • PubChem Set 1
  • PubChem Set 2
  • PubChem Set 3
  • PubChem Set 4
  • PubChem Set 5
  • DES370K Supplement
  • Ion-Pairs

MBIS seems to be missing from DES370K Supplement and Ion-Pairs too, but from issue #48 I gather that this is to be expected since most conformations could not converge MBIS in those subsets.

I wanted to double check that it is indeed intended that the bond indices are missing from these subsets, and if so what is the reason for this (I found it strange that they are present in PubChem Set 6 but not in the rest).

I haven't checked if the bond-indices are missing for all conformations or just some of them.

(This is mostly to double-check that I'm parsing the datasets correctly; I don't really have a use for the bond-indices currently.)

@IgnacioJPickering
Author

IgnacioJPickering commented Jul 20, 2023

https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2022-06-08-QMDataset-ion-pairs#metadata

This link seems to imply that the bond indices should be available for Ion-Pairs.

@peastman
Member

No idea what's up with that. It seems to be by molecule, not subset. I just ran a count of how many molecules in each subset do or don't have Wiberg bond orders.

| Subset | Have | Don't Have |
|---|---|---|
| SPICE DES Monomers Single Points Dataset v1.1 | 374 | 0 |
| SPICE DES370K Single Points Dataset Supplement v1.0 | 6 | 87 |
| SPICE DES370K Single Points Dataset v1.0 | 3397 | 0 |
| SPICE Dipeptides Single Points Dataset v1.2 | 567 | 110 |
| SPICE Ion Pairs Single Points Dataset v1.1 | 12 | 16 |
| SPICE PubChem Set 1 Single Points Dataset v1.2 | 453 | 1919 |
| SPICE PubChem Set 2 Single Points Dataset v1.2 | 411 | 2020 |
| SPICE PubChem Set 3 Single Points Dataset v1.2 | 1447 | 999 |
| SPICE PubChem Set 4 Single Points Dataset v1.2 | 568 | 1887 |
| SPICE PubChem Set 5 Single Points Dataset v1.2 | 434 | 2029 |
| SPICE PubChem Set 6 Single Points Dataset v1.2 | 2476 | 0 |
| SPICE Solvated Amino Acids Single Points Dataset v1.1 | 26 | 0 |

There are a few subsets for which every molecule has bond orders, but in most cases some molecules do and some don't.
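A per-subset tally like the one above could be sketched roughly as follows. This is only an illustration, not the script that produced the table: the record layout (one dict per molecule with a `subset` entry and an optional `wiberg_lowdin_indices` entry) and the function name `count_bond_orders` are assumptions made for the example, and the toy records are not real SPICE data.

```python
from collections import defaultdict


def count_bond_orders(molecules, key="wiberg_lowdin_indices"):
    """Return {subset: [have, dont_have]} for an iterable of molecule dicts.

    The dict layout ("subset" plus an optional bond-order entry) is an
    assumption for this sketch, not a documented format.
    """
    counts = defaultdict(lambda: [0, 0])
    for mol in molecules:
        has_key = mol.get(key) is not None
        # Index 0 counts molecules that have the data, index 1 those that do not.
        counts[mol["subset"]][0 if has_key else 1] += 1
    return dict(counts)


# Hypothetical toy records used only to exercise the function.
molecules = [
    {"subset": "Subset A", "wiberg_lowdin_indices": [[0.0, 1.0], [1.0, 0.0]]},
    {"subset": "Subset A"},
    {"subset": "Subset B", "wiberg_lowdin_indices": [[0.0]]},
]
counts = count_bond_orders(molecules)
for subset, (have, dont_have) in sorted(counts.items()):
    print(subset, have, dont_have)
```

Counting per molecule rather than per subset is what exposes the mixed cases, where a subset has some molecules with bond orders and some without.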

I queried the ion pairs dataset from QCArchive to see whether the data is missing there, or whether it's a problem in the downloader script. For about half the records, not only are the Wiberg bond orders missing, but the whole extras section of the record is completely empty.

@pavankum any idea what's going on?

@pavankum
Collaborator

I tried to dig into it, but I am getting None when I try to retrieve records. It might have something to do with the server migration. I will ping @bennybp on Slack.

@IgnacioJPickering
Author

@peastman Thanks for the response. I suppose I'm parsing the data correctly then; I just missed the issue in Dipeptides for some reason. FWIW, I downloaded the dataset from Zenodo; I did not use the downloader script.

@pavankum
Collaborator

@peastman: @bennybp helped me debug this. Data for the key "WIBERG_LOWDIN_INDICES" is populated for all the completed calculations, while data for a redundant key with spaces, "WIBERG LOWDIN INDICES", is not present in all of them. I checked the Ion Pairs dataset and could see 1426 records with Wiberg indices when I used the right key and 1389 with the second one with spaces.

I checked another small dataset, DES370K supplement, and I see 3631/3631 with the right key and 2004/3631 with the second one with spaces.
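As a rough illustration of the two-key situation described above, a lookup could prefer the underscore key and only fall back to the one with spaces. The two key strings come from this thread; the helper name `get_wiberg_indices` and the assumption that both keys sit under `extras['qcvars']` are hypothetical and should be checked against real records.

```python
PREFERRED_KEY = "WIBERG_LOWDIN_INDICES"  # reported as populated for all completed calculations
LEGACY_KEY = "WIBERG LOWDIN INDICES"     # redundant key with spaces, not always present


def get_wiberg_indices(extras):
    """Return the Wiberg-Lowdin indices from a record's extras dict, or None.

    Assumes (unverified) that the indices are stored under extras["qcvars"].
    """
    qcvars = (extras or {}).get("qcvars") or {}
    for key in (PREFERRED_KEY, LEGACY_KEY):
        if key in qcvars:
            return qcvars[key]
    return None


# Hypothetical extras dicts for illustration only.
print(get_wiberg_indices({"qcvars": {PREFERRED_KEY: [0.0, 1.0]}}))
print(get_wiberg_indices({"qcvars": {LEGACY_KEY: [0.0, 2.0]}}))
print(get_wiberg_indices({}))
```

Checking the underscore key first matters because relying on the key with spaces alone would undercount, as the numbers above show.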

On a side note, I got a conda env for accessing the legacy server from Ben; I was getting None before with 0.15.6.

name: qcportal_legacy
channels:
  - conda-forge
  - defaults
dependencies:
  - qcportal=0.15.8
  - msgpack-python=1.0.2=py39hff7bd54_1
  - pandas=1.3.5=py39h8c16a72_0
  - pydantic=1.9.0=py39h7f8727e_0
  - python=3.9.7=h12debd9_1
  - qcelemental=0.24.0=pyhd8ed1ab_0
  - nglview

@peastman
Member

Can you show how you're accessing it? I retrieve the records from the dataset with ds.get_records(). Then I look up the data from them with [recs.iloc[i].record.dict()['extras'] for i in range(len(recs))]. For about half the records in the ion pairs dataset, it's empty.

@pavankum
Collaborator

pavankum commented Jul 22, 2023

I think I was doing almost the same:

import qcportal as qcp

client = qcp.FractalClient()
ds = client.get_collection('Dataset', 'SPICE Ion Pairs Single Points Dataset v1.1')

# Find the wb97m-d3bj specification and fetch its records.
for _, row in ds.list_records().iterrows():
    spec = row.to_dict()
    if spec['method'] == 'wb97m-d3bj':
        recs = ds.get_records(method=spec['method'], basis=spec['basis'], program=spec['program'], keywords=spec['keywords'])
        break

# Print the extras of the first record only.
for _, r in recs.iterrows():
    print(r.record.extras)
    break

@peastman
Member

Here's what I do:

from qcportal import FractalClient
fc = FractalClient()
ds = fc.get_collection('Dataset', 'SPICE Ion Pairs Single Points Dataset v1.1')
spec = ds.list_records().iloc[0].to_dict()
recs = ds.get_records(method=spec['method'], basis=spec['basis'], program=spec['program'], keywords=spec['keywords'])
print([recs.iloc[i].record.extras.keys() for i in range(len(recs))])

For about half the records there are two keys: dict_keys(['_qcfractal_tags', 'qcvars']). And for the other half it's empty: dict_keys([]).

@pavankum
Copy link
Collaborator

Your call is accessing a different spec:

spec = ds.list_records().iloc[0].to_dict()

output:
{'driver': 'gradient',
 'program': 'psi4',
 'method': 'b3lyp',
 'basis': 'dzvp',
 'keywords': 'openff-default',
 'name': 'B3LYP/dzvp-openff-default'}
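To avoid depending on the order of the specification list, the spec can be selected by method name instead of by position. The sketch below works on plain dicts shaped like the output above; the helper `find_spec` is hypothetical, and the second spec's `basis`, `keywords`, and `name` values are placeholders rather than the dataset's real settings.

```python
def find_spec(specs, method):
    """Return the first spec dict whose 'method' matches, ignoring case."""
    wanted = method.lower()
    for spec in specs:
        if spec["method"].lower() == wanted:
            return spec
    raise LookupError(f"no specification with method {method!r}")


specs = [
    # First entry copied from the output shown above.
    {"driver": "gradient", "program": "psi4", "method": "b3lyp",
     "basis": "dzvp", "keywords": "openff-default",
     "name": "B3LYP/dzvp-openff-default"},
    # Placeholder entry: only the method name is taken from this thread.
    {"driver": "gradient", "program": "psi4", "method": "wb97m-d3bj",
     "basis": "placeholder-basis", "keywords": "placeholder-keywords",
     "name": "placeholder-name"},
]

spec = find_spec(specs, "wb97m-d3bj")
print(spec["method"], spec["basis"])
```

With a real dataset, the list of dicts could be built from the rows returned by `ds.list_records()` using `to_dict()`, as in the snippets earlier in the thread.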

@peastman
Member

peastman commented Aug 7, 2023

The updated file is now available on Zenodo. Thanks for reporting this!
