Mayer and Wiberg-Lowdin bond-indices missing from some subsets #76

IgnacioJPickering · 2023-07-20T19:09:53Z

After parsing the dataset I found that some or all of the Wiberg-Lowdin and Mayer indices are missing for some subsets, specifically for:

PubChem Set 1
PubChem Set 2
PubChem Set 3
PubChem Set 4
PubChem Set 5
DES370K Supplement
Ion-Pairs

MBIS seems to be missing from DES370K Supplement and Ion-Pairs too, but from issue #48 I gather that this is to be expected since most conformations could not converge MBIS in those subsets.

I wanted to double check that it is indeed intended that the bond indices are missing from these subsets, and if so what is the reason for this (I found it strange that they are present in PubChem Set 6 but not in the rest).

I haven't checked if the bond-indices are missing for all conformations or just some of them.

(This is mostly to double-check that I'm parsing the datasets correctly, I don't really have a use for the bond-indices currently)

IgnacioJPickering · 2023-07-20T19:23:51Z

https://github.com/openforcefield/qca-dataset-submission/tree/master/submissions/2022-06-08-QMDataset-ion-pairs#metadata

this link seems to imply that the bond indices should be available for ion-pairs

peastman · 2023-07-21T21:53:54Z

No idea what's up with that. It seems to be by molecule, not subset. I just ran a count of how many molecules in each subset do or don't have Wiberg bond orders.

Subset	Have	Don't Have
SPICE DES Monomers Single Points Dataset v1.1	374	0
SPICE DES370K Single Points Dataset Supplement v1.0	6	87
SPICE DES370K Single Points Dataset v1.0	3397	0
SPICE Dipeptides Single Points Dataset v1.2	567	110
SPICE Ion Pairs Single Points Dataset v1.1	12	16
SPICE PubChem Set 1 Single Points Dataset v1.2	453	1919
SPICE PubChem Set 2 Single Points Dataset v1.2	411	2020
SPICE PubChem Set 3 Single Points Dataset v1.2	1447	999
SPICE PubChem Set 4 Single Points Dataset v1.2	568	1887
SPICE PubChem Set 5 Single Points Dataset v1.2	434	2029
SPICE PubChem Set 6 Single Points Dataset v1.2	2476	0
SPICE Solvated Amino Acids Single Points Dataset v1.1	26	0

There are a few subsets for which every molecule has bond orders, but in most cases some molecules do and some don't.
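
A count like the one in the table above can be sketched in a few lines. This is only an illustration, not the script that produced the table: it assumes each molecule has already been parsed into a plain dict carrying a subset name plus whatever data fields were found for it. The field name wiberg_lowdin_indices and the helper count_by_subset are assumptions made for the example.

```python
from collections import defaultdict


def count_by_subset(molecules, field="wiberg_lowdin_indices"):
    """Return {subset: [have, dont_have]} for the given data field."""
    counts = defaultdict(lambda: [0, 0])
    for mol in molecules:
        # A molecule "has" the field only if the key exists and is not None.
        has_field = mol.get(field) is not None
        counts[mol["subset"]][0 if has_field else 1] += 1
    return dict(counts)


# Hypothetical toy records, not real SPICE data.
molecules = [
    {"subset": "A", "wiberg_lowdin_indices": [[0.0, 1.0], [1.0, 0.0]]},
    {"subset": "A"},
    {"subset": "B", "wiberg_lowdin_indices": [[0.0]]},
]

for subset, (have, dont_have) in sorted(count_by_subset(molecules).items()):
    print(f"{subset}\t{have}\t{dont_have}")
```

Counting per molecule rather than per subset is what makes a partially populated subset visible, which a simple present/absent check per subset would hide.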

I queried the ion pairs dataset from QCArchive to see whether the data is missing there, or whether it's a problem in the downloader script. For about half the records, not only are the Wiberg bond orders missing, but the whole extras section of the record is completely empty.

@pavankum any idea what's going on?

pavankum · 2023-07-21T22:42:55Z

I tried to dig into it, but I am getting None when I try to retrieve records. It might have something to do with the server migration. I will ping @bennybp on Slack.

IgnacioJPickering · 2023-07-21T23:17:30Z

@peastman Thanks for the response. I suppose I'm parsing the data correctly then; I just missed the issue in Dipeptides for some reason. FWIW, I downloaded the dataset from Zenodo and did not use the downloader script.

pavankum · 2023-07-21T23:58:13Z

@peastman: @bennybp helped me debug this. Data for the key "WIBERG_LOWDIN_INDICES" is populated for all the completed calculations, while data for a redundant key with spaces, "WIBERG LOWDIN INDICES", is not present in all of them. I checked the Ion Pairs dataset and could see 1426 records with Wiberg indices when I used the right key, and 1389 with the second key with spaces.

I checked another small dataset, DES370K supplement, and I see 3631/3631 with the right key and 2004/3631 with the second one with spaces.
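
One way to tolerate the two spellings when reading a record's variables is to compare keys after normalizing them. The following is a minimal sketch under that assumption; it is not the change made in the downloader script, and the helper names normalize_key and lookup_qcvar are hypothetical. It works on a plain dict, so it makes no claim about the QCPortal API.

```python
def normalize_key(name):
    """Map 'WIBERG LOWDIN INDICES' and 'WIBERG_LOWDIN_INDICES' to one form."""
    return name.upper().replace(" ", "_")


def lookup_qcvar(qcvars, name):
    """Return the value stored under `name`, ignoring space/underscore differences.

    An exact match wins; otherwise the first key that normalizes to the
    same string is used. Returns None if nothing matches.
    """
    if name in qcvars:
        return qcvars[name]
    wanted = normalize_key(name)
    for key, value in qcvars.items():
        if normalize_key(key) == wanted:
            return value
    return None


# Hypothetical record contents, for illustration only.
with_underscores = {"WIBERG_LOWDIN_INDICES": [0.0, 0.9, 0.9, 0.0]}
with_spaces = {"WIBERG LOWDIN INDICES": [0.0, 0.9, 0.9, 0.0]}

print(lookup_qcvar(with_underscores, "WIBERG_LOWDIN_INDICES"))
print(lookup_qcvar(with_spaces, "WIBERG_LOWDIN_INDICES"))
```

Preferring the exact match first means a record that carries both keys returns the value stored under the name that was actually requested.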

On a side note, I got a conda env for accessing the legacy server from Ben. I was getting None before with 0.15.6. Here is the env:

name: qcportal_legacy
channels:
  - conda-forge
  - defaults
dependencies:
  - qcportal=0.15.8
  - msgpack-python=1.0.2=py39hff7bd54_1
  - pandas=1.3.5=py39h8c16a72_0
  - pydantic=1.9.0=py39h7f8727e_0
  - python=3.9.7=h12debd9_1
  - qcelemental=0.24.0=pyhd8ed1ab_0
  - nglview

peastman · 2023-07-22T00:20:04Z

Can you show how you're accessing it? I retrieve the records from the dataset with ds.get_records(). Then I look up the data from them with [recs.iloc[i].record.dict()['extras'] for i in range(len(recs))]. For about half the records in the ion pairs dataset, it's empty.

pavankum · 2023-07-22T00:24:58Z

I think I was doing almost the same

import qcportal as qcp

client = qcp.FractalClient()
ds = client.get_collection('Dataset', 'SPICE Ion Pairs Single Points Dataset v1.1')

# Pick the wb97m-d3bj spec explicitly instead of relying on row order.
for _, row in ds.list_records().iterrows():
    spec = row.to_dict()
    if spec['method'] == 'wb97m-d3bj':
        recs = ds.get_records(method=spec['method'], basis=spec['basis'], program=spec['program'], keywords=spec['keywords'])
        break

# Print the extras of the first record only.
for _, rec in recs.iterrows():
    print(rec.record.extras)
    break

peastman · 2023-07-23T18:35:43Z

Here's what I do:

from qcportal import FractalClient
fc = FractalClient()
ds = fc.get_collection('Dataset', 'SPICE Ion Pairs Single Points Dataset v1.1')
spec = ds.list_records().iloc[0].to_dict()
recs = ds.get_records(method=spec['method'], basis=spec['basis'], program=spec['program'], keywords=spec['keywords'])
print([recs.iloc[i].record.extras.keys() for i in range(len(recs))])

For about half the records there are two keys: dict_keys(['_qcfractal_tags', 'qcvars']). And for the other half it's empty: dict_keys([]).

pavankum · 2023-07-24T15:40:51Z

Your call is accessing a different spec. Taking iloc[0] returns the first spec in the list, which is not the wb97m-d3bj one:

spec = ds.list_records().iloc[0].to_dict()

output:
{'driver': 'gradient',
 'program': 'psi4',
 'method': 'b3lyp',
 'basis': 'dzvp',
 'keywords': 'openff-default',
 'name': 'B3LYP/dzvp-openff-default'}
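
The difference between the two snippets comes down to choosing a spec by position versus by method name. A minimal sketch of the by-name selection is shown here on plain dicts, so it does not depend on a server connection. The helper find_spec is hypothetical, the first spec is copied from the output above, and the second spec's keywords and name values are placeholders invented for the example.

```python
def find_spec(specs, method):
    """Return the first spec whose method matches, case-insensitively."""
    for spec in specs:
        if spec["method"].lower() == method.lower():
            return spec
    raise LookupError(f"no spec with method {method!r}")


specs = [
    # Copied from the output above.
    {"driver": "gradient", "program": "psi4", "method": "b3lyp",
     "basis": "dzvp", "keywords": "openff-default",
     "name": "B3LYP/dzvp-openff-default"},
    # Placeholder values for keywords and name, for illustration only.
    {"driver": "gradient", "program": "psi4", "method": "wb97m-d3bj",
     "basis": "def2-tzvppd", "keywords": "placeholder-keywords",
     "name": "placeholder-name"},
]

print(specs[0]["method"])                        # positional pick: b3lyp
print(find_spec(specs, "wb97m-d3bj")["method"])  # pick by name: wb97m-d3bj
```

Raising an error when no spec matches avoids silently querying records for the wrong level of theory.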

peastman · 2023-08-07T19:54:23Z

The updated file is now available on Zenodo. Thanks for reporting this!

peastman mentioned this issue Aug 3, 2023

Account for inconsistency in naming of data fields #77

Merged

peastman closed this as completed in #77 Aug 7, 2023
