Fixes for DMRG test suite and related bugs. #96

Open: wants to merge 34 commits into main

Conversation

ShaunWeatherly (Contributor):

Major Changes:

  • Modifies the tests in dmrg_molbe_test to run in GitHub Actions.
  • Exposes useful scratch-management options to high-level modules, i.e. mbe.BE. Namely, one can now specify scratch_dir and cleanup_at_end when instantiating the BE object (see the sketch after this list).
  • Adds missing DMRG modules to be_parallel.
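
A minimal usage sketch (only scratch_dir and cleanup_at_end are the options exposed here; the mean-field and fragment setup, and the exact fragpart arguments, are assumed from the usual molbe workflow):

```python3
# Hypothetical usage sketch: only `scratch_dir` and `cleanup_at_end` are the
# options introduced by this PR; the rest of the call is assumed.
from pyscf import gto, scf
from quemb.molbe import BE, fragpart

mol = gto.M(atom="H 0 0 0; H 0 0 0.74", basis="sto-3g")
mf = scf.RHF(mol).run()
fobj = fragpart(be_type="be1", mol=mol)

mybe = BE(
    mf,
    fobj,
    scratch_dir="/tmp/my_be_scratch",  # root directory for all scratch files
    cleanup_at_end=False,              # keep the scratch files after the run
)
```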

Specifics:

  • Fixes to mbe.BE.scratch_dir. This keyword was broken by the recent changes to scratch management - now it is correctly passed to WorkDir and used to generate the scratch root.
  • Improved caching for different parts of the code in quemb_unittest.yml. Currently separated into pip-cache, quemb-cache and block2-cache. Also makes cache keys more descriptive, which fixes a bug where caches failed to update despite dependencies changing.
  • DMRG scratch cleanup now relies on WorkDir.cleanup_at_end, and the old kwarg (force_cleanup) has been removed.
  • Fixes the incorrect use of rets[0] as the total BE energy in solver; rets[0] is only the correlation energy (see the sketch after this list).
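
A minimal sketch of the distinction (illustrative names only, assuming the usual relation: total energy = mean-field energy + correlation energy):

```python3
def total_be_energy(ebe_hf: float, rets: list) -> float:
    """Illustrative only: `rets[0]` returned by the fragment solvers is the
    correlation energy, so the total BE energy adds the mean-field (HF)
    reference energy on top of it instead of using `rets[0]` directly."""
    return ebe_hf + rets[0]
```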

Full commit history:

  • Fixes for scratch management in solver and helper modules.

  • Added DMRG_ArgsUser to TestBE_DMRG

  • Fixed incorrect definition of ebe_tot as rets[0], which is only the correlation energy.

  • Make helper.delete_multiple_files recursive.

  • Make use_cumulant=True by default in solver_block2.

  • Modify molecular_DMRG_test

  • Remove block2 versioning

  • Test explicit cache save/restore.

  • Fully explicit save/restore for cache

  • Block2 no-binary

  • Test no-binary

  • Typo

  • Explicit block2 versioning

  • Move to test_requirements

  • Edit block2 versioning.

  • Force latest mkl

  • test latest mkl

  • Separate caching for block2

  • Test caching

  • Test caching

  • Fine tune caching

  • Further fine-tune caching

  • Back to block2 versioning

  • Formatting and ompnum passing.

  • Install quemb in workflows.

  • resolve molecular_DMRG_test scratch

  • Update example script

  • Update .gitignore

  • Enable all unittests


ShaunWeatherly and others added 17 commits January 22, 2025 12:44
* Added default dmrg `settings.py` to `external`; modified workflow; initial testing.

* `ruff` fixes in `dmrgscf_settings`

* `ruff` fixes again.

* Merging and bringing up to date.

* Merge from Main (#87)

* Automatic generation of API documentation (#70)

- Create API reference automatically for all public members of quemb.
  Before this change one had to manually create rst files and manage them; now they are created recursively and automatically.
  The layout of the documentation matches the layout of the namespace.

- This change ensures that all (public) docstrings are actually parsed by Sphinx and that they are properly written. Some docstrings had to be fixed.

- Type-hinting is now picked up by sphinx.

* H5py use contextmanager (#73)

* put if __main__ guard to test

* use h5py contextmanager nearly everywhere

* fixed a bug from wrong close statement

* Fix parallelization and add parallel tests (#75)

* fix _opt to work with be_parallel

* modify oneshot to work with be_func_parallel

* modify two tests to use nproc > 1

* fix ruff errors

* modify octane test

* Remove CCSD reference from octane test

* Final scratch dir attempt (#74)

# Major changes

This PR unifies the use of the ScratchDir. It is now guaranteed that files are cleaned up if the calculation exits with no error (if the user passes `WorkDir(cleanup_at_end=False)`, the files are retained).
A `WorkDir` is passed from the top of the call stack to the bottom. This allows the user to change the WorkDir if necessary. (Fixes #19)

Got rid of some special-cased keyword arguments for DMRG and moved them into an explicit `DMRG_solver_kwargs`.
This way we can catch supplied arguments that have no effect.
It is probably a good idea to do the same for SHCI-specific keywords, @lweisburn.
It is probably also a good idea to get rid of the dictionary altogether and replace it with a dataclass.

# Minor changes

If the `__init__` of a class was just populating the attributes, then I replaced the class definition with `attrs.define`.
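
For illustration, a sketch of that pattern (the class and attribute names here are made up, not the actual quemb definitions):

```python3
from attrs import define


# Before: an __init__ that only assigns attributes ...
class StoreResultPlain:
    def __init__(self, mo_coeff, lmo_coeff):
        self.mo_coeff = mo_coeff
        self.lmo_coeff = lmo_coeff


# ... after: the same behaviour with an attrs-defined class.
@define
class StoreResult:
    mo_coeff: object
    lmo_coeff: object
```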

On the way I did a lot of type hinting. It is probably a good idea to double check my assumptions in review.

Additionally, I did some refactoring, where explicit open-close pairs were replaced with context managers.
The same was true for multiprocessing Pools.

Here and there I encountered constructs that could be simplified a lot
```python3
veffs = []
[veffs.append(result.get()) for result in results]
# becomes
veffs = [result.get() for result in results]
```
or
```python3
self.frozen_core = False if not fobj.frozen_core else True
# becomes
self.frozen_core = fobj.frozen_core
```
or
```python3
rdm_return = False
if relax_density:
    rdm_return = True
# becomes
rdm_return = relax_density
```

* Update fragment energies (#79)

* Untangling eeval and frag_energy energy evaluation options

* small changes for frag_energy consistency

* start changing eeval and ereturn in parallel functions, making serial and parallel consistent

* put if __main__ guard to test

* use h5py contextmanager nearly everywhere

* changed mbe.StoreBE into attrs defined class

* ignore broken link to numpy.float64 in documentation

* fixed a bug from wrong close statement

* testsuite don't wait for analysis

* added types for molbe BE object

* fixed wrong close statements

* fixed scratch-dir for molbe-be

* added typing for molbe.BE.optimize

- this fixed a likely bug in optimize.
previously we had
    J0 = [[J0[-1, -1]]]
now it is
    J0 = [[J0[-1][-1]]]
since it is a list[list[float]]
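
A tiny illustration of the difference:

```python3
# Plain nested lists do not support numpy-style tuple indexing:
J0 = [[1.0, 2.0], [3.0, 4.0]]
last = J0[-1][-1]   # works: 4.0
# J0[-1, -1]        # would raise TypeError: list indices must be integers or slices
```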

* renamed _opt.py to opt.py

finished BEOPT attrs refactoring

* pass on solver_kwargs to be_func

* added types to be_func

* added delete multiple_files function

* use the new scratch dir in be_func

* added types to molbe.BE.optimize + call self.scratch_dir.cleanup

call self.scratch_dir.cleanup in both `optimize` and `oneshot`

* moved schmidt_decomposition to prevent circular import

makes much more sense like this

* added type hints to be_func_parallel

* fixed typo

* fixed several small errors in the code

* fixed be_func_parallel calls

* simplified be_func_parallel

* added types to run_solver

* use frag_scratch in be_func_parallel

* use frag_scratch in run_solver

* added typehints to kbe BE class

* simplified a few boolean expressions

* the tests should be working now

* ensure scratch cleanup for kbe

* removed call to self.compute_energy_full (related to #35)

* write nicer Pool statements

- used proper list comprehensions
- use contextmanager for Pool

* fixed naming error

* use more explicit way of freeing memory

(it is still ugly... )

* refactored expression

* use Tuple[int, ...] for numpy shapes :-(

* refactored WorkDir to use atexit.register for cleanup

* added better docstring for config

* require static analysis to be ok for running test suite

* renamed DMRG specific solver kwargs to its proper name

* better naming for DMRG stuff

* added types to block2 DMRG function

* refactor some DMRG arguments

* change behaviour of scratch contextmanager

it is now ensured that files are deleted even after an exception when using it as a context manager with cleanup_at_end=True
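
Roughly the behaviour described, sketched on a simplified stand-in (not the actual quemb WorkDir/scratch class):

```python3
import shutil
import tempfile
from pathlib import Path


class ScratchDir:
    """Simplified stand-in: cleanup runs on normal exit *and* on exceptions."""

    def __init__(self, cleanup_at_end: bool = True) -> None:
        self.path = Path(tempfile.mkdtemp())
        self.cleanup_at_end = cleanup_at_end

    def __enter__(self) -> "ScratchDir":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Called even when the with-block raised; the exception still propagates.
        if self.cleanup_at_end:
            shutil.rmtree(self.path, ignore_errors=True)
```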

* fixed the deadlock in the test requirements

* added new scratch dir also to ube

* avoid list[list[float]]; use consistently array instead

* Update energy keywords and logic in restricted BE, serial and parallel, oneshot and parallel.

Update and rearrange some documentation for the keywords

TODO: add non-cumulant energy, update unrestricted BE

* Update kbe/pbe and kbe/misc for consistency

* Fix kbe/pbe and kbe/misc to raise error for non-cumulant

* Add non-cumulant energy option for molecular code

* remove mypy attempt from get_energy_frag for now

* fix be2puffin call of oneshot and update ube oneshot default

* Update get_frag_energy function to work for periodic calculations

* Remove double del line

* remove redundant frag_energy keyword

* Update src/quemb/molbe/helper.py, eri_file to be optional

Co-authored-by: Minsik <[email protected]>

* Update src/quemb/molbe/mbe.py: ebe_tot readability

Co-authored-by: Minsik <[email protected]>

* move use_cumulant to optimize and oneshot, not BE

---------

Co-authored-by: Leah Weisburn <[email protected]>
Co-authored-by: Oskar Weser <[email protected]>
Co-authored-by: Minsik <[email protected]>

* Adding back veff0 for molecular code, removing the now-unnecessary hf_veff from be_func (#81)

* More type annotations + simpler boolean expressions (#76)

Simplifications in the code

- A couple more type annotations

- simpler boolean expressions

- removed keyword arguments if they are always true in the code

* Dataclass for solver kwargs (#77)

- Introduced a dataclass for solver kwargs. This makes it much more explicit which arguments are supported by DMRG and SHCI (see the sketch after this list)

- Could simplify the interface of BE by removing a couple of arguments
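
A rough sketch of the idea (the class and field names below are illustrative assumptions, not the actual quemb definitions):

```python3
from dataclasses import dataclass
from typing import Optional


@dataclass
class DMRGSolverArgs:
    """Illustrative stand-in for a DMRG solver-kwargs dataclass: every
    supported argument is an explicit field, so typos or unsupported
    keywords raise a TypeError instead of being silently ignored."""

    startM: int = 25                        # initial bond dimension
    maxM: int = 1000                        # maximum bond dimension
    max_iter: int = 60                      # number of DMRG sweeps
    schedule_kwargs: Optional[dict] = None


args = DMRGSolverArgs(maxM=500)             # DMRGSolverArgs(maxMM=500) would fail loudly
```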

* Simplify init (#82)

- moved the saving operation out of the __init__ method.

- don't redundantly store information from `fobj`, but use `fobj` itself directly.

* Make numpy shorter (#83)

- replacing `np.dot` with `@` (see the tiny example after this list)
- replacing the `numpy.something` calls with `from numpy import something` if functions appear several times
- replacing the `numpy.something` call with `np.something` if a function appears rarely
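
A tiny self-contained example of the first pattern:

```python3
import numpy as np

a, b = np.arange(4.0).reshape(2, 2), np.eye(2)

c_old = np.dot(a, b)  # before
c_new = a @ b         # after: same result, shorter to read
assert np.allclose(c_old, c_new)
```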

---------

Co-authored-by: Oskar Weser <[email protected]>
Co-authored-by: Leah Weisburn <[email protected]>
Co-authored-by: Leah Weisburn <[email protected]>
Co-authored-by: Minsik <[email protected]>

* Fixes for scratch management in `solver` and `helper` modules.

* Added `DMRG_ArgsUser` to `TestBE_DMRG`

* Fixed incorrect definition of `ebe_tot` as `rets[0]`, which is only the
correlation energy.

* Make `helper.delete_multiple_files` recursive.

* Make `use_cumulant=True` by default in `solver_block2`.

* Modify `molecular_DMRG_test`

* Redundant fixes for `lo` module.

* Sort imports in `solver`

* Remove extra blank lines.

* Explicit versioning for block2 in `yaml`

* Add `block2` to `test_requirements`

* Reset `frag_type` in `molbe_dmrg...` example script.

* Testing `max_mem`

* Test `LD_PRELOAD`

* try only one of the block2

* cleaned up test environment

* don't wait for analysis

* Test `conda`

* Test `nomkl` via `pip`

* Test `nomkl` via `pip`

* tests `nomkl`

* Tests `nomkl`

* Tests `nomkl`

* Tests `nomkl`

* Test block2 versioning.

* Test block2 versioning.

* Update cache keys.

* Update cache keys.

* Update cache keys

* Update cache keys

* Remove block2 versioning

* Test explicit cache save/restore.

* Fully explicit save/restore for cache

* Block2 `no-binary`

* Test `no-binary`

* Typo

* Explicit block2 versioning

* Move to `test_requirements`

* Edit block2 versioning.

* Force latest `mkl`

* test latest `mkl`

* Separate caching for `block2`

* Test caching

* Test caching

* Fine tune caching

* Further fine-tune caching

* Back to block2 versioning

* Formatting and `ompnum` passing.

* Install quemb in workflows.

* resolve `molecular_DMRG_test` scratch

* Update example script

* Update .gitignore

* update .gitignore

* Enable all unittests

---------

Co-authored-by: Oskar Weser <[email protected]>
Co-authored-by: Leah Weisburn <[email protected]>
Co-authored-by: Leah Weisburn <[email protected]>
Co-authored-by: Minsik <[email protected]>
ShaunWeatherly marked this pull request as ready for review on January 22, 2025, 22:32
@ShaunWeatherly (Contributor, Author):

Note to @lweisburn and @mscho527, this PR corrects the definition of BE.ebe_tot to be the total BE energy. Previously, some parts of the code were using it as the correlation energy, including in the misc.be2puffin function. I've updated that function to explicitly use the correlation energy (which I assume was the original intent) and I just wanted to double-check and make sure I didn't break anything!

@mscho527 (Member):

I haven't read the details, but this seems like a substantial change to the code for a single PR. Shall we require 2+ approvals, as we have done for previous large PRs? Also, it will take me a bit to parse through this.

@mcocdawc (Contributor) left a comment:

Thanks for making this test work!

See my comments below

I would await a comment from @mscho527 or @lweisburn regarding the adjusted energy.

Review threads (resolved): .github/workflows/quemb_unittest.yml, setup.py
@@ -193,6 +201,28 @@ def run_solver(

```python3
        mch.mc1step()
        rdm1_tmp, rdm2s = mch.fcisolver.make_rdm12(0, nmo, nelec)

    elif solver in ["block2", "DMRG", "DMRGCI", "DMRGSCF"]:
        frag_scratch = WorkDir(
            path=Path(scratch_dir / dname),
```
Contributor:

the cast to Path is fully redundant and I would omit it.

WorkDir is quite flexible in its input arguments and accepts PathLike (i.e. even strings), and scratch_dir / dname returns a Path anyway.

The passing of cleanup_at_end is redundant, since if scratch_dir.cleanup_at_end is True it will remove subdirectories as well.

Contributor (Author):

cleanup_at_end is not redundant; consider the converse of your example: if scratch_dir.cleanup_at_end is False, one would also want the fragment scratch directories to have cleanup_at_end set to False. By default they would be set to True, and the fragment scratch directories would be removed despite the user setting cleanup_at_end to False.

Member:

Are you @ShaunWeatherly creating WorkDir objects for each fragment? Wouldn't we want one WorkDir object (i.e. molbe.scratch_dir) and create subdirectories (dname) under it? Unless we want granular control over the cleanups (i.e. we want to clean up fragment 0 but leave fragment 1), I don't think multiple instances of WorkDir would be necessary.

Contributor:

Having a normal subdirectory would also not force you to keep track of scratch_dir.cleanup_at_end:
if scratch_dir.cleanup_at_end is True, the subdirectory will be deleted anyway, and if it is False, it will be preserved.

Contributor (Author):

Sure, one could just use PathLikes for the fragment scratch directories, but I think using WorkDir instead keeps the code more internally consistent. The most elegant workaround would be to add a create_subdirectory class method to WorkDir, which inherits the properties of the base class.
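
One possible shape of that suggestion, sketched on a simplified stand-in for WorkDir (the real class is attrs-based and also handles allow_existing and cleanup registration; the method is written here as an ordinary method for brevity):

```python3
from pathlib import Path


class WorkDir:
    """Simplified stand-in, not the actual quemb WorkDir."""

    def __init__(self, path, cleanup_at_end: bool = True) -> None:
        self.path = Path(path).resolve()
        self.cleanup_at_end = cleanup_at_end
        self.path.mkdir(parents=True, exist_ok=True)

    def create_subdirectory(self, name: str) -> "WorkDir":
        # The child inherits cleanup_at_end, so fragment directories are kept
        # or removed together with the user's top-level choice.
        return WorkDir(self.path / name, cleanup_at_end=self.cleanup_at_end)


scratch = WorkDir("/tmp/quemb_scratch", cleanup_at_end=False)
frag_scratch = scratch.create_subdirectory("f0")  # also has cleanup_at_end=False
```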

Review threads (resolved): src/quemb/molbe/be_parallel.py, src/quemb/molbe/mbe.py, src/quemb/molbe/solver.py, src/quemb/shared/helper.py, tests/data/molecular_DMRG_test/f1/rdm1.npy
@mscho527 (Member) left a comment:

I left some comments regarding some questions and confusions, but looks solid so far!

```yaml
          key: ${{ env.pythonLocation }}-${{ runner.OS }}-pip-cache-${{ hashFiles('setup.py') }}}}

      - name: Install pip
        if: steps.restore-pip-cache.outputs.cache-hit != 'true'
```
Member:

Shouldn't pip upgrade happen regardless of the cache hit?

Comment on lines +113 to +132:

```yaml
      - name: Restore quemb Cache
        id: restore-quemb-cache
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ runner.OS }}-quemb-cache-${{ hashFiles('setup.py') }}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('tests/test_requirements.txt') }}}}

      - name: Install dependencies
      - name: Install quemb
        if: steps.restore-quemb-cache.outputs.cache-hit != 'true'
        run: |
          pip install -r tests/test_requirements.txt
          pip install git+https://github.com/pyscf/dmrgscf
          PYSCFHOME=$(pip show pyscf-dmrgscf | grep 'Location' | tr ' ' '\n' | tail -n 1)
          wget https://raw.githubusercontent.com/pyscf/dmrgscf/master/pyscf/dmrgscf/settings.py.example
          mv settings.py.example ${PYSCFHOME}/pyscf/dmrgscf/settings.py
          pip install .

      - name: Save quemb Cache
        if: steps.restore-quemb-cache.outputs.cache-hit != 'true'
        id: save-quemb-cache
        uses: actions/cache/save@v4
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ steps.restore-quemb-cache.outputs.cache-primary-key }}
```
Member:

Maybe I'm wrong, but isn't this potentially dangerous? I think we want to test installing quemb every time, even if it adds a few seconds to the test suite. Since we are testing quemb, I don't feel comfortable caching it based on hashes that only look at specific files.

Contributor (Author):

This cache is really just for the dependencies; we still run pip install . at the beginning of the test suite to install quemb itself. I can change the name if it's misleading.

Member:

Yes. I think the name is misleading. If the purpose is to just install the dependencies, should line 124 (pip install .) be removed as well?

```yaml
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.pythonLocation }}
          key: ${{ env.pythonLocation }}-${{ runner.OS }}-block2-cache-${{ hashFiles('setup.py') }}-${{ hashFiles('dev-requirements.txt') }}-${{ hashFiles('test_requirements.txt')}}}}
```
Member:

Does this key depend on the contents of the block2 repository? In other words, if something changes in pyscf/dmrgscf or the PyPI release of block2, does this key logic detect the change?

@ShaunWeatherly (Contributor, Author), Jan 23, 2025:

Those changes would need to be reflected in the list of test_requirements.txt dependencies, but otherwise no explicit information from block2 is used for key generation. This is also true for all of the other dependencies, however.

I.e., if the versioning for libdmet changes, for example, the cache key technically will not change unless we update setup.py with the new versioning.

Member:

Okay, this raises a good point then. @mcocdawc If we want to see the intended results with the cache, I think we need to give specific version numbers in test_requirements.txt even if the packages are mentioned in setup.py. As it is currently set up, even if new versions are uploaded to PyPI (or GitHub for libdmet), the cache key would stay the same (and thus we wouldn't be able to pull a newer version from PyPI without changing setup.py). I propose that we fix this in a separate PR.

Similarly, @ShaunWeatherly I think dmrgscf should also reference a specific tag or commit. block2 should also have a version specified (both in the workflow and the key).

Contributor:

May I ask what the point of this caching is?
Getting cache invalidation correct is really hard, and the proposed setup will force us to regularly update the requirements files by hand.
If test-suite performance is the only reason, I fear it might be unnecessarily complicated.
I get why it makes sense for projects with larger dependencies or ones that have to compile dependencies, but our build process from scratch is so fast that I would just do it from scratch.

Member:

Also, another question for @ShaunWeatherly: if the test suite on GitHub Actions isn't invoking block2, do we need to install block2?


```yaml
      - name: Test with pytest
        run: |
          pip install .
```
Member:

Is this redundant? Line 124 installs from the repo as well.

Contributor (Author):

If there are any changes to quemb itself, this is the line that updates those parts of the code. The previous pip install . above is really only used for installing/caching the dependencies in setup.py, hence why its hash is used in the key.

Member:

I'm not sure why we would want to install and reinstall if there are any changes. The hash key in the quemb cache workflow above cannot detect changes made to the quemb code. I think we should install quemb from the latest commit once the other dependencies are resolved (through cache, pip, or whatnot).

Review threads (resolved): example/molbe_h8_chemical_potential.py, src/quemb/shared/helper.py
@@ -84,12 +89,15 @@ class WorkDir:

```python3
    path: Final[Annotated[Path, "An absolute path"]] = field(converter=_get_abs_path)
    cleanup_at_end: Final[bool] = True
    allow_existing: Final[bool] = False
```
Member:

Is there a particular reason why the default value of allow_existing has to be False? Does this conflict with mixed-basis code and require modification there as is? @lweisburn

Contributor (Author):

This is simply how Oskar has set up WorkDir, where the scratch directory provided by the user was required to be empty (otherwise it would raise a ValueError). This gives you the option to turn off that requirement by setting allow_existing=True.

Member:

@mcocdawc Was there a reason why you required scratch directory to be empty?

Contributor:

Initially I required it to prevent silent overwriting of files, as @ShaunWeatherly pointed out in his note.
But even more dangerous is that the WorkDir is deleted in the end if cleanup_at_end is True, which it is by default.
This means that WorkDir("/home/${USER}") would delete the home directory.

On the other hand, @lweisburn and you (?) @mscho527 need an already populated WorkDir for mixed-basis calculations.

So yes, there should be an allow_existing argument which should be False by default.
It should also be added to the from_environment constructor.
I would then also expand the error message along the lines of:

ValueError("scratch_area is not empty and allow_existing is False. "
              "You can set `allow_existing=True` in the constructor, but note that "
               "the working directory is deleted if `cleanup_at_end` is True, which "
               "it is by default."
    )

Member:

Aha, @mcocdawc, I see! @lweisburn, we should assess how to let this work with the mixed-basis code.

Contributor:

With the mixed-basis code, we will have to use the option discussed above and set allow_existing to True, but it makes sense for it to be False by default for other uses.

Review threads (resolved): tests/data/molecular_DMRG_test/f0/dmrg.conf, tests/dmrg_molBE_test.py
Comment on lines +44 to +47:

```python3
@unittest.skipIf(
    os.getenv("GITHUB_ACTIONS") == "true",
    "This test cannot be run in github actions.",
)
```
Member:

Is there an easy way to detect whether block2 is installed or not? If so, I think replacing GITHUB_ACTIONS check with the check would be nice.
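
A minimal sketch of the check suggested here (whether such a check is sufficient is exactly what the reply below addresses):

```python3
import importlib.util
import unittest


@unittest.skipIf(
    importlib.util.find_spec("block2") is None,
    "block2 is not installed.",
)
class TestBE_DMRG(unittest.TestCase):
    def test_placeholder(self) -> None:
        # Real DMRG assertions would go here.
        pass
```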

Contributor (Author):

The installation of block2 is not the problem; it's that the test simply cannot be run in GitHub Actions. Oskar and I spent a lot of time trying to find a better workaround, but to no avail. This seems to be the easiest solution.

Member:

Maybe Oskar and you discussed this already, but what do you mean by 'cannot'? I trust you two, but I'm just curious.

Anyway, if I understood correctly, this test essentially covers i) everything up to calling block2 and ii) the post-processing quemb should do after block2 finishes, right? I worry that, as is, the tests can make us overconfident. If at any point no one performs BE-DMRG calculations anymore, the CI test on its own would no longer automatically exercise the full pipeline. I think this aspect of the DMRG test routine should be noted in the docs.

@@ -79,10 +79,9 @@ jobs:

```yaml
  testsuite:
    runs-on: ubuntu-latest
    needs: analysis
```
Member:

Reminder to restore analysis requirement before merge

If :python: `allow_existing` is False (default), then the directory
specified by :python: `path` must be empty.
NOTE: Use caution when setting this to True, as any files in the non-empty
directory may be overwritten!
Contributor:

Even more dangerous is the fact that the directory gets cleaned up if cleanup_at_end=True, i.e.

WorkDir("/home/${USER}")

will clean up the home directory of the user in the end.


I would use the sphinx note syntax https://sublime-and-sphinx-guide.readthedocs.io/en/latest/notes_warnings.html

.. note:: Use caution ...


@@ -893,7 +897,11 @@ def solve_block2(

```python3
    # pylint: disable-next=E0611
    from pyscf import dmrgscf  # noqa: PLC0415  # optional module

    if ompnum is not None:
        dmrgscf.settings.MPIPREFIX = "mpirun -np " + str(ompnum) + " --bind-to none"
```
Contributor:

good catch

@lweisburn (Contributor):

> Note to @lweisburn and @mscho527, this PR corrects the definition of BE.ebe_tot to be the total BE energy. Previously, some parts of the code were using it as the correlation energy, including in the misc.be2puffin function. I've updated that function to explicitly use the correlation energy (which I assume was the original intent) and I just wanted to double-check and make sure I didn't break anything!

Great, thanks for noting this! We'll make sure that this is consistent with the other external (mixed-basis) code too.
