Add developer docs about zkg's internal use of directories #175

Merged
merged 1 commit into from
Jan 24, 2024
113 changes: 113 additions & 0 deletions doc/developers.rst
@@ -88,3 +88,116 @@ INI File Option ```name``` `name`
========================== =============================== ===========================

Python API docstrings roughly follow the `Google Style Docstrings`_ format.

Internals
---------

``zkg``'s view of a package
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``zkg`` maintains copies of a Zeek package in up to four places:

- A long-lived clone of the package's git repo in
``$STATEDIR/clones/package/<name>``. This clone (not its installed version,
see below) is ``zkg``'s "authoritative" view of a package.

- A conceptually short-lived clone in ``$STATEDIR/scratch/<name>``, for
retrieving information about a package ("short-lived", because access to those
copies is momentary -- not because ``zkg`` necessarily cleans up after those
copies).

- A "stage". A stage is any place in which ``zkg`` actually installs a package
for running in Zeek. This can be the local Zeek installation, or locally below
``$STATEDIR/testing`` when testing a package. So "staging" here does not mean
"not yet live"; rather the opposite: it's closer to "any place where the
package may actually run". A stage allows customization of many of the
directories involved (for internal cloning, installing Zeek scripts and
plugins, binary files, etc). Installation into a stage is also where ``zkg``
adds its management of ``packages.zeek``, in the stage's scripts directory.

Review comment (Contributor), on the definition of "stage" above:

Oh. I think that might help with the mind-twistedness I experience whenever I try to understand anything of the staging / installation flow.

Having it in the docs is good, but I wonder if we should do something in the code. "stage" and "staging" in the deployment sense have such a well established meaning that this deviation seems very unfortunate and confusing.

Reply (Member Author):

That's just my read of it btw, but I don't see any code where a stage would be transitioned to a deployment, so I think that interpretation is closest. And yeah, I'd be happy to change the code. Simply calling it an Installation might do it.

- In ``$STATEDIR/testing/<name>/clones/<names>``, for testing a given package
along with any dependencies it might have.
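
As a rough illustration of the list above, the following sketch maps a package
name to the state-directory paths it mentions. The function name
``package_locations`` and the dictionary keys are hypothetical and not part of
``zkg``; the stage itself is left out because it may be the local Zeek
installation, which cannot be derived from the state directory.

```python
from pathlib import Path


def package_locations(state_dir, name):
    """Return the state-directory paths described above for one package.

    Hypothetical helper for illustration only; zkg does not provide it.
    """
    root = Path(state_dir)
    return {
        # Long-lived, "authoritative" clone of the package's git repo.
        "clone": root / "clones" / "package" / name,
        # Conceptually short-lived clone used to retrieve package information.
        "scratch": root / "scratch" / name,
        # Stage used when testing the package.
        "testing_stage": root / "testing" / name,
        # Clone of the package itself inside its testing area.
        "testing_clone": root / "testing" / name / "clones" / name,
    }
```

For a package ``foo``, this yields paths such as
``<state_dir>/clones/package/foo`` and ``<state_dir>/testing/foo/clones/foo``.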

``zkg`` captures state about installed packages in ``$STATEDIR/manifest.json``.
This does not capture all knowable information about packages, though.
More on this next.

Directory usage
~~~~~~~~~~~~~~~

``zkg`` populates its internal state directory (dubbed ``$STATEDIR`` below) with
several subdirectories.

``$STATEDIR/clones``
""""""""""""""""""""

This directory keeps git clones of installed packages
(``$STATEDIR/clones/package/<name>``), package sources
(``$STATEDIR/clones/source/<name>``), and package template repositories
(``$STATEDIR/clones/template/<name>``).

``zkg`` clones the relevant repositories as needed. It can re-create some of
these directories on demand, but modifying this space by hand is not
recommended. For example, if you remove the clone of an installed package, the
installation itself remains live (via the staging mechanism), but ``zkg`` can
no longer look up all information about the installed package, because
anything not explicitly captured about the package in
``$STATEDIR/manifest.json`` is now gone.

Removal of a package (``zkg remove``) removes its clone in ``$STATEDIR/clones``.
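
To inspect this layout without touching it, a read-only listing along the
following lines may help. This is a minimal sketch: ``list_clones`` and
``CLONE_KINDS`` are hypothetical names, and it assumes each clone is a direct
subdirectory of its ``package``, ``source``, or ``template`` directory.

```python
from pathlib import Path

# The three kinds of clones described above.
CLONE_KINDS = ("package", "source", "template")


def list_clones(state_dir):
    """Return {kind: sorted clone names} without modifying anything.

    Hypothetical helper for illustration only; zkg does not provide it.
    """
    clones = Path(state_dir) / "clones"
    result = {}
    for kind in CLONE_KINDS:
        kind_dir = clones / kind
        if kind_dir.is_dir():
            # Each direct subdirectory is assumed to be one clone.
            result[kind] = sorted(p.name for p in kind_dir.iterdir() if p.is_dir())
        else:
            # The directory may not exist yet if nothing was cloned.
            result[kind] = []
    return result
```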

``$STATEDIR/scratch``
"""""""""""""""""""""

When retrieving information on a package that isn't yet installed, or when
``zkg`` doesn't want to touch the installed code, ``zkg`` uses
``$STATEDIR/scratch/<name>``, a clone of the package's git repo at a version
(often a git tag) of interest. This clone is shallow for any versions that
aren't raw SHA-1 hashes. The information parsed includes the ``zkg.meta`` file
as well as git branch/tag/commit info.

During package installation, ``zkg`` places backups of user-tweakable files into
``$STATEDIR/scratch/tmpcfg``. ``zkg`` restores these after package installation
to preserve the user's edits. During package source aggregation, ``zkg`` places
temporary versions of ``aggregate.meta`` directly into ``$STATEDIR/scratch``.

Creation or unbundling of a package bundle happens via
``$STATEDIR/scratch/bundle``, to compose or retrieve information about the
bundle. The directory is deleted and re-created at the beginning of those
operations:

- During bundling, ``zkg`` copies installed package repos from
``$STATEDIR/clones/package/<name>`` into ``$STATEDIR/scratch/bundle/<name>``,
and creates fresh git clones in the ``bundle`` directory for any packages not
currently installed. It creates a ``.tar.gz`` of the whole directory,
initially in the bundle directory, and moves it to where the user specified.

- During unbundling, ``zkg`` reads the bundle manifest as well as the git repos
of the contained packages, and moves the package repos from the scratch space
into ``$STATEDIR/clones/package/<name>``.
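
The bundling half of this flow can be sketched roughly as follows. This is not
``zkg``'s implementation: ``make_bundle`` and the temporary file name
``bundle.tar.gz`` are hypothetical, installed clones are assumed to live in
``$STATEDIR/clones/package/<name>``, and the sketch skips both the bundle
manifest and the fresh clones made for packages that are not installed.

```python
import shutil
import tarfile
from pathlib import Path


def make_bundle(state_dir, names, dest):
    """Bundle already-installed packages into a .tar.gz at dest.

    Hypothetical helper for illustration only; zkg does not provide it.
    """
    root = Path(state_dir)
    bundle_dir = root / "scratch" / "bundle"

    # The bundle directory is deleted and re-created at the start.
    shutil.rmtree(bundle_dir, ignore_errors=True)
    bundle_dir.mkdir(parents=True)

    # Copy each installed package's clone into the bundle directory.
    for name in names:
        shutil.copytree(
            root / "clones" / "package" / name,
            bundle_dir / name,
            symlinks=True,
        )

    # Create the tarball inside the bundle directory first ...
    tmp_tar = bundle_dir / "bundle.tar.gz"  # hypothetical file name
    with tarfile.open(str(tmp_tar), "w:gz") as tar:
        for name in names:
            tar.add(str(bundle_dir / name), arcname=name)

    # ... then move it to the location the user asked for.
    shutil.move(str(tmp_tar), str(dest))
    return Path(dest)
```

Note that the copied repos stay behind in ``scratch/bundle`` after the tarball
has moved, which matches the lack of cleanup described for the scratch space.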

When installing a package's ``script_dir`` or ``plugin_dir`` into a staging area
and the source file is a tarfile, ``zkg`` temporarily extracts the tarball into
``$STATEDIR/scratch/untar/``.

There's little or no cleanup of files in the scratch space after the operations
creating them complete. ``zkg`` only deletes, then (re-)creates, the directories
involved upon next use.

When ``zkg`` isn't in the middle of executing any commands, you can always
delete the scratch space without negatively affecting ``zkg``.
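
For manual housekeeping, emptying the scratch space could look like the sketch
below. ``clear_scratch`` is a hypothetical helper, not a ``zkg`` command or
API, and it should only run while no ``zkg`` command is executing.

```python
import shutil
from pathlib import Path


def clear_scratch(state_dir):
    """Delete everything below <state_dir>/scratch; return removed entry names.

    Hypothetical helper for illustration only; zkg does not provide it.
    """
    scratch = Path(state_dir) / "scratch"
    removed = []
    if not scratch.is_dir():
        # Nothing to do if zkg never created the scratch space.
        return removed
    for entry in sorted(scratch.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            # Clones, bundle/, untar/, tmpcfg/ and similar directories.
            shutil.rmtree(entry)
        else:
            # Plain files, such as temporary aggregate.meta versions.
            entry.unlink()
        removed.append(entry.name)
    return removed
```

The sketch keeps the ``scratch`` directory itself in place and only removes
its contents.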
Review comment (Contributor):

I guess that's on your mind, inverting this behavior and introducing --keep-scratch / --keep-testing or some --keep-temporary wildcard would be less surprising.

Reply (Member Author):

Yeah, especially for scratch. zkg could no doubt clean up properly in there. And for testing it could at least clean up when the tests pass. (And it should be much better about pointing you at the trouble if they don't, but that's another one.)


``$STATEDIR/testing``
"""""""""""""""""""""

When testing a package (during installation, or when explicitly running ``zkg
test``), ``zkg`` creates a staging area ``$STATEDIR/testing/<name>`` for the
package under test, clones the package and its dependencies into
``$STATEDIR/testing/<name>/clones/``, installs them from there into
``$STATEDIR/testing/<name>``, and then runs the package's ``test_command`` from
its clone (``$STATEDIR/testing/<name>/clones/<name>``), with an environment set
such that it finds the installation in the local stage. The stdout and stderr of
those testing runs are preserved in ``zkg.test_command.stdout`` and
``zkg.test_command.stderr`` in that directory.

As in ``$STATEDIR/scratch``, there's no cleanup, and you can delete the testing
space as needed after a test run is complete.
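
Because nothing is cleaned up, the captured output of a test run stays
available for inspection afterwards. The sketch below locates it;
``find_test_output`` is a hypothetical helper, and it searches the package's
whole testing area recursively rather than assuming exactly which directory
holds the two files.

```python
from pathlib import Path


def find_test_output(state_dir, name):
    """Return the preserved stdout/stderr files for one package's test run.

    Hypothetical helper for illustration only; zkg does not provide it.
    """
    root = Path(state_dir) / "testing" / name
    wanted = ("zkg.test_command.stdout", "zkg.test_command.stderr")
    # rglob() yields nothing if the testing area does not exist.
    return sorted(p for p in root.rglob("zkg.test_command.*") if p.name in wanted)
```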