Merge remote-tracking branch 'origin/maint'
* origin/maint:
  Add `datalad/typing.py`
  Make `ensure_bool()` take `Any`
  [release-action] Autogenerate changelog snippet for PR 7318
  Type-annotate and fix `datalad/support/strings.py`
  Fix failing test
  Fixes
  [release-action] Autogenerate changelog snippet for PR 7317
  Type-annotate almost all of `datalad/utils.py`
  codespell -- account for new pickups: ignore another var, fix some typos
  Update isort in `.pre-commit-config.yaml`
  Add scriv to devel-tools
  A few more possible ad-hoc folders to skip from codespell [ci skip]
  Change docstring "subsections" to bold
  Add changelog snippet
  Change formatting in RIA docstring
  Restructure docstring headings
  List commands missing from api docs
  BF: path from .gitmodules could not be used with source candidate template
yarikoptic committed Mar 13, 2023
2 parents 4c9d834 + 3320e00 commit 03f1433
Showing 19 changed files with 612 additions and 404 deletions.
5 changes: 3 additions & 2 deletions .codespellrc
@@ -1,7 +1,8 @@
[codespell]
skip = .venv,venvs,.git,build,*.egg-info,*.lock,.asv,.mypy_cache,.tox,fixtures,_version.py,*.pem
skip = .venv,venvs,.git,build,*.egg-info,*.lock,.asv,.mypy_cache,.tox,fixtures,_version.py,*.pem,trash,dist
# commitish - vote if we want to fix (should be committish) -- used in GitRepo API
# froms - plural "from" introduced by export_archive_ora
# ned - Ned is a name
ignore-words-list = ba,commitish,froms,ro,ned
# includeds - func arg name
ignore-words-list = ba,commitish,froms,ro,ned,includeds
exclude-file = .codespell-ignorelines
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,5 +1,5 @@
repos:
- repo: https://github.com/PyCQA/isort
rev: 5.9.3
rev: 5.12.0
hooks:
- id: isort
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -400,7 +400,7 @@ commits and then provide stats:
- `asv continuous maint master` - would run and compare `maint` and `master` branches
- `asv continuous HEAD` - would compare `HEAD` against `HEAD^`
- `asv continuous master HEAD` - would compare `HEAD` against state of master
- [TODO: contineous -E existing](https://github.com/airspeed-velocity/asv/issues/338#issuecomment-380520022)
- [TODO: continuous -E existing](https://github.com/airspeed-velocity/asv/issues/338#issuecomment-380520022)

Notes:
- only significant changes will be reported
7 changes: 7 additions & 0 deletions changelog.d/pr-7280.md
@@ -0,0 +1,7 @@
### 🐛 Bug Fixes

- Fixed that the `get` command would fail when subdataset source-candidate-templates were using the `path` property from `.gitmodules`.
Also enhanced the respective documentation for the `get` command.
Fixes [#7274](https://github.com/datalad/datalad/issues/7274) via
[PR #7280](https://github.com/datalad/datalad/pull/7280)
(by [@bpoldrack](https://github.com/bpoldrack))
6 changes: 6 additions & 0 deletions changelog.d/pr-7289.md
@@ -0,0 +1,6 @@
### 📝 Documentation

- Include a few previously missing commands in html API docs.
Fixes [#7288](https://github.com/datalad/datalad/issues/7288) via
[PR #7289](https://github.com/datalad/datalad/pull/7289)
(by [@mslw](https://github.com/mslw))
3 changes: 3 additions & 0 deletions changelog.d/pr-7317.md
@@ -0,0 +1,3 @@
### 🏠 Internal

- Type-annotate almost all of `datalad/utils.py`; add `datalad/typing.py`. [PR #7317](https://github.com/datalad/datalad/pull/7317) (by [@jwodder](https://github.com/jwodder))
3 changes: 3 additions & 0 deletions changelog.d/pr-7318.md
@@ -0,0 +1,3 @@
### 🏠 Internal

- Type-annotate and fix `datalad/support/strings.py`. [PR #7318](https://github.com/datalad/datalad/pull/7318) (by [@jwodder](https://github.com/jwodder))
59 changes: 32 additions & 27 deletions datalad/distributed/create_sibling_ria.py
@@ -78,32 +78,36 @@ class CreateSiblingRia(Interface):
The store's base path is expected to not exist, be an empty directory,
or a valid RIA store.
RIA URL format
~~~~~~~~~~~~~~
Notes
-----
**RIA URL format**
Interactions with new or existing RIA stores require RIA URLs to identify
the store or specific datasets inside of it.
The general structure of a RIA URL pointing to a store takes the form
'ria+[scheme]://<storelocation>' (e.g.,
ria+ssh://[user@]hostname:/absolute/path/to/ria-store, or
ria+file:///absolute/path/to/ria-store)
``ria+[scheme]://<storelocation>`` (e.g.,
``ria+ssh://[user@]hostname:/absolute/path/to/ria-store``, or
``ria+file:///absolute/path/to/ria-store``)
The general structure of a RIA URL pointing to a dataset in a store (for
example for cloning) takes a similar form, but appends either the dataset's
UUID or a ~ symbol followed by the dataset's alias name:
'ria+[scheme]://<storelocation>#<dataset-UUID>' or
'ria+[scheme]://<storelocation>#~<aliasname>'.
UUID or a "~" symbol followed by the dataset's alias name:
``ria+[scheme]://<storelocation>#<dataset-UUID>`` or
``ria+[scheme]://<storelocation>#~<aliasname>``.
In addition, specific version identifiers can be appended to the URL with an
additional @ symbol:
'ria+[scheme]://<storelocation>#<dataset-UUID>@<dataset-version>', where 'dataset-version' refers to a branch or tag.
additional "@" symbol:
``ria+[scheme]://<storelocation>#<dataset-UUID>@<dataset-version>``,
where ``dataset-version`` refers to a branch or tag.
RIA store layout
~~~~~~~~~~~~~~~~
**RIA store layout**
A RIA store is a directory tree with a dedicated subdirectory for each
dataset in the store. The subdirectory name is constructed from the
DataLad dataset ID, e.g. '124/68afe-59ec-11ea-93d7-f0d5bf7b5561', where
DataLad dataset ID, e.g. ``124/68afe-59ec-11ea-93d7-f0d5bf7b5561``, where
the first three characters of the ID are used for an intermediate
subdirectory in order to mitigate file system limitations for stores
containing a large number of datasets.
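The URL forms and the ID-based directory naming described in this docstring can be illustrated with a small parser. This is a minimal sketch, not DataLad's own implementation: `parse_ria_url`, `RiaUrl` and `dataset_subdir` are hypothetical names, and the regular expression only covers the shapes listed above (a store URL, optionally followed by `#<dataset-UUID>` or `#~<aliasname>`, optionally followed by `@<dataset-version>`).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the documented shapes:
#   ria+<scheme>://<storelocation>[#<dataset-UUID> | #~<aliasname>][@<dataset-version>]
RIA_URL_RE = re.compile(
    r"^ria\+(?P<scheme>[a-z]+)://(?P<store>[^#]+)"
    r"(?:#(?P<tilde>~)?(?P<ident>[^@]+)(?:@(?P<version>.+))?)?$"
)


class RiaUrl(NamedTuple):
    scheme: str
    store: str
    dataset_id: Optional[str]
    alias: Optional[str]
    version: Optional[str]


def parse_ria_url(url: str) -> RiaUrl:
    """Split a RIA URL into its documented components."""
    m = RIA_URL_RE.match(url)
    if m is None:
        raise ValueError(f"not a RIA URL: {url!r}")
    ident = m.group("ident")
    is_alias = m.group("tilde") is not None
    return RiaUrl(
        scheme=m.group("scheme"),
        store=m.group("store"),
        dataset_id=None if is_alias else ident,
        alias=ident if is_alias else None,
        version=m.group("version"),
    )


def dataset_subdir(dataset_id: str) -> str:
    """First three ID characters form an intermediate directory."""
    return f"{dataset_id[:3]}/{dataset_id[3:]}"
```

Because the store location is matched up to the first `#`, an `ssh` location such as `user@hostname:/path` is not mistaken for a version suffix; only an `@` after the fragment marker is treated as `<dataset-version>`.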
@@ -115,34 +119,35 @@ class CreateSiblingRia(Interface):
It is possible to selectively disable either component using
``storage-sibling 'off'`` or ``storage-sibling 'only'``, respectively.
If neither component is disabled, a dataset's subdirectory layout in a RIA
store contains a standard bare Git repository and an 'annex/' subdirectory
store contains a standard bare Git repository and an ``annex/`` subdirectory
inside of it.
The latter holds a Git-annex object store and comprises the storage sibling.
Disabling the standard git-remote ('storage-sibling=only') will result
Disabling the standard git-remote (``storage-sibling='only'``) will result
in not having the bare git repository, disabling the storage sibling
('storage-sibling=off') will result in not having the 'annex/' subdirectory.
(``storage-sibling='off'``) will result in not having the ``annex/``
subdirectory.
Optionally, there can be a further subdirectory 'archives' with
Optionally, there can be a further subdirectory ``archives`` with
(compressed) 7z archives of annex objects. The storage remote is able to
pull annex objects from these archives if it cannot find them in the regular
annex object store. This feature can be useful for storing large
collections of rarely changing data on systems that limit the number of
files that can be stored.
Each dataset directory also contains a 'ria-layout-version' file that
Each dataset directory also contains a ``ria-layout-version`` file that
identifies the data organization (as, for example, described above).
Lastly, there is a global 'ria-layout-version' file at the store's
Lastly, there is a global ``ria-layout-version`` file at the store's
base path that identifies where dataset subdirectories themselves are
located. At present, this file must contain a single line stating the
version (currently "1"). This line MUST end with a newline character.
It is possible to define an alias for an individual dataset in a store by
placing a symlink to the dataset location into an 'alias/' directory
placing a symlink to the dataset location into an ``alias/`` directory
in the root of the store. This enables dataset access via URLs of format:
'ria+<protocol>://<storelocation>#~<aliasname>'.
``ria+<protocol>://<storelocation>#~<aliasname>``.
Compared to standard git-annex object stores, the 'annex/' subdirectories
Compared to standard git-annex object stores, the ``annex/`` subdirectories
used as storage siblings follow a different layout naming scheme
('dirhashmixed' instead of 'dirhashlower').
This is mostly noted as a technical detail, but also serves to remind
@@ -151,20 +156,20 @@ class CreateSiblingRia(Interface):
difference. Interactions should be handled via the ORA special remote
instead.
Error logging
~~~~~~~~~~~~~
**Error logging**
To enable error logging at the remote end, append a pipe symbol and an "l"
to the version number in ria-layout-version (like so '1|l\\n').
to the version number in ria-layout-version (like so: ``1|l\\n``).
Error logging will create files in an "error_log" directory whenever the
git-annex special remote (storage sibling) raises an exception, storing the
Python traceback of it. The logfiles are named according to the scheme
'<dataset id>.<annex uuid of the remote>.log' showing "who" ran into this
``<dataset id>.<annex uuid of the remote>.log`` showing "who" ran into this
issue with which dataset. Because logging can potentially leak personal
data (like local file paths for example), it can be disabled client-side
by setting the configuration variable
"annex.ora-remote.<storage-sibling-name>.ignore-remote-config".
``annex.ora-remote.<storage-sibling-name>.ignore-remote-config``.
"""

# TODO: description?
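The `ria-layout-version` and error-logging rules in this docstring can be sketched as a few string helpers. The function names are hypothetical, and the code assumes the file holds exactly the one documented line: the version, an optional `|l` marker that enables error logging, and a mandatory trailing newline. The log file name follows the documented `<dataset id>.<annex uuid of the remote>.log` scheme.

```python
from typing import Tuple


def parse_layout_version(content: str) -> Tuple[str, bool]:
    """Return (version, error_logging_enabled) for a ria-layout-version file."""
    if not content.endswith("\n"):
        # The docstring requires the line to end with a newline character
        raise ValueError("ria-layout-version must end with a newline")
    version, sep, flag = content[:-1].partition("|")
    return version, sep == "|" and flag == "l"


def format_layout_version(version: str = "1", error_logging: bool = False) -> str:
    """Build the file content: version, optional '|l', then a newline."""
    return version + ("|l" if error_logging else "") + "\n"


def error_log_name(dataset_id: str, remote_annex_uuid: str) -> str:
    """Name of the log file written when the storage sibling raises."""
    return f"{dataset_id}.{remote_annex_uuid}.log"
```

Keeping parsing and formatting symmetric makes it easy to check that a file written with error logging enabled reads back with the flag set.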
2 changes: 1 addition & 1 deletion datalad/distributed/drop.py
@@ -450,7 +450,7 @@ def _fatal_pre_drop_checks(ds, repo, paths, what, reckless, is_annex):

if what in ('all', 'datasets') and not reckless == 'kill':
# we must not have subdatasets anymore
# if we do, --recursive was forgotton
# if we do, --recursive was forgotten
subdatasets = ds.subdatasets(
path=paths,
# we only care about the present ones
8 changes: 8 additions & 0 deletions datalad/distribution/get.py
@@ -787,6 +787,14 @@ class Get(Interface):
submodule commit is available as `remote-<name>` properties, where `name`
is the configured remote name.
Hence, such a template could be `http://example.org/datasets/{id}` or
`http://example.org/datasets/{path}`, where `{id}` and `{path}` would be
replaced by the `datalad-id` or `path` entry in the `.gitmodules` record.
If this config is committed in `.datalad/config`, a clone of a dataset can
look up any subdataset's URL according to such scheme(s) irrespective of
what URL is recorded in `.gitmodules`.
Lastly, all candidates are sorted according to their cost (lower values
first), and duplicate URLs are stripped, while preserving the first item in the
candidate list.
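The end of this docstring describes two steps. First, template placeholders such as `{id}` and `{path}` are filled from the `.gitmodules` record. Then candidates are sorted by cost (lowest first) and duplicate URLs are dropped, keeping the first occurrence. The following is a minimal sketch of those steps. `expand_template` and `rank_candidates` are hypothetical helpers, and the candidates are plain dicts with `cost`, `name` and `url` keys, modeled on the records compared in `test_get.py`.

```python
from typing import Any, Dict, List


def expand_template(template: str, props: Dict[str, str]) -> str:
    """Fill placeholders like {id} or {path} from submodule properties."""
    return template.format(**props)


def rank_candidates(candidates: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort by cost (lowest first) and drop later duplicates of a URL."""
    seen = set()
    ranked = []
    # sorted() is stable, so candidates with equal cost keep their input order
    for cand in sorted(candidates, key=lambda c: c["cost"]):
        if cand["url"] in seen:
            continue
        seen.add(cand["url"])
        ranked.append(cand)
    return ranked
```

Deduplicating after sorting means that the surviving entry for a repeated URL is the cheapest one.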
12 changes: 12 additions & 0 deletions datalad/distribution/tests/test_get.py
@@ -148,6 +148,18 @@ def test_get_flexible_source_candidates_for_submodule(t=None, t2=None, t3=None):
dict(cost=700, name='bang', url='pre-{}-post'.format(sub.id),
from_config=True),
])
# template using the "regular" property `path` (`id` above is shortened from
# actual record `datalad-id` in .gitmodules)
with patch.dict(
'os.environ',
{'DATALAD_GET_SUBDATASET__SOURCE__CANDIDATE__BANG': 'somewhe.re/{path}'}):
eq_(f(clone, clone.subdatasets(return_type='item-or-list')),
[
dict(cost=600, name=DEFAULT_REMOTE, url=ds_subpath),
dict(cost=700, name='bang', url='somewhe.re/sub',
from_config=True),
])

# now again, but have an additional remote besides origin that
# actually has the relevant commit
clone3 = install(
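The new test sets its template through the environment variable `DATALAD_GET_SUBDATASET__SOURCE__CANDIDATE__BANG`, and the expected candidate is named `bang`. The sketch below shows how such a variable name could map onto a dotted configuration name. It assumes a convention where `__` stands for `-`, a single `_` for `.`, and names are lower-cased. The `datalad.get.subdataset-source-candidate-` prefix is derived from that assumption, and both helper functions are hypothetical.

```python
from typing import Dict, Mapping

# Assumed config prefix for source-candidate templates
CANDIDATE_PREFIX = "datalad.get.subdataset-source-candidate-"


def env_to_config_name(var: str) -> str:
    """Map e.g. DATALAD_FOO__BAR_BAZ to datalad.foo-bar.baz (assumed convention)."""
    # Replace "__" first so it is not turned into two dots
    return var.replace("__", "-").replace("_", ".").lower()


def candidate_templates(environ: Mapping[str, str]) -> Dict[str, str]:
    """Collect source-candidate templates from DATALAD_* environment variables."""
    templates = {}
    for var, value in environ.items():
        if not var.startswith("DATALAD_"):
            continue
        name = env_to_config_name(var)
        if name.startswith(CANDIDATE_PREFIX):
            templates[name[len(CANDIDATE_PREFIX):]] = value
    return templates
```

The order of the two `replace` calls matters. Replacing single underscores first would turn every `__` into `..` and lose the hyphen.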
2 changes: 1 addition & 1 deletion datalad/local/tests/test_configuration.py
@@ -89,7 +89,7 @@ def test_something(path=None, new_home=None):
name='some.more',
value='test')
# Python tuple specs
# swallow outputs to be able to execise the result renderer
# swallow outputs to be able to exercise the result renderer
with swallow_outputs():
res = ds.configuration(
'set',
2 changes: 1 addition & 1 deletion datalad/support/gitrepo.py
@@ -2353,7 +2353,7 @@ def _parse_gitmodules(self):
continue
modprops = {'gitmodule_{}'.format(k): v
for k, v in props.items()
if not (k.startswith('__') or k == 'path')}
if not k.startswith('__')}
# Keep as PurePosixPath for possible normalization of / in the path etc
modpath = PurePosixPath(props['path'])
modprops['gitmodule_name'] = name
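The one-line change in `_parse_gitmodules` is what makes the `{path}` placeholder usable. Previously the `path` key was filtered out of the `gitmodule_*` properties, so a template referring to it could not be filled. The sketch below reproduces that filtering with a `keep_path` switch to contrast the old and new behavior. `prefixed_props` and `template_props` are hypothetical helpers. Stripping the `gitmodule_` prefix and renaming `datalad-id` to `id` is an assumption, based on the test comment saying that `id` is shortened from the `datalad-id` record.

```python
from typing import Dict

PREFIX = "gitmodule_"


def prefixed_props(props: Dict[str, str], keep_path: bool = True) -> Dict[str, str]:
    """Prefix .gitmodules keys, skipping '__' keys (and 'path' unless keep_path)."""
    return {
        f"{PREFIX}{k}": v
        for k, v in props.items()
        if not k.startswith("__") and (keep_path or k != "path")
    }


def template_props(modprops: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: strip the prefix and expose 'datalad-id' as 'id'."""
    out = {}
    for k, v in modprops.items():
        if k.startswith(PREFIX):
            key = k[len(PREFIX):]
            out["id" if key == "datalad-id" else key] = v
    return out
```

With `keep_path=False`, which mirrors the removed `or k == 'path'` condition, formatting `somewhe.re/{path}` raises `KeyError`. With the fix, it yields `somewhe.re/sub`, the URL the new test expects.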
29 changes: 16 additions & 13 deletions datalad/support/strings.py
@@ -8,12 +8,14 @@
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##g
"""Variety of helpers to deal with strings"""

__docformat__ = 'restructuredtext'
from __future__ import annotations

__docformat__ = 'restructuredtext'
import re
from typing import AnyStr


def get_replacement_dict(rules):
def get_replacement_dict(rules: AnyStr | list[AnyStr | list[AnyStr] | tuple[AnyStr, AnyStr]]) -> dict[AnyStr, AnyStr]:
"""Given a string with replacement rules, produces a dict of from: to"""

if isinstance(rules, (bytes, str)):
@@ -23,24 +25,25 @@ def get_replacement_dict(rules):
for rule in rules:
if isinstance(rule, (list, tuple)):
if len(rule) == 2:
pairs.append(rule)
pairs[rule[0]] = rule[1]
else:
raise ValueError("Got a rule %s which is not a string or a pair of values (from, to)"
% repr(rule))
if len(rule) <= 2:
elif len(rule) <= 2:
raise ValueError("")
rule_split = rule[1:].split(rule[0])
if len(rule_split) != 2:
raise ValueError(
"Rename string must be of format '/pat1/replacement', "
"where / is an arbitrary character to decide replacement. "
"Got %s when trying to separate %s" % (rule_split, rule)
)
pairs[rule_split[0]] = rule_split[1]
else:
rule_split = rule[1:].split(rule[0:1])
if len(rule_split) != 2:
raise ValueError(
"Rename string must be of format '/pat1/replacement', "
"where / is an arbitrary character to decide replacement. "
"Got %r when trying to separate %r" % (rule_split, rule)
)
pairs[rule_split[0]] = rule_split[1]
return pairs


def apply_replacement_rules(rules, s):
def apply_replacement_rules(rules: AnyStr | list[AnyStr | list[AnyStr] | tuple[AnyStr, AnyStr]], s: AnyStr) -> AnyStr:
r"""Apply replacement rules specified as a single string
Examples
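The fix in `get_replacement_dict` makes two changes that are visible in the hunk. Tuple and list rules are now stored with `pairs[rule[0]] = rule[1]` instead of `pairs.append(rule)`. The separator is now taken as `rule[0:1]` instead of `rule[0]`, because indexing a `bytes` object yields an `int`, which `bytes.split` rejects. What follows is a self-contained re-sketch of that logic for illustration, not a copy of the module. The `apply_replacement_rules` shown here assumes regular-expression semantics via `re.sub`. That assumption is consistent with the module importing `re`, but the function body is not part of the hunk.

```python
import re
from typing import Dict, List, Tuple, Union

Text = Union[str, bytes]
Rule = Union[Text, List[Text], Tuple[Text, Text]]


def get_replacement_dict(rules: Union[Text, List[Rule]]) -> Dict[Text, Text]:
    """Turn '/from/to'-style strings (or (from, to) pairs) into a dict."""
    if isinstance(rules, (bytes, str)):
        rules = [rules]
    pairs: Dict[Text, Text] = {}
    for rule in rules:
        if isinstance(rule, (list, tuple)):
            if len(rule) != 2:
                raise ValueError(f"rule {rule!r} is not a (from, to) pair")
            pairs[rule[0]] = rule[1]
        elif len(rule) <= 2:
            raise ValueError(f"rule {rule!r} is too short")
        else:
            # Slicing keeps the separator the same type as the rule (str or bytes)
            sep = rule[0:1]
            parts = rule[1:].split(sep)
            if len(parts) != 2:
                raise ValueError(f"cannot split {rule!r} into pattern and replacement")
            pairs[parts[0]] = parts[1]
    return pairs


def apply_replacement_rules(rules: Union[Text, List[Rule]], s: Text) -> Text:
    """Apply each (regex, replacement) pair in turn (assumed regex semantics)."""
    for pattern, replacement in get_replacement_dict(rules).items():
        s = re.sub(pattern, replacement, s)
    return s
```

For example, `apply_replacement_rules(r'/my_(.*)\.dat/your_\1.dat.gz', 'd/my_pony.dat')` splits the rule on its leading `/` and rewrites the path to `'d/your_pony.dat.gz'`.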
28 changes: 28 additions & 0 deletions datalad/typing.py
@@ -0,0 +1,28 @@
# emacs: -*- mode: python; py-indent-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
# ex: set sts=4 ts=4 sw=4 et:
# ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
# See COPYING file distributed along with the datalad package for the
# copyright and license terms.
#
# ## ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##

import sys
from typing import TypeVar

if sys.version_info >= (3, 10):
from typing import ParamSpec
else:
from typing_extensions import ParamSpec

if sys.version_info >= (3, 8):
from typing import Literal
else:
from typing_extensions import Literal

__all__ = ["Literal", "ParamSpec", "T", "K", "V", "P"]

T = TypeVar("T")
K = TypeVar("K")
V = TypeVar("V")
P = ParamSpec("P")
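The new `datalad/typing.py` centralizes version-dependent imports, so annotations can use `ParamSpec` and `Literal` on older Pythons through `typing_extensions`. To illustrate what the exported `P` and `T` are for, here is a sketch of a signature-preserving decorator. It repeats the version check locally instead of importing from `datalad.typing`, so it runs without DataLad installed. `logged`, `CALL_LOG` and `add` are hypothetical names.

```python
import functools
import sys
from typing import Callable, List, TypeVar

if sys.version_info >= (3, 10):
    from typing import ParamSpec
else:  # needs the typing_extensions package on Python < 3.10
    from typing_extensions import ParamSpec

T = TypeVar("T")
P = ParamSpec("P")

CALL_LOG: List[str] = []


def logged(func: Callable[P, T]) -> Callable[P, T]:
    """Record each call; P keeps func's parameters visible to type checkers."""
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
        CALL_LOG.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def add(x: int, y: int = 0) -> int:
    return x + y
```

Because the decorator is typed with `P`, a type checker such as mypy can flag `add("2")` at the call site. If the decorator instead returned `Callable[..., T]`, any arguments would be accepted.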