Merge pull request #5 from openzim/complete_v1
benoit74 authored Feb 5, 2024
2 parents 456a3f1 + 2a53c06 commit b3a821f
Showing 7 changed files with 279 additions and 79 deletions.
118 changes: 76 additions & 42 deletions README.md
@@ -33,22 +33,22 @@ readme = "README.md"
# and version plugins.
dynamic = ["authors", "classifiers", "keywords", "license", "version", "urls"]

# Enable the hatch-openzim metadata hook to generate dependencies from addons manifests.
[tool.hatch.metadata.hooks.openzim]
additional-keywords = ["awesome"] # some additional
# Enable the hatch-openzim metadata hook to generate default openZIM metadata.
[tool.hatch.metadata.hooks.openzim-metadata]
additional-keywords = ["awesome"] # some additional keywords
kind = "scraper" # indicate this is a scraper, so that additional keywords are added

# Enable the hatch-openzim build hook to generate dependencies from addons manifests.
[tool.hatch.build.hooks.openzim]
# Enable the hatch-openzim build hook to install files (e.g. JS libs) at build time.
[tool.hatch.build.hooks.openzim-build]
toml-config = "openzim.toml" # optional location of the configuration file
dependencies = [ "zimscraperlib==3.1.0" ] # optional dependencies needed for file installations
```

## Metadata hook usage
NOTA: the `dependencies` attribute is not specific to our hook(s), it is a generic [hatch(ling) feature](https://hatch.pypa.io/1.9/config/build/#dependencies_1).

The build hook configuration is done in a file named `openzim.toml` (if not customized)
which must be placed in the root folder, next to your `pyproject.toml`.
## Metadata hook usage

### Configuration
### Configuration (in `pyproject.toml`)

| Variable | Required | Description |
|---|---|---|
@@ -70,44 +70,54 @@ The metadata hook will set:
- all `Programming Language :: Python :: x` and `Programming Language :: Python :: x.y` matching the `required-versions`
- `keywords` will contain:
- at least `kiwix`
- if `kind` is `scraper`, ...
- if `kind` is `scraper`, it will add `zim` and `offline`
- and `additional-keywords` passed in the configuration
- `license` to `{"text": "GPL-3.0-or-later"}``
- `license` to `{"text": "GPL-3.0-or-later"}`
- `urls` to
- `"Donate": "https://www.kiwix.org/en/support-us/"`
- `"Homepage": "https://github.com/openzim/hatch-openzim"`
- `Donate`: `https://www.kiwix.org/en/support-us/`
- `Homepage`: GitHub repository URL (e.g. `https://github.com/openzim/hatch-openzim`) if code is a git clone, otherwise `https://www.kiwix.org`
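
The keyword rules listed above can be sketched in a few lines of Python. This is only an illustration: `build_keywords` is a hypothetical helper, not the hook's actual code, and both the ordering of keywords and the de-duplication step are assumptions.

```python
from typing import List, Optional


def build_keywords(
    kind: Optional[str] = None,
    additional_keywords: Optional[List[str]] = None,
) -> List[str]:
    """Assemble keywords following the rules described in the README."""
    keywords = ["kiwix"]  # always present
    if kind == "scraper":
        keywords.extend(["zim", "offline"])
    # Assumption: duplicates are dropped; the README does not say so.
    for keyword in additional_keywords or []:
        if keyword not in keywords:
            keywords.append(keyword)
    return keywords


print(build_keywords(kind="scraper", additional_keywords=["awesome"]))
# prints ['kiwix', 'zim', 'offline', 'awesome']
```

With the `pyproject.toml` sample shown earlier (`kind = "scraper"`, `additional-keywords = ["awesome"]`), the resulting keywords would contain these four entries.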


## Build hook usage

The build hook configuration is done in a file named `openzim.toml` (if not customized)
which must be placed in the root folder, next to your `pyproject.toml`.

### Configuration
### High-level configuration (in `pyproject.toml`)

| Variable | Required | Description |
|---|---|---|
| `toml-config` | N | Location of the configuration |
| `toml-config` | N | Location of the configuration, defaults to `openzim.toml` |

### Files installation
### Detailed configuration (in `openzim.toml`)

The detailed build hook configuration is done in a TOML file named `openzim.toml` (unless customized
via `toml-config`, see above). This file must be placed in your project root folder, next to your
`pyproject.toml`.

The build hook supports downloading web resources to various locations at build time.

To configure, this you first have to create a `files` section in the configuration and
declare its `config` configuration. Name of the section (`assets` in example below) is
To configure this, you first have to create a `files` section in the `openzim.toml` configuration
and declare its `config` table. The name of the section (`assets` in the example below) is
free (do not forget to quote it if you want to use special characters like `.` in the name).

```toml
[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
"touch somewhere/something.txt"
]
```

Configuration:
| Variable | Required | Description |
|---|---|---|
| `target_dir` | Y | Base directory where all downloaded content will be placed |
| `execute_after` | N | List of shell commands to execute once all actions (see below) have been executed; actions are executed with `target_dir` as current working directory |

**Important:** The `execute_after` commands are **always** executed, no matter how many actions are
present or how many actions have been ignored (see below for details about why an action might be ignored).

- `target_dir`: Base directory where all downloaded content will be placed
Nota: The example `execute_after` command (`touch`) is not representative of what you would usually do ^^

Once this section configuration is done, you will then declare multiple action. All
actions in a given section share the same base configuration
Once this section configuration is done, you will then declare multiple actions. All
actions in a given section share the same base configuration declared above.

Three kinds of actions are supported:

@@ -122,13 +132,18 @@ Each action is declared in its own TOML table. Action names are free.
action=...
```

### `get_file` action
### `get_file` action configuration (in `openzim.toml`)

This action downloads a file to a location.

- `action`: "get_file"
- `source`: URL of the online resource to download
- `target_file`: relative path to the file
**Important:** If `target_file` is already present, the action is not executed, it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| `action` | Y | Must be "get_file" |
| `source`| Y | URL of the online resource to download |
| `target_file` | Y | Relative path to the file target location, relative to the section `target_dir` |
| `execute_after` | N | List of shell commands to execute once file installation is completed; actions are executed with the section `target_dir` as current working directory |

You will find a sample below.

@@ -139,15 +154,21 @@ source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
```

### `extract_all` action
### `extract_all` action configuration (in `openzim.toml`)

This action downloads a ZIP and extracts it to a location. Some items in the Zip content
can be removed afterwards.

- `action`: "extract_all"
- `source`: URL of the online ZIP to download
- `target_dir`: relative path of the directory where ZIP content will be extracted
- `remove`: Optional - list of glob patterns of ZIP content to remove after extraction
**Important:** If `target_dir` is already present, the action is not executed, it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| `action` | Y | Must be "extract_all" |
| `source` | Y | URL of the online ZIP to download |
| `target_dir` | Y | Relative path of the directory where ZIP content will be extracted, relative to the section `target_dir` |
| `remove` | N | List of glob patterns of ZIP content to remove after extraction (relative to action `target_dir`) |
| `execute_after` | N | List of shell commands to execute once files extraction is completed; actions are executed with the section `target_dir` as current working directory |
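
As a rough illustration of how `remove` glob patterns could be applied once the ZIP content is extracted, here is a minimal sketch based on `pathlib`. `remove_matching` is a hypothetical helper and not the hook's actual implementation; it assumes the patterns are resolved relative to the extraction directory.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def remove_matching(target_dir: Path, patterns: List[str]) -> None:
    """Delete every file or directory under target_dir matching a glob pattern."""
    for pattern in patterns:
        # Materialize matches first so deletion does not disturb iteration.
        for match in list(target_dir.glob(pattern)):
            if match.is_dir():
                shutil.rmtree(match)
            elif match.exists():
                match.unlink()


def demo() -> List[str]:
    """Build a fake extracted folder, apply the patterns, list what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        extracted = Path(tmp) / "chosen"
        (extracted / "docsupport").mkdir(parents=True)
        for name in (
            "docsupport/style.css",
            "chosen.proto.js",
            "index.html",
            "README.md",
            "chosen.jquery.js",
        ):
            (extracted / name).write_text("placeholder")
        remove_matching(extracted, ["docsupport", "chosen.proto.*", "*.html", "*.md"])
        return sorted(path.name for path in extracted.iterdir())
```

Calling `demo()` leaves only `chosen.jquery.js` in the fake folder, since every other entry matches one of the patterns.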


You will find a sample below.

@@ -162,20 +183,25 @@ target_dir="chosen"
remove=["docsupport", "chosen.proto.*", "*.html", "*.md"]
```

### `extract_items` action
### `extract_items` action configuration (in `openzim.toml`)

This action extracts a ZIP to a temporary directory, and moves selected items to some locations.
Some sub-items in the Zip content can be removed afterwards.

- `action`: "extract_all"
- `source`: URL of the online ZIP to download
- `zip_paths`: list of relative path in ZIP to select
- `target_paths`: relative path of the target directory where selected items will be moved
- `remove`: Optional - list of glob patterns of ZIP content to remove after extraction (must include the target paths)
**Important:** If any `target_paths` is already present, the action is not executed, it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| `action`| Y | Must be "extract_items" |
| `source`| Y | URL of the online ZIP to download |
| `zip_paths` | Y | List of relative paths in ZIP to select (relative to ZIP home folder) |
| `target_paths` | Y | List of relative paths where selected items will be moved (relative to the section `target_dir`) |
| `remove` | N | List of glob patterns of ZIP content to remove after extraction (must include the target paths, they are relative to the section `target_dir`) |
| `execute_after` | N | List of shell commands to execute once ZIP extraction is completed; actions are executed with the section `target_dir` as current working directory |

Nota:
- the `zip_paths` and `target_paths` are match one-by-one, and must hence have the same length.
- the ZIP is first save to a temporary location before extraction, consuming some disk space
- the `zip_paths` and `target_paths` are matched one-by-one, and must hence have the same length.
- the ZIP is first saved to a temporary location before extraction, consuming some disk space
- all content is extracted before selected items are moved, and the rest is deleted
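
The notes above can be sketched as follows. This is a simplified illustration under stated assumptions, not the hook's code: `extract_items` is a hypothetical function that reads a local ZIP instead of downloading one, and the skip rule follows the "any `target_paths` already present" behaviour described above.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_items(
    zip_file: Path,
    base_target_dir: Path,
    zip_paths: List[str],
    target_paths: List[str],
) -> bool:
    """Return True if items were installed, False if the action was skipped."""
    if len(zip_paths) != len(target_paths):
        raise ValueError("zip_paths and target_paths must have the same length")
    # Skip the whole action as soon as one target is already present.
    if any((base_target_dir / target).exists() for target in target_paths):
        return False
    with tempfile.TemporaryDirectory() as tempdir:
        # Everything is extracted first, then only selected items are moved.
        with zipfile.ZipFile(zip_file) as archive:
            archive.extractall(tempdir)
        for zip_path, target_path in zip(zip_paths, target_paths):
            destination = base_target_dir / target_path
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(Path(tempdir) / zip_path), str(destination))
    return True  # leftovers vanish with the temporary directory


def demo() -> dict:
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        archive_path = work / "sample.zip"
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("keep1/file1.txt", "hello")
            archive.writestr("drop/file2.txt", "ignored")
        target = work / "assets"
        first = extract_items(archive_path, target, ["keep1/file1.txt"], ["file123.txt"])
        second = extract_items(archive_path, target, ["keep1/file1.txt"], ["file123.txt"])
        content = (target / "file123.txt").read_text()
    return {"first": first, "second": second, "content": content}
```

In `demo()`, the first call installs the file and the second call is skipped because `file123.txt` already exists.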

You will find a sample below.
@@ -193,14 +219,22 @@ remove=["ogvjs/COPYING", "ogvjs/*.txt", "ogvjs/README.md"]

A full example with two distinct sections and three actions in total is below.

Nota: The `touch` command in `execute_after` is not representative of what you would usually do ^^

```toml
[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
"fix_ogvjs_dist .",
]

[files.assets.actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
execute_after=[
"touch done.txt",
]

[files.assets.actions.chosen]
action="extract_all"
2 changes: 1 addition & 1 deletion src/hatch_openzim/build_hook.py
@@ -10,7 +10,7 @@ class OpenzimBuildHook(BuildHookInterface):
    - files installation
    """

    PLUGIN_NAME = "openzim"
    PLUGIN_NAME = "openzim-build"

    def initialize(self, version, build_data):  # noqa: ARG002
        if "toml-config" in self.config:
42 changes: 38 additions & 4 deletions src/hatch_openzim/files_install.py
@@ -1,4 +1,5 @@
import shutil
import subprocess
import tempfile
import zipfile
from pathlib import Path
@@ -60,6 +61,10 @@ def _process_section(section_name: str, section_data: Dict[str, Any]):
    for action_name, action_config in section_actions.items():
        _process_one_action(base_target_dir, action_name, action_config)

    execute_after = section_config.get("execute_after", None)
    if execute_after:
        _process_execute_after(base_target_dir=base_target_dir, actions=execute_after)

    logger.info(" All done")


@@ -93,9 +98,30 @@ def _process_one_action(
    else:
        raise Exception(f"Unsupported action '{action}'")

    execute_after = action_data.get("execute_after", None)
    if execute_after:
        _process_execute_after(base_target_dir=base_target_dir, actions=execute_after)

    logger.info(" Done")


def _process_execute_after(base_target_dir: Path, actions: List[str]):
    """execute actions after file(s) installation"""

    for action in actions:
        logger.info(f" Executing '{action}'")
        process = subprocess.run(
            action,
            shell=True,  # noqa: S602
            cwd=base_target_dir,
            text=True,
            check=True,
            capture_output=True,
        )
        if process.stdout:
            logger.info(f" stdout:\n{process.stdout}")


def _process_get_file_action(
    base_target_dir: Path, source: str, action_data: Dict[str, Any]
):
@@ -109,7 +135,9 @@ def _process_get_file_action(
    local_dir = base_target_dir / str(target_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_file = local_dir / str(target_file)
    local_file.unlink(missing_ok=True)
    if local_file.exists():
        logger.info(" Skipping, local_file is already present")
        return
    _download_file(source, local_file)


@@ -125,7 +153,8 @@ def _process_extract_all_action(
        raise Exception("target_dir is mandatory when action='extract_all'")
    target_dir = base_target_dir / str(target_dir)
    if target_dir.exists():
        shutil.rmtree(target_dir)
        logger.info(" Skipping, target_dir is already present")
        return
    if not target_dir.parent.exists():
        target_dir.parent.mkdir(parents=True, exist_ok=True)
    _extract_zip_from_url(url=source, extract_to=target_dir)
@@ -153,13 +182,18 @@ def _process_extract_items_action(
            f" {len(target_paths)})"
        )

    # do not re-install if asset has already been installed
    if any(
        (base_target_dir / str(target_path)).exists() for target_path in target_paths
    ):
        logger.info(" Skipping, at least one target path is already present")
        return

    with tempfile.TemporaryDirectory() as tempdir:
        _extract_zip_from_url(url=source, extract_to=tempdir)
        for index, zip_path in enumerate(zip_paths):
            item_src = Path(tempdir) / str(zip_path)
            item_dst = base_target_dir / str(target_paths[index])
            if item_dst.is_dir():  # will check if it exists as well
                shutil.rmtree(item_dst)
            shutil.move(src=str(item_src), dst=item_dst)

    if "remove" in action_data:
2 changes: 1 addition & 1 deletion src/hatch_openzim/metadata_hook.py
@@ -12,7 +12,7 @@ class OpenzimMetadataHook(MetadataHookInterface):
    - project urls
    """

    PLUGIN_NAME = "openzim"
    PLUGIN_NAME = "openzim-metadata"

    def update(self, metadata: dict):
        """Update the project table's metadata."""
11 changes: 11 additions & 0 deletions tests/configs/execute_after_failure.toml
@@ -0,0 +1,11 @@
[files.part1.config]
target_dir="part1"

[files.part1.actions.action1]
action="extract_all"
source="https://tmp.kiwix.org/ci/hatch_openzim_testsets/testset1.zip"
target_dir="action1"
remove=["remove1", "remove2.txt", "remove3/file1.txt"]
execute_after=[
"touch somewhere/something.txt"
]
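
Judging by its file name, this configuration is presumably meant to exercise a failing `execute_after` command: `touch` cannot create a file inside a `somewhere` directory that does not exist. The sketch below reproduces that behaviour with the same `subprocess.run` arguments as `_process_execute_after`; the helper names `run_execute_after` and `demo` are hypothetical, and the commands assume a POSIX shell.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def run_execute_after(base_target_dir: Path, commands: List[str]) -> List[str]:
    """Run each shell command from base_target_dir and return captured stdout."""
    outputs = []
    for command in commands:
        completed = subprocess.run(
            command,
            shell=True,
            cwd=base_target_dir,
            text=True,
            check=True,  # a non-zero exit code raises CalledProcessError
            capture_output=True,
        )
        outputs.append(completed.stdout)
    return outputs


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        try:
            # Parent directory is missing, so touch exits with an error.
            run_execute_after(base, ["touch somewhere/something.txt"])
            failed = False
        except subprocess.CalledProcessError:
            failed = True
        # Creating the directory first makes the same command succeed.
        run_execute_after(base, ["mkdir -p somewhere", "touch somewhere/something.txt"])
        created = (base / "somewhere" / "something.txt").exists()
    return {"failed": failed, "created": created}
```

Because of `check=True`, a failing command aborts processing with an exception, which is why `tests/configs/full.toml` runs `mkdir -p somewhere` before `touch`.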
17 changes: 16 additions & 1 deletion tests/configs/full.toml
@@ -1,11 +1,20 @@
[files.part1.config]
target_dir="part1"
execute_after=[
"mkdir -p somewhere_else",
"touch somewhere_else/something.txt",
]

[files.part1.actions.action1]
action="extract_all"
source="https://tmp.kiwix.org/ci/hatch_openzim_testsets/testset1.zip"
target_dir="action1"
remove=["remove1", "remove2.txt", "remove3/file1.txt"]
execute_after=[
"mkdir -p somewhere",
"touch somewhere/something.txt",
"ls -lah",
]

[files.part2.config]
target_dir="part2"
@@ -23,6 +32,12 @@ source="https://tmp.kiwix.org/ci/hatch_openzim_testsets/testset2.zip"
zip_paths=["keep1"]
target_paths=["action3"]

[files.part2.actions.action4]
action="extract_items"
source="https://tmp.kiwix.org/ci/hatch_openzim_testsets/testset2.zip"
zip_paths=["keep1/file1.txt"]
target_paths=["file123.txt"]

# part without any actions
[files.part3.config]
target_dir="part3"
@@ -44,4 +59,4 @@ target_dir="action2"
[files.part4.actions.action3]
action="extract_all"
source="https://tmp.kiwix.org/ci/hatch_openzim_testsets/testset1.zip"
target_dir="subdir1/action3"
target_dir="subdir1/action3"