Commit

more work
Aklakan committed Feb 17, 2024
1 parent 9f0ebbc commit 701bdcd
Showing 4 changed files with 101 additions and 7 deletions.
4 changes: 2 additions & 2 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ nav_order: 10

## Synopsis

This documentation is a guide for how to design Maven projects that can **publish data** with an invocation of:
This documentation is a guide for how to design Maven projects that can **generate** and **publish** data with a mere invocation of:

```bash
mvn deploy
@@ -19,7 +19,7 @@ This guide presents concepts for one-shot data publishing as well as for automat
## Why not ...

* **use a Workflow Engine?**
There are just too many workflow engines out there, and tying builds to one of them significantly limits portability.
There exist many workflow engines, and tying builds to one of them significantly limits portability.
Conversely, every workflow engine is expected to be capable of running a shell script and thus invoke a Maven build.

* **use a different build system rather than Maven?**
69 changes: 69 additions & 0 deletions docs/sync/change-detection.md
@@ -0,0 +1,69 @@
---
title: Maven Repository Events
layout: home
nav_order: 10
---


# Maven Repository Change Detection

## Synopsis

This page lists approaches for detecting changes to artifacts in a Maven repository.

## Purpose

The ability to detect changes to a Maven repository allows one to automate several processes:

* Meta-data generation: Automatically generate VoID, DCAT and PROV-O models from uploaded datasets.
* RDF-Store synchronization: Automatically update an RDF store with information about the Maven repository. This effectively makes the contents of a Maven repository queryable as data.

## Approaches

### inotifywait

`inotifywait` is a command line tool capable of watching directories recursively for changes.
It works well on Linux distributions, even when invoked within Docker containers on a folder mounted from the host.

The following line will report any closing of a file (after write) or deletion in the repository in the format `EVENTS PATH/FILENAME`:
```
inotifywait "$HOME/.m2/repository" --recursive --monitor --format '%e %w%f' --event CLOSE_WRITE --event DELETE
```

Example output:
```
CLOSE_WRITE,CLOSE /home/user/.m2/repository/org/example/myproject/myartifact/1.0.0-SNAPSHOT/myartifact-1.0.0-SNAPSHOT.jar
```
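
A consumer of this output has to split each line into the event list and the path. The following is a minimal sketch, assuming the `'%e %w%f'` format used above; the function name `classify_event` is hypothetical and not part of inotify-tools.

```bash
# Hypothetical helper: split one 'EVENTS PATH' line and classify it.
classify_event() {
  local line="$1"
  local events="${line%% *}"   # comma-separated event names before the first space
  local path="${line#* }"      # everything after the first space (may contain spaces)
  case ",$events," in
    *,DELETE,*)      echo "deleted $path" ;;
    *,CLOSE_WRITE,*) echo "written $path" ;;
    *)               echo "ignored $path" ;;
  esac
}

# Usage (not executed here): feed the inotifywait output line by line.
#   inotifywait ... | while IFS= read -r line; do classify_event "$line"; done
```

Splitting at the first space keeps paths that contain spaces intact, because the event list itself never contains one.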

### Repository Manager Hooks and Plugins

Repository managers may either support hooks directly or allow third-party plugins that could supply this functionality.

* Archiva?
* Artifactory?

Contributions to this section are welcome.

### Polling

For completeness, the following script sketches how one could detect changes to a directory using polling.
The polling could be triggered periodically by a cron job.

```
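# MVN_REPO must point to the repository directory, e.g. "$HOME/.m2/repository"
# Ensure before.txt exists
touch before.txt
# Generate a sorted recursive directory listing
find "$MVN_REPO" | LC_ALL=C sort -u > current.txt
# Compute a diff from the prior state to the current state
# (diff exits with status 1 when the listings differ, which is expected here)
diff before.txt current.txt
# Process the diff
# Move current state to prior state
mv current.txt before.txt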
```
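
The "Process the diff" step is left open above. One possible way to fill it in, as a sketch only, is to use `comm` instead of parsing `diff` output, since both listings are already sorted in the C locale. The function name `list_changes` and the `ADDED`/`REMOVED` prefixes are assumptions made for illustration.

```bash
# Hypothetical helper: compare two sorted listings and label each change.
# $1 = prior listing (before.txt), $2 = current listing (current.txt)
list_changes() {
  local before="$1" current="$2"
  # Lines only in the current listing are new paths.
  LC_ALL=C comm -13 "$before" "$current" | sed 's/^/ADDED /'
  # Lines only in the prior listing are paths that disappeared.
  LC_ALL=C comm -23 "$before" "$current" | sed 's/^/REMOVED /'
}

# Usage (not executed here):
#   list_changes before.txt current.txt
```

Note that a directory listing only reveals added and removed paths; a file that was overwritten in place would not show up with this approach.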



15 changes: 10 additions & 5 deletions docs/sync/index.md
@@ -1,22 +1,27 @@
---
title: Synchronizing Metadata
title: Maven-RDF Sync
layout: default
nav_order: 200
---

## Autogenerating RDF metadata for a Maven Repository.
TODO Refactor mvn-rdf-sync into pages for: artifact change events, metadata generation and triple store sync.

### Summary
# Synchronizing a Triple Store with Maven Repository Data

## Synopsis

* This chapter presents a lightweight trigger-based approach to realize "build actions" (or "bots") over local maven repositories. A local maven repository is simply a certain directory structure.
These bots can be used to automatically create new maven projects for producing RDF metadata.

### Purpose
## Purpose

* Metadata artifacts are just plain maven artifacts whose content describes another artifact.
* A large part of RDF metadata generation is agnostic of the content of a dataset and can be fully automated. In those cases, a user should not need to manually set up metadata projects.

### Abstract Approach
## Abstract Approach

The `mvn-rdf-sync` approach comprises two separate processes:


1. A *file system watch* on a (local) maven repository raises events whenever the repository content changes.
2. The event is transmitted to an appropriate receiver.
20 changes: 20 additions & 0 deletions docs/sync/messaging.md
@@ -0,0 +1,20 @@
---
title: Messaging
layout: home
nav_order: 10
---

# Messaging

## Synopsis

Once changes to a maven repository [are detected](change-detection.md), appropriate messages need to be sent out and relevant components need to be notified.

## Purpose

A message queuing system helps to decouple message producers from receivers and adds fault tolerance, such as prevention of data loss in case of server crashes.

## Apache Kafka

Apache Kafka is popular, easy to use, and also works from the command line. For setup instructions, we simply refer to the [Quick Start Guide](https://kafka.apache.org/quickstart).
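
As a rough sketch of the command-line route, detected change events could be forwarded line by line with the console producer script that ships with Kafka. The broker address, the topic name `maven-repo-events`, and the function name `publish_events` are assumptions made for illustration, and the path to the script depends on the local installation.

```bash
# Assumed values, not taken from this guide: broker address and topic name.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"
TOPIC="${TOPIC:-maven-repo-events}"
# Producer script shipped with Kafka; adjust the path to the local installation.
PRODUCER="${PRODUCER:-bin/kafka-console-producer.sh}"

# Hypothetical helper: send every line read from stdin as one Kafka message.
publish_events() {
  "$PRODUCER" --bootstrap-server "$BOOTSTRAP_SERVER" --topic "$TOPIC"
}

# Usage (not executed here): pipe any line-oriented event source into it.
#   inotifywait ... | publish_events
```

Keeping the producer path in a variable makes it possible to swap in a stub script when trying the pipeline without a running broker.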
