From 9b49eb1cd7649d06e36f5f9b0146aa477593be1d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tobias=20B=C3=BClte?= Date: Mon, 7 Aug 2023 17:17:24 +0200 Subject: [PATCH 01/69] Add pages for new documentation #25 --- ...ching a transformation with metafacture.md | 15 ++ Fix user guide.md | 216 ++++++++++++++++++ Flux user guide.md | 126 ++++++++++ Framework User Guid.md | 83 +++++++ Getting Started.md | 51 +++++ Home.md | 42 ++++ 6 files changed, 533 insertions(+) create mode 100644 Approaching a transformation with metafacture.md create mode 100644 Fix user guide.md create mode 100644 Flux user guide.md create mode 100644 Framework User Guid.md create mode 100644 Getting Started.md create mode 100644 Home.md diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md new file mode 100644 index 0000000..0a4d34a --- /dev/null +++ b/Approaching a transformation with metafacture.md @@ -0,0 +1,15 @@ +Every approach to transforming metadata with Metafacture is quite similar: + +- You need to know what you want: + e.g. transform data in Marc21 from a certain folder into some kind of JSON data. +- You have to identify the commands that you need. +- Combine the commands without the transformation module and test whether the workflow goes through. +- Adjust the workflow until it works. +- Once the general workflow is working, move on to preparing the transformation. +- Get familiar with the incoming data: + - e.g. use `| list-fix-paths| print` to check out the metadata-element paths that are provided. + - use `| list-fix-values ("specifiedElementPath")| print` to get all element values of a certain element +- Start to write your transformation successively and `write` the result to a specific destination or `print` it. + - Start with one element that you want to transform and retain it. + - If you are happy with the result, continue. +- Once you have finalized your transformation, include it in your application or transform the data you want.
\ No newline at end of file diff --git a/Fix user guide.md b/Fix user guide.md new file mode 100644 index 0000000..6215efa --- /dev/null +++ b/Fix user guide.md @@ -0,0 +1,216 @@ +This document provides an introduction to the Metafacture Fix language (short: Metafix or Fix). Fix is a declarative flow oriented language in which transformations of arbitrary metadata/semi-structured data can be defined using the FIX language. The Fix language for Metafacture is introduced as an alternative to configuring data transformations with Metamorph. Inspired by Catmandu Fix, Metafix processes metadata not as a continuous data stream but as discrete records. The basic idea is to rebuild constructs from the (Catmandu) Fix language like functions, selectors and binds in Java and combine with additional functionalities from the Metamorph toolbox. + +## Part of a metafacture workflow +Metafacture Fix is a transformation module that can be used in a workflow, for this you have to use this in your pipeline: +- when using the FLUX: +- - address the `fix`-module +- - +```PERL + infile + | open-file + | as-lines + | decode-marc21 + | fix(FLUX_DIR + "fixFile.fix") + | encode-json + | print + ; + ``` +- - you can add variables +- - there are some options available +- - The fix-Transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes +- - or it can be kept in an external file that is called in the FLUX process, as in the code snippet above +- when using it in a JAVA process, just add the library to your process + +## Record-based and metadata manipulating approach +While Metafacture processes the data as a stream, the Fix module does not: it buffers the incoming stream into distinct records. +This way you can manipulate all metadata elements of a record at once, without worrying about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.
+The incoming record can then be manipulated: fields can be changed, removed or added. This also differs from the approach in MORPH, where you construct a new record and a new data stream, whereas in FIX you change the record itself and "only" change the data stream in Metafacture. + + +## Basic concepts +The four main concepts of FIX (introduced by Catmandu) are [functions](https://librecat.org/Catmandu/#functions), [selectors](https://librecat.org/Catmandu/#selectors), [conditionals](https://librecat.org/Catmandu/#conditionals) and [binds](https://librecat.org/Catmandu/#binds). The following code snippet shows examples of each of these concepts: + + +```PERL + +# Simple fix functions + +add_field("hello", "world") +remove_field("my.deep.nested.junk") +copy_field("stats", "output.$append") + +# Conditionals + +if exists("error") + set_field("is_valid", "no") + log("error") +elsif exists("warning") + set_field("is_valid", "yes") + log("warning") +else + set_field("is_valid", "yes") +end + +# Binds - Loops + +do list(path: "foo", "var": "$i") + add_field("$i.bar", "baz") +end + +# Selector +if exists("error") + reject() +end + +``` + +**Functions** are used to add, change, remove or otherwise manipulate elements. + +**Conditionals** are used to control the processing of functions so that they are not executed in every case but only under certain conditions. + +**Selectors** can be used as high-level filters to filter the records you want. + +**Binds** are wrappers for one or more fixes. They give extra control functionality for fixes such as loops. +All binds have the syntax: +```PERL +do Bind(params,…) + fix(..) + fix(..) +end +`````` + +For a list of all functions, selectors, binds and conditionals have a look at: +https://github.com/metafacture/metafacture-core/wiki#functions-and-cookbook + +## Addressing Pieces of Data - of: FIX-Path and the record structure in FIX + +Internally FIX knows arrays, objects/hashes and simple elements.
How a format is translated depends on the `decode-...` command in the MF workflow. One thing is specific to Fix: as in Catmandu, a repeated field is translated into a list, depending on the actual input data of the single record, and elements with the suffix `[]` are interpreted as arrays. + +Since functions manipulate, add or remove elements in a record, it is essential to understand how one can address source or target elements. + +[e.g.: +```PERL +copy_field("", "") +``` +] +To address the source or target element here, you need to provide the path to the element. +Metafacture Fix uses a path syntax that is similar to JSON Path but not identical. It also uses the dot notation, but there are some differences in the path structure of arrays and repeated fields, especially when working with JSON, YAML or records with repeated fields. + +``` +a : simpleField +b : c : objectField1 + d : objectField2 + e : objectField3 +f : repeatedField1 +f : repeatedField2 +f : repeatedField3 +g : - listElement1 + - listElement2 + - listElement3 +h : - i : listObjectElement1.1 + j : listObjectElement1.2 + - i : listObjectElement2.1 + j : listObjectElement2.2 +k : l : m : o : deepNestedField +``` + +A simple string element is addressed by stating the element name: `a` +For fields with a deeper structure you add a dot (`.`). The path for elements in nested objects is stated as `b.c` or `k.l.m.o` + +In a data set, an element can sometimes have multiple instances. Different data models handle this possibility differently. XML records can contain any element multiple times; element repetition is possible and (partly) allowed in many schemas. Repeatable elements also exist in JSON and YAML but are unusual. + +To point to a specific element you state the index number. To address the value `repeatedField2` the path would be `f.2`, since the repeated field is handled as a list. +Similarly, you address the `listElement3` of the array/list by `g[].3`.
The brackets are an array indicator created by the flux command decode-yaml. They help to interpret an element as an array even if the list has only one value. + +When working with nested structures and combinations of arrays and objects, the path is a combination of element names, dots and index numbers. + +`listObjectElement2.2` has the path: `h[].2.j` + +You need the path not only for your source element but also when you want to create a new element. But remember that Fix, as in Catmandu, treats repeated fields and arrays as lists, so if you want to create a repeated field you have to create an array without the suffix `[]`. + +e.g.: +```PERL +copy_field("a", "z.y.x") +``` + +This would copy the value of `a` into a nested object: + +``` +z : + y: + x : simpleField +``` + + +To address paths you can use wildcards. For instance the star-wildcard: `person*` would match all simple literals with element names starting with 'person': 'person\_name', 'person\_age', etc. +Apart from the star-wildcard, the question-mark wildcard ('?') is supported. It matches exactly one arbitrary character. + +Not yet supported is alternation. + +Besides path wildcards there are array/list wildcards that are used to reference specific elements or all elements in an array. `g[].*` addresses all strings in the array `g[]`. `g[].$append` would reference a new element at the end of the array. `g[].$last` references the last element in an array. + +## Cleaning up the transformation + +Since FIX does not construct a new record stream but manipulates the existing record, you usually clean up after you transform the data. There are functions to remove all unnecessary elements and to remove all empty elements. + +e.g.: if you transform MARC21 to JSON but want to keep only certain elements that you created, you state them in a retain function: + +``` +retain("all", + "element", + "that", + "I", + "want") +``` +This function keeps only the listed elements.
At the moment this only works with high-level elements. + +`vacuum()` deletes all empty elements. + +## Defining Macros + +Macros can be defined with the `put_macro`-Bind and use the same parameter +mechanism later. +Macros are called with the `call_macro` function. Attributes +of the function are used as parameters: + +```PERL +do put_macro("concat-up") + set_array("$[target_field]") + copy_field("$[source_field]","$[target_field].$append") + case("$[target_field].*") + join_field("$[target_field]",", ") +end + + + +call_macro("concat-up", source_field:"data1", target_field:"Data1") +call_macro("concat-up", source_field:"data2", target_field:"Data2") +``` + +In this case `target_field` and `source_field` serve as a parameter (the name is arbitrary). In the macro definition itself, the parameters are addressed by `$[target_field]` and `$[source_field]`. + +Parameters are scoped, which means that the ones provided with the `call_macro` function shadow global ones. Macros cannot be nested. + + +## Splitting Fixes for Reuse + +In a complex project setting there may be several Fixes in use, +and it is likely that they share common parts. Imagine for instance a +transformation of Marc 21 records holding data on books to RDF, and of Marc 21 +records holding data on authors to RDF. Both make use of a table assigning +country names to ISO country codes. Such a table should only exist once. + +Another scenario would be to reduce the size of a single fix file and create several fix files used for different purposes. + +To accommodate such reuse, Fix offers an include mechanism: + +``` +# Setup adds maps, macros and vars once +do once("setup") + include ("./fix/maps.fix") + include ("./fix/macros.fix") + put_var("member", "-") +end +``` + +For performance reasons it is useful to put macros and maps that are used often into a `do once` bind.
\ No newline at end of file diff --git a/Flux user guide.md b/Flux user guide.md new file mode 100644 index 0000000..6e12d61 --- /dev/null +++ b/Flux user guide.md @@ -0,0 +1,126 @@ +This document provides a quick introduction to Metafacture Flux, a domain specific language to build data flows for metadata processing. +Flux makes use of Metafacture as a stand-alone application - so you can build workflows without the need to write Java code. + + + +# Installing Flux and Running Flux + +There are different ways of installing and running Metafacture FLUX: +## Stand-alone application (without Java Code) + +Use a prebuilt distribution by unzipping the Metafacture distribution archive. If you want to use FIX, we advise using the Runner provided by the Metafacture Fix repo; see the [releases page](https://github.com/metafacture/metafacture-fix/releases). + +Then execute the script `flux.sh` or `flux.bat` in the unzipped `bin/` folder. + +## More elaborate ways for developers: + +### Build from local distribution +Check out the repo to build a certain branch and roll your own local distribution like this: +```bash +$ cd metafacture-core; ./gradlew installDist +``` + +Then go to `metafacture-core/metafacture-runner/build/install/metafacture-core` and execute the `flux.*` there. + +### Working with the Java source code + +If you are working with the source code directly, execute the class `org.metafacture.runner.Flux`. + +## Run a Flux-File + +Just provide the flux file you wish to run as the first argument. + +```bash +$> flux.sh FILE.flux +``` + +## Provide Arguments +To provide arguments, add variable assignments after the first argument as follows: +```bash +$> flux.sh FILE.flux var1=value1 var2=value2 +``` +This sets the variable `var1` to the value 'value1' and `var2` to the value 'value2'.
+ +# Writing Flux files +The following snippet shows a simple flux file: +```c +//declare variables +default file = FLUX_DIR + "10.marc21"; + +//declare flow +file +| open-file +| as-lines +| decode-marc21 +| fix(FLUX_DIR + "fix-marc21.fix") +| encode-json(prettyPrinting="true") +| write("stdout") +; +``` +In the first section, variables are declared; in the second, the flow is defined. +Linebreaks are optional. Semicolons `;` mark the end of a variable assignment or flow definition. + +[List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) + +## Variables +Variables are always Strings and can be concatenated with the `+` operator. Escape sequences follow the Java String conventions: `\n`=line break, `\t`=tab, `\\`=\, `\u0024`=unicode character, etc. + +The `default` keyword tells Flux to assign the respective value _only_ if the variable has +not yet been set on the command line. Without `default`, previous assignments will be overwritten. + +Paths are always relative to the directory within which the flux command is executed. To address files relative to the location of the executed flux file, use the predefined `FLUX_DIR` variable. + +## Comments +Flux supports single line C/Java-style comments: `//comment`. + +## Flow Definitions + +A Flux workflow contains multiple command modules that each do a specific thing. E.g.: + +```C +"file/path.mrc" +| open-file -> This opens the file at the provided `file/path.mrc` path. +| as-lines -> This reads the file line by line. +| decode-marc21 -> This decodes the data as binary MARC21 into an internal format. +| fix(FLUX_DIR + "fix-marc21.fix") -> This executes the provided Fix transformation. +| encode-json(prettyPrinting="true") -> This encodes the transformed data as JSON. +| write("stdout") -> This writes the JSON data to standard output. +; +``` + +The syntax for defining flows takes its cues from bash pipes.
Commands are concatenated with the pipe character `|`. + +Some commands take a constructor argument. It is provided within brackets: `command("arg")`. +Furthermore, some commands have named options. These are set as follows: `command(optionname="arg1",anotheroption="arg2")` or, with a constructor argument: `command("arg",option="arg2")`. +To learn about the available options of a command, execute Flux without arguments: it will list all available commands, including options. Alternatively, have a look at the [List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) TODO: We need to add FIX to that list!!! + + +To some commands, the entire environment can be given as an argument. This is done with the `*` character: `fix("transformation.fix", *)`. In this case Metafix gains access to all variable assignments made in Flux. +(See also [[Metafix-User-Guide#parameters-to-metafix-definitions]]). + +Note that unlike shell pipes, the data flowing between Flux commands is _typed_. This means that only commands with matching signatures can be combined. Commands expect a certain input and provide a certain output, such as `StreamReceiver`, `Object`, `Reader` and others. + +To look up the signatures, execute Flux without arguments (it will list all available commands, including signatures) or see [[Metafix-User-Guide#parameters-to-metafix-definitions]]. + + + +## Getting Help and Inspiration (TODO: Replace.) +1. Have a look at the [List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) or execute Flux without arguments: it will display a short help text along with a list of all registered commands. This is the list of FLUX commands already mentioned above. +2.
There are several example flux files along with sample data in the folder `examples/`: https://github.com/metafacture/metafacture-core/tree/master/metafacture-runner/src/main/dist/examples + +_________________________ +# For developers: + +## Adding new Commands +Add your class and a descriptive flux shortcut to `flux-commands.properties`. This file acts as a lookup table for flux commands. Use the proper file, i.e. the one residing in the same module as your newly created class. If you have e.g. created a class in the module `metafacture-biblio`, you add the flux command to https://github.com/metafacture/metafacture-core/blob/master/metafacture-biblio/src/main/resources/flux-commands.properties. +Recompile. That's all it takes to add a command. + +However, it's good practice to also add some annotations to the Java class so that IDEs (and also humans) can pick up hints about what the new command does, what type of input is allowed and what type of output is produced. This way you know which commands can be chained together in a pipe. +There are four annotations; see this [example](https://github.com/metafacture/metafacture-core/blob/master/metafacture-biblio/src/main/java/org/metafacture/biblio/AlephMabXmlHandler.java): +``` +@Description("A MAB XML reader") +@In(XmlReceiver.class) +@Out(StreamReceiver.class) +@FluxCommand("handle-mabxml") +``` +If you add a command it would be nice if you also added a flux example to the module `metafacture-runner` so that users can easily see how it's used, see e.g. https://github.com/metafacture/metafacture-core/blob/master/metafacture-runner/src/main/dist/examples/read/regexp/regexp.flux. diff --git a/Framework User Guid.md b/Framework User Guid.md new file mode 100644 index 0000000..b9e16b5 --- /dev/null +++ b/Framework User Guid.md @@ -0,0 +1,83 @@ +This page explains how to create Metafacture objects and how to assemble them to form a processing pipeline. We use as an example a simple pipeline containing a Metafix instance.
+ + +# Building a Flow + +A Flow consists of a data source, an arbitrary number of pipe elements and finally a data sink. +The individual elements are connected by calling the `setReceiver()` method. The following code snippet shows an example. + +```java +// create necessary objects +final PicaReader reader = new PicaReader(); +final Metafix metafix = new Metafix("definition.fix"); +final ListMapWriter writer = new ListMapWriter(); + +//connect them +reader.setReceiver(metafix).setReceiver(writer); + +//start processing +reader.read(input); +``` + +Note that the call `setReceiver()` returns +its argument, preserving the respective type. Thus the calls can be chained to +build up a pipeline as shown in the listing. Finally, the processing is started +by calling the respective method on the data source/reader. The method name +depends on the reader. In the Metamorph project `read()` is used by +convention. + +The following code snippet shows a few more sophisticated connection patterns, such +as adding an additional element, junctions or splitters. + +```java +//adding logging +reader.setReceiver(new LogPipe()).setReceiver(metafix).setReceiver(writer); + +//adding a tee junction +reader.setReceiver(new Tee()).setReceivers(writer1, writer2); + +//splitting based on a metamorph description +final Splitter splitter = new Splitter("morph/typeSplitter.xml"); +reader.setReceiver(splitter).setReceiver("Tn", writer1); +splitter.setReceiver("Tp", writer2); +``` + +# Piping different Objects + + + +# Objects as Eventstream + +```java +public interface StreamSender { + <R extends StreamReceiver> R setReceiver(R streamReceiver); +} +``` + +```java +public interface StreamReceiver { + void startRecord(String identifier); + void endRecord(); + void startEntity(String name); + void endEntity(); + void literal(String name, String value); +} +``` + +# Error Handling +If an exception occurs during the processing of a stream of records, it is +propagated back to the first element in the chain.
This normally means that +processing is terminated, which may not be the preferred action. Imagine +processing a million records. One normally prefers to log any error but continue +the processing. +For this reason an error handler may be registered with the Metamorph object. It +catches all exceptions occurring in the Metamorph object and below. + +```java +metamorph.setErrorHandler(new MetamorphErrorHandler() { + @Override + public void error(final Exception e) { + // TODO fill in your error handling code + } +}); +``` \ No newline at end of file diff --git a/Getting Started.md b/Getting Started.md new file mode 100644 index 0000000..cf9e55a --- /dev/null +++ b/Getting Started.md @@ -0,0 +1,51 @@ +# Getting started! + +## Playground + +The easiest way to get started with Metafacture is the Playground. Take a look at the [first example](https://metafacture.org/playground/?flux=PG_DATA%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cfix%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&fix=move_field%28_id%2C+id%29%0Amove_field%28a%2C+title%29%0Apaste%28author%2C+b.v%2C+b.n%2C+%27~aus%27%2C+c%29%0Aretain%28id%2C+title%2C+author%29&data=1%7Ba%3A+Faust%2C+b+%7Bn%3A+Goethe%2C+v%3A+JW%7D%2C+c%3A+Weimar%7D%0A2%7Ba%3A+R%C3%A4uber%2C+b+%7Bn%3A+Schiller%2C+v%3A+F%7D%2C+c%3A+Weimar%7D&active-editor=fix) and run it by pressing the !["Process"](img/process.png) button. Check out the other examples (first button, !["Load Examples"](img/load-exmples.png)) for different input sources, transformations, and output formats. + +For commands available in Flux, see [the Flux commands documentation](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md). + +For functions and usage of Fix, see [the Fix functions and cookbook](https://github.com/metafacture/metafacture-fix#functions-and-cookbook). + +As next steps, get familiar with FLUX and FIX, and try out some Metafacture workflows.
+ +## Command line + +To use Metafacture as a command-line tool, download the latest metafix-runner from our [releases page](https://github.com/metafacture/metafacture-fix/releases). Extract the downloaded archive and change into the newly created directory (e.g. `cd metafacture-runner-0.4.0`). Run a Flux workflow with: + +`$ ./bin/metafix-runner /path/to/your.flux` on Unix/Linux/Mac or +`$ ./bin/metafix-runner.bat /path/to/your.flux` on Windows. + +To get started, you can export a workflow from the Playground (last button, !["Export Workflow"](img/export.png)). + +To set up IDE support for editing your Flux and Fix files, see [the IDE extensions page](/ide-extensions/index.html). + +As next steps, get familiar with FLUX and FIX, and try out some Metafacture workflows. + +## Using Metafacture as a Java library + +If you want to use Metafacture in your own Java projects, all you need to do is add some dependencies to your project. As of Metafacture 5, the single metafacture-core package has been replaced with a number of domain-specific packages. You can find the list of packages on [Maven Central](https://search.maven.org/search?q=g:org.metafacture). + +Alternatively, you can simply guess the package names from the top-level folders in the source code repository -- they are the same. For instance, if you want to use Metamorph in your project, simply add the following dependency to your `pom.xml`: + +```xml +<dependency> + <groupId>org.metafacture</groupId> + <artifactId>metamorph</artifactId> + <version>$VERSION</version> +</dependency> +``` + +or if Gradle is your build tool of choice use: + +```groovy +dependencies { + implementation 'org.metafacture:metamorph:$VERSION' +} +``` + +Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also sometimes provide pre-releases as GitHub packages.
+ + +If you plan to use Metafacture as a Java library or if you wish to add commands to Flux, you should get familiar with the [Framework](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Framework%20User%20Guid.md). diff --git a/Home.md b/Home.md new file mode 100644 index 0000000..70de300 --- /dev/null +++ b/Home.md @@ -0,0 +1,42 @@ +![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) + + +Metafacture is a toolkit for processing semi-structured data with a focus on library metadata. It provides a versatile set of tools for reading, writing and transforming data. Metafacture can be used as a stand-alone application via CLI or as a Java library in other applications. There is also a playground where you can test workflows. + +The name Metafacture is a portmanteau of the words metadata and manufacture. + +Metafacture comprises three main parts: Framework, Flux and the transformation module Fix. It can be extended with modules. + +__________________ + +## Using Metafacture via playground or CLI + +While working with the playground or the command line you only need [Flux](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#flux) and the transformation module [Fix](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#fix). No Java code is necessary. +Have a look here for [Getting started](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Getting%20Started.md). + +## Framework for JAVA-Integration/Development + +If you plan to use Metafacture as a Java library or if you wish to add commands to Flux, you should get familiar with the [Framework](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#framework). + +__________________ + +## FLUX + +Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary, just a command line.
To use Flux you may download the binary distribution of Metafacture. + +For more information on how to use Flux, see the [Flux User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Flux%20user%20guide.md). + +## FIX + +Metafix is a domain specific language for metadata transformation based on Catmandu FIX. The FIX object performing the transformation is used as part of a processing pipeline. If you are using the Flux scripting language to build and run pipelines, use the `fix` command. If you are using Metafacture as a Java library, just create a Metafix object and add it to your pipeline (see also the [Framework User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#framework)). + +The transformation itself is declared in a fix object, which can be a file. For more information on how to declare transformations see the [Metafix User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fc9eb592bc42c81a141ded694fb81395e55d9675/Fix%20user%20guide.md). + +PS: There is also the transformation module MORPH, but for that have a look at the old documentation and the German cookbook by Swissbib (LINKS). + +## Framework + +The framework includes the interfaces and abstract classes which form the foundation of the data processing pipelines. This part of Metafacture is only relevant for you if you plan to use Metafacture as a Java library or if you wish to add pipe elements to Flux. + +For more information see the [Framework User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Framework%20User%20Guid.md).
+ From a096b54186d6502cd8881426dc21ed5125490ff0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tobias=20B=C3=BClte?= Date: Mon, 21 Aug 2023 14:31:49 +0200 Subject: [PATCH 02/69] Update documentation --- Fix user guide.md => Fix-User-Guide.md | 58 +- Fix-function-and-Cookbook.md | 855 ++++++++++++++++++ Flux user guide.md => Flux-User-Guide.md | 58 +- ...rk User Guid.md => Framework-User-Guide.md | 15 +- Getting Started.md => Getting-Started.md | 23 +- Home.md | 30 +- 6 files changed, 982 insertions(+), 57 deletions(-) rename Fix user guide.md => Fix-User-Guide.md (90%) create mode 100644 Fix-function-and-Cookbook.md rename Flux user guide.md => Flux-User-Guide.md (84%) rename Framework User Guid.md => Framework-User-Guide.md (88%) rename Getting Started.md => Getting-Started.md (79%) diff --git a/Fix user guide.md b/Fix-User-Guide.md similarity index 90% rename from Fix user guide.md rename to Fix-User-Guide.md index 6215efa..2c05aff 100644 --- a/Fix user guide.md +++ b/Fix-User-Guide.md @@ -1,10 +1,13 @@ +![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) + +# Fix User Guide + This document provides an introduction to the Metafacture Fix language (short: Metafix or Fix). Fix is a declarative flow oriented language in which transformations of arbitrary metadata/semi-structured data can be defined using the FIX language. The Fix language for Metafacture is introduced as an alternative to configuring data transformations with Metamorph. Inspired by Catmandu Fix, Metafix processes metadata not as a continuous data stream but as discrete records. The basic idea is to rebuild constructs from the (Catmandu) Fix language like functions, selectors and binds in Java and combine with additional functionalities from the Metamorph toolbox. 
## Part of a metafacture workflow Metafacture Fix is a transformation module that can be used in a workflow, for this you have to use this in your pipeline: -- when using the FLUX: -- - address the `fix`-module -- - + +Flux-Example: ```PERL infile | open-file @@ -14,7 +17,10 @@ Metafacture Fix is a transformation module that can be used in a workflow, for t | encode-json | print ; - ``` +``` + +- when using the FLUX: +- - address the `fix`-module - - you can add variables - - there are some options available - - The fix-Transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes @@ -71,16 +77,17 @@ end **Selectors** can be used as high-level filters to filter the records you want. **Binds** are wrappers for one or more fixes. They give extra control functionality for fixes such as loops. -All binds have the syntax: +All binds have the same syntax: + ```PERL do Bind(params,…) fix(..) fix(..) end -`````` +``` + +Find here a [list of all functions, selectors, binds and conditionals](/Fix-function-and-Cookbook.md). -For a list of all functions, selectors, binds and conditionals have a look at: -https://github.com/metafacture/metafacture-core/wiki#functions-and-cookbook ## Addressing Pieces of Data - of: FIX-Path and the record structure in FIX @@ -88,11 +95,11 @@ Internally FIX knows arrays, objects/hashes and simple elements. How a format is Since functions manipulate, add or remove elements in a record, it is essential to understand how one can address source or target elements. -[e.g.: +e.g.: ```PERL copy_field("", "") ``` -] + To address the source or target element here, you need to provide the path to the element. Metafacture Fix uses a path syntax that is similar to JSON Path but not identical. It also uses the dot notation, but there are some differences in the path structure of arrays and repeated fields, especially when working with JSON, YAML or records with repeated fields.
@@ -145,7 +152,7 @@ z : To adress paths you can use wildcards. For instance the star-wildcard: `person*` would match all simple literals with element names starting with 'person': 'person\_name', 'person\_age', etc. Apart from the star-wildcard, the questionmark-wildcard ('?') is supported. It matches exactly one arbitrary character. -Not yet supported is alteration. +Alternation of paths is not yet fully supported. Besides path wildcards there are array/list wildcards that are used to refrence specific elements or all elements in an array. `g[].*` adresses all strings in the array `g[]`. `g[].$append` would refrence a new element in the array at the end of the array. `g[].$last` refrences the last element in an array. @@ -191,6 +198,35 @@ In this case `target_field` and `source_field` serve as a parameter (the name is Parameters are scoped, which means that the ones provided with the `call_macro` function shadow global ones. Macros cannot be nested. +## Parameters to Metamorph Definitions / Using variables + +Fix definitions may contain parameters. They follow the pattern `$[NAME]`: + +```perl +add_field("rights","$[rights]") +``` + +`$[rights]` in this case is a compile-time variable which is evaluated on +creation of the respective Fix object. + +The `` section in the Metamorph definition can be used to set defaults: + +```xml + + + +``` + +For Java implementations: Compile-time variables are passed to Fix as a constructor parameter. + +```java +final Map<String, String> vars = new HashMap<>(); +vars.put("rights", "CC-0"); + +final Metafix metafix = new Metafix("fixdef.fix", vars); +``` + + ## Splitting Fixes for Reuse diff --git a/Fix-function-and-Cookbook.md b/Fix-function-and-Cookbook.md new file mode 100644 index 0000000..bab017a --- /dev/null +++ b/Fix-function-and-Cookbook.md @@ -0,0 +1,855 @@ +This page is a replication of the passage of the Fix Readme.md.
+ +## Functions and cookbook + +### Best practices and guidelines for working with Metafacture Fix + +- We recommend to use double quotation marks for arguments and values in functions, binds and conditionals. +- If using a `list` bind with a variable, the `var` option requires quotation marks (`do list(path: "", "var": "")`). +- Fix turns repeated fields into arrays internally but only marked arrays (with `[]` at the end of the field name) are also emitted as "arrays" (entities with indexed literals), all other arrays are emitted as repeated fields. +- Every Fix file should end with a final newline. + +### Glossary + +#### Array wildcards + +Array wildcards resemble [Catmandu's concept of wildcards](http://librecat.org/Catmandu/#wildcards). + +When working with arrays and repeated fields you can use wildcards instead of an index number to select elements of an array. + +| Wildcard | Meaning | +|----------|:--------| +| `*` | Selects _all_ elements of an array. | +| `$first` | Selects only the _first_ element of an array. | +| `$last` | Selects only the _last_ element of an array. | +| `$prepend` | Selects the position _before_ the first element of an array. Can only be used when adding new elements to an array. | +| `$append` | Selects the position _after_ the last element of an array. Can only be used when adding new elements to an array. | + +#### Path wildcards + +Path wildcards resemble [Metamorph's concept of wildcards](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide#addressing-pieces-of-data). They are not supported in Catmandu (it has [specialized Fix functions](https://librecat.org/Catmandu/#marc-mab-pica-paths) instead). + +You can use path wildcards to select fields matching a pattern. They only match path _segments_ (field names), though, not _whole_ paths of nested fields. These wildcards cannot be used to add new elements. + +| Wildcard | Meaning | +|----------|:--------| +| `*` | Placeholder for zero or more characters. 
| +| `?` | Placeholder for exactly one character. | +| `\|` | Alternation of multiple patterns. | +| `[...]` | Enumeration of characters. | + +### Functions + +#### Script-level functions + +##### `include` + +Includes a Fix file and executes it as if its statements were written in place of the function call. + +Parameters: + +- `path` (required): Path to Fix file (if the path starts with a `.`, it is resolved relative to the including file's directory; otherwise, it is resolved relative to the current working directory). + +Options: + +- All options are made available as "dynamic" local variables in the included Fix file. + +```perl +include(""[, ...]) +``` + +##### `nothing` + +Does nothing. It is used for benchmarking in Catmandu. + +```perl +nothing() +``` + +##### `put_filemap` + +Defines an external map for [lookup](#lookup) from a file or a URL. Maps with more than 2 columns are supported but are reduced to a defined key and a value column. + +```perl +put_filemap("", "", sep_char: "\t") +``` + +The separator (`sep_char`) will vary depending on the source file, e.g.: + +| Type | Separator | +|------|------------| +| CSV | `,` or `;` | +| TSV | `\t` | + +Options: + +- `allow_empty_values`: Sets whether to allow empty values in the filemap or to ignore these entries. (Default: `false`) +- `compression`: Sets the compression of the file. +- `decompress_concatenated`: Flags whether to use decompress concatenated file compression. +- `encoding`: Sets the encoding used to open the resource. +- `expected_columns`: Sets number of expected columns; lines with different number of columns are ignored. Set to `-1` to disable the check and allow arbitrary number of columns. (Default: `2`) +- `key_column`: Defines the column to be used for keys. Uses zero index. (Default: `0`) +- `value_column`: Defines the column to be used for values. Uses zero index. (Default: `1`) + +##### `put_map` + +Defines an internal map for [lookup](#lookup) from key/value pairs. 
+ +```perl +put_map("", + "dog": "mammal", + "parrot": "bird", + "shark": "fish" +) +``` + +##### `put_rdfmap` + +Defines an external RDF map for lookup from a file or an HTTP(S) resource. +As the RDF map is reducing RDF triples to a key/value map it is mandatory to set the target. +The targeted RDF property can optionally be bound by an RDF language tag. + +```perl +put_rdfmap("", "", target: "") +put_rdfmap("", "", target: "", select_language: "") +``` + +##### `put_var` + +Defines a single global variable that can be referenced with `$[]`. + +```perl +put_var("", "") +``` + +##### `put_vars` + +Defines multiple global variables that can be referenced with `$[]`. + +```perl +put_vars( + "": "", + "": "" +) +``` + +#### Record-level functions + +##### `add_field` + +Creates (or appends to) a field with a defined value. + +```perl +add_field("", "") +``` + +##### `array` + +Converts a hash/object into an array. + +```perl +array("") +``` + +E.g.: + +```perl +array("foo") +# {"name":"value"} => ["name", "value"] +``` + +##### `call_macro` + +Calls a named macro, i.e. a list of statements that have been previously defined with the [`do put_macro`](#do-put_macro) bind. + +Parameters: + +- `name` (required): Unique name of the macro. + +Options: + +- All options are made available as "dynamic" local variables in the macro. + +```perl +do put_macro(""[, ...]) + ... +end +call_macro(""[, ...]) +``` + +##### `copy_field` + +Copies (or appends to) a field from an existing field. + +```perl +copy_field("", "") +``` + +##### `format` + +Replaces the value with a formatted (`sprintf`-like) version. + +---- TODO: THIS NEEDS MORE CONTENT ----- + +```perl +format("", "") +``` + +##### `hash` + +Converts an array into a hash/object. + +```perl +hash("") +``` + +E.g.: +```perl +hash("foo") +# ["name", "value"] => {"name":"value"} +``` + +##### `move_field` + +Moves (or appends to) a field from an existing field. Can be used to rename a field. 
+ +```perl +move_field("", "") +``` + +##### `parse_text` + +Parses a text into an array or hash of values. + +---- TODO: THIS NEEDS MORE CONTENT ----- + +```perl +parse_text("", "") +``` + +##### `paste` + +Joins multiple field values into a new field. Can be combined with additional literal strings. + +The default `join_char` is a single space. Literal strings have to start with `~`. + +```perl +paste("", ""[, ...][, "join_char": ", "]) +``` + +E.g.: + +```perl +# a: eeny +# b: meeny +# c: miny +# d: moe +paste("my.string", "~Hi", "a", "~how are you?") +# "my.string": "Hi eeny how are you?" +``` + +##### `print_record` + +Prints the current record as JSON either to standard output or to a file. + +Parameters: + +- `prefix` (optional): Prefix to print before the record; may include [format directives](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax) for counter and record ID (in that order). (Default: Empty string) + +Options: + +- `append`: Whether to open files in append mode if they exist. (Default: `false`) +- `compression` (file output only): Compression mode. (Default: `auto`) +- `destination`: Destination to write the record to; may include [format directives](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax) for counter and record ID (in that order). (Default: `stdout`) +- `encoding` (file output only): Encoding used by the underlying writer. (Default: `UTF-8`) +- `footer`: Footer which is written at the end of the output. (Default: `\n`) +- `header`: Header which is written at the beginning of the output. (Default: Empty string) +- `id`: Field name which contains the record ID; if found, will be available for inclusion in `prefix` and `destination`. (Default: `_id`) +- `internal`: Whether to print the record's internal representation instead of JSON. (Default: `false`) +- `pretty`: Whether to use pretty printing. (Default: `false`) +- `separator`: Separator which is written after the record. 
(Default: `\n`) + +```perl +print_record([""][, ...]) +``` + +E.g.: + +```perl +print_record("%d) Before transformation: ") +print_record(destination: "record-%2$s.json", id: "001", pretty: "true") +print_record(destination: "record-%03d.json.gz", header: "After transformation: ") +``` + +##### `random` + +Creates (or replaces) a field with a random number (less than the specified maximum). + +```perl +random("", "") +``` + +##### `remove_field` + +Removes a field. + +```perl +remove_field("") +``` + +##### `rename` + +Replaces a regular expression pattern in subfield names of a field. Does not change the name of the source field itself. + +```perl +rename("", "", "") +``` + +##### `retain` + +Deletes all fields except the ones listed (incl. subfields). + +```perl +retain(""[, ...]) +``` + +##### `set_array` + +Creates a new array (with optional values). + +```perl +set_array("") +set_array("", ""[, ...]) +``` + +##### `set_field` + +Creates (or replaces) a field with a defined value. + +```perl +set_field("", "") +``` + +##### `set_hash` + +Creates a new hash (with optional values). + +```perl +set_hash("") +set_hash("", "subfieldName": ""[, ...]) +``` + +##### `timestamp` + +Creates (or replaces) a field with the current timestamp. + +Options: + +- `format`: Date and time pattern as in [java.text.SimpleDateFormat](https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html). (Default: `timestamp`) +- `timezone`: Time zone as in [java.util.TimeZone](https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html). (Default: `UTC`) +- `language`: Language tag as in [java.util.Locale](https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html). (Default: The locale of the host system) + +```perl +timestamp(""[, format: ""][, timezone: ""][, language: ""]) +``` + +##### `vacuum` + +Deletes empty fields, arrays and objects. + +```perl +vacuum() +``` + +#### Field-level functions + +##### `append` + +Adds a string at the end of a field value. 
+ +```perl +append("", "") +``` + +##### `capitalize` + +Upcases the first character in a field value. + +```perl +capitalize("") +``` + +##### `count` + +Counts the number of elements in an array or a hash and replaces the field value with this number. + +```perl +count("") +``` + +##### `downcase` + +Downcases all characters in a field value. + +```perl +downcase("") +``` + +##### `filter` + +Only keeps field values that match the regular expression pattern. Works only with array of strings/repeated fields. + +```perl +filter("", "") +``` + +##### `flatten` + +Flattens a nested array field. + +```perl +flatten("") +``` + +##### `from_json` + +Replaces the string with its JSON deserialization. + +Options: + +- `error_string`: Error message as a placeholder if the JSON couldn't be parsed. (Default: `null`) + +```perl +from_json(""[, error_string: ""]) +``` + +##### `index` + +Returns the index position of a substring in a field and replaces the field value with this number. + +```perl +index("", "") +``` + +##### `isbn` + +Extracts an ISBN and replaces the field value with the normalized ISBN; optionally converts and/or validates the ISBN. + +Options: + +- `to`: ISBN format to convert to (either `ISBN10` or `ISBN13`). (Default: Only normalize ISBN) +- `verify_check_digit`: Whether the check digit should be verified. (Default: `false`) +- `error_string`: Error message as a placeholder if the ISBN couldn't be validated. (Default: `null`) + +```perl +isbn(""[, to: ""][, verify_check_digit: ""][, error_string: ""]) +``` + +##### `join_field` + +Joins an array of strings into a single string. + +```perl +join_field("", "") +``` + +##### `lookup` + +Looks up matching values in a map and replaces the field value with this match. [External files](#put_filemap), [internal maps](#put_map) as well as [RDF resources](#put_rdfmap) can be used. + +Parameters: + +- `path` (required): Field path to look up. +- `map` (optional): Name or path of the map in which to look up values. 
+ +Options: + +- `__default`: Default value to use for unknown values. (Default: Old value) +- `delete`: Whether to delete unknown values. (Default: `false`) +- `print_unknown`: Whether to print unknown values. (Default: `false`) + +Additional options when printing unknown values: + +- `append`: Whether to open files in append mode if they exist. (Default: `true`) +- `compression` (file output only): Compression mode. (Default: `auto`) +- `destination`: Destination to write unknown values to; may include [format directives](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax) for counter and record ID (in that order). (Default: `stdout`) +- `encoding` (file output only): Encoding used by the underlying writer. (Default: `UTF-8`) +- `footer`: Footer which is written at the end of the output. (Default: `\n`) +- `header`: Header which is written at the beginning of the output. (Default: Empty string) +- `id`: Field name which contains the record ID; if found, will be available for inclusion in `prefix` and `destination`. (Default: `_id`) +- `prefix`: Prefix to print before the unknown value; may include [format directives](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax) for counter and record ID (in that order). (Default: Empty string) +- `separator`: Separator which is written after the unknown value. (Default: `\n`) + +```perl +lookup(""[, ][, ...]) +``` + +E.g.: + +```perl +# local (unnamed) map +lookup("path.to.field", key_1: "value_1", ...) + +# internal (named) map +put_map("internal-map", key_1: "value_1", ...) 
+lookup("path.to.field", "internal-map") + +# external file map (implicit) +lookup("path.to.field", "path/to/file", sep_char: ";") + +# external file map (explicit) +put_filemap("path/to/file", "file-map", sep_char: ";") +lookup("path.to.field", "file-map") + +# RDF map (explicit) +put_rdfmap("path/to/file", "rdf-map", target: "") +lookup("path.to.field", "rdf-map") + +# with default value +lookup("path.to.field", "map-name", __default: "NA") + +# with printing unknown values to a file +lookup("path.to.field", "map-name", print_unknown: "true", destination: "unknown.txt") +``` + +##### `prepend` + +Adds a string at the beginning of a field value. + +```perl +prepend("", "") +``` + +##### `replace_all` + +Replaces a regular expression pattern in field values with a replacement string. Regexp capturing is possible; refer to capturing groups by number (`$`) or name (`${}`). + +```perl +replace_all("", "", "") +``` + +##### `reverse` + +Reverses the character order of a string or the element order of an array. + +```perl +reverse("") +``` + +##### `sort_field` + +Sorts strings in an array. Alphabetically and A-Z by default. Optional numerical and reverse sorting. + +```perl +sort_field("") +sort_field("", reverse: "true") +sort_field("", numeric: "true") +``` + +##### `split_field` + +Splits a string into an array and replaces the field value with this array. + +```perl +split_field("", "") +``` + +##### `substring` + +Replaces a string with its substring as defined by the start position (offset) and length. + +```perl +substring("", "", "") +``` + +##### `sum` + +Sums numbers in an array and replaces the field value with this number. + +```perl +sum("") +``` + +##### `to_json` + +Replaces the value with its JSON serialization. + +Options: + +- `error_string`: Error message as a placeholder if the JSON couldn't be generated. (Default: `null`) +- `pretty`: Whether to use pretty printing. 
(Default: `false`) + +```perl +to_json(""[, pretty: ""][, error_string: ""]) +``` + +##### `trim` + +Deletes whitespace at the beginning and the end of a field value. + +```perl +trim("") +``` + +##### `uniq` + +Deletes duplicate values in an array. + +```perl +uniq("") +``` + +##### `upcase` + +Upcases all characters in a field value. + +```perl +upcase("") +``` + +##### `uri_encode` + +Encodes a field value as URI. Aka percent-encoding. + +```perl +uri_encode("") +``` + +### Selectors + +#### `reject` + +Ignores records that match a condition. + +```perl +if + reject() +end +``` + +### Binds + +#### `do list` + +Iterates over each element of an array. In contrast to Catmandu, it can also iterate over a single object or string. + +```perl +do list(path: "") + ... +end +``` + +Only the current element is accessible in this case (as the root element). + +When specifying a variable name for the current element, the record remains accessible as the root element and the current element is accessible through the variable name: + +```perl +do list(path: "", "var": "") + ... +end +``` + +#### `do list_as` + +Iterates over each _named_ element of an array (like [`do list`](#do-list) with a variable name). If multiple arrays are given, iterates over the _corresponding_ elements from each array (i.e., all elements with the same array index, skipping elements whose arrays have already been exhausted). + +```perl +do list_as(element_1: ""[, ...]) + ... 
+end +``` + +E.g.: + +```perl +# "ccm:university":["https://ror.org/0304hq317"] +# "ccm:university_DISPLAYNAME":["Gottfried Wilhelm Leibniz Universität Hannover"] +set_array("sourceOrga[]") +do list_as(orgId: "ccm:university[]", orgName: "ccm:university_DISPLAYNAME[]") + copy_field(orgId, "sourceOrga[].$append.id") + copy_field(orgName, "sourceOrga[].$last.name") +end +# {"sourceOrga":[{"id":"https://ror.org/0304hq317","name":"Gottfried Wilhelm Leibniz Universität Hannover"}]} +``` + +#### `do once` + +Executes the statements only once (when the bind is first encountered), not repeatedly for each record. + +```perl +do once() + ... +end +``` + +In order to execute multiple blocks only once, tag them with unique identifiers: + +```perl +do once("maps setup") + ... +end +do once("vars setup") + ... +end +``` + +#### `do put_macro` + +Defines a named macro, i.e. a list of statements that can be executed later with the [`call_macro`](#call_macro) function. + +Variables can be referenced with `$[]`, in the following order of precedence: + +1. "dynamic" local variables, passed as options to the `call_macro` function; +2. "static" local variables, passed as options to the `do put_macro` bind; +3. global variables, defined via [`put_var`](#put_var)/[`put_vars`](#put_vars). + +Parameters: + +- `name` (required): Unique name of the macro. + +Options: + +- All options are made available as "static" local variables in the macro. + +```perl +do put_macro(""[, ...]) + ... +end +call_macro(""[, ...]) +``` + +### Conditionals + +Conditionals start with `if` in case of affirming the condition or `unless` rejecting the condition. + +Conditionals require a final `end`. + +Additional conditionals can be set with `elsif` and `else`. + +```perl +if + ... +end +``` + +```perl +unless + ... +end +``` + +```perl +if + ... +elsif + ... +else + ... +end +``` + +#### `contain` + +##### `all_contain` + +Executes the functions if/unless the field contains the value. 
If it is an array or a hash all field values must contain the string. + +##### `any_contain` + +Executes the functions if/unless the field contains the value. If it is an array or a hash one or more field values must contain the string. + +##### `none_contain` + +Executes the functions if/unless the field does not contain the value. If it is an array or a hash none of the field values may contain the string. + +##### `str_contain` + +Executes the functions if/unless the first string contains the second string. + +#### `equal` + +##### `all_equal` + +Executes the functions if/unless the field value equals the string. If it is an array or a hash all field values must equal the string. + +##### `any_equal` + +Executes the functions if/unless the field value equals the string. If it is an array or a hash one or more field values must equal the string. + +##### `none_equal` + +Executes the functions if/unless the field value does not equal the string. If it is an array or a hash none of the field values may equal the string. + +##### `str_equal` + +Executes the functions if/unless the first string equals the second string. + +#### `exists` + +Executes the functions if/unless the field exists. + +```perl +if exists("") +``` + +#### `in` + +Executes the functions if/unless the field value [is contained in](https://perldoc.perl.org/perlop#Smartmatch-Operator) the value of the other field. + +_Also aliased as [`is_contained_in`](#is_contained_in)._ + +#### `is_contained_in` + +_Alias for [`in`](#in)._ + +#### `is_array` + +Executes the functions if/unless the field value is an array. + +#### `is_empty` + +Executes the functions if/unless the field value is empty. + +#### `is_false` + +Executes the functions if/unless the field value equals `false` or `0`. + +#### `is_hash` + +_Alias for [`is_object`](#is_object)._ + +#### `is_number` + +Executes the functions if/unless the field value is a number. 
+ +#### `is_object` + +Executes the functions if/unless the field value is a hash (object). + +_Also aliased as [`is_hash`](#is_hash)._ + +#### `is_string` + +Executes the functions if/unless the field value is a string (and not a number). + +#### `is_true` + +Executes the functions if/unless the field value equals `true` or `1`. + +#### `match` + +##### `all_match` + +Executes the functions if/unless the field value matches the regular expression pattern. If it is an array or a hash all field values must match the regular expression pattern. + +##### `any_match` + +Executes the functions if/unless the field value matches the regular expression pattern. If it is an array or a hash one or more field values must match the regular expression pattern. + +##### `none_match` + +Executes the functions if/unless the field value does not match the regular expression pattern. If it is an array or a hash none of the field values may match the regular expression pattern. + +##### `str_match` + +Executes the functions if/unless the string matches the regular expression pattern. \ No newline at end of file diff --git a/Flux user guide.md b/Flux-User-Guide.md similarity index 84% rename from Flux user guide.md rename to Flux-User-Guide.md index 6e12d61..5de39d5 100644 --- a/Flux user guide.md +++ b/Flux-User-Guide.md @@ -1,20 +1,23 @@ +![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) + +# Flux User Guide + This document provides a quick introduction to Metafacture Flux, a domain specific language to build data flows for metadata processing. The Flux makes use of Metafacture as a stand-alone application - so you build workflows without the need of writing java code. 
+## Installing Flux and Running Flux -# Installing Flux and Running Flux - -# There are different ways of installing and running Metafacture FLUX -## Stand-alone application (without Java Code) +## There are different ways of installing and running Metafacture FLUX +### Stand-alone application (without Java Code) -Either use a prebuild distribution by unziping the Metafacture distribution archive. With regard of using FIX we advise to use the Runner provided by the Metafacture Fix repo. See [releases page](https://github.com/metafacture/metafacture-fix/releases) +Either use a prebuilt distribution by unzipping the Metafacture distribution archive. If you want to use FIX, we advise using the Runner provided by the Metafacture Fix repo. See the [releases page](https://github.com/metafacture/metafacture-fix/releases). Then execute the script `flux.sh` or `flux.bat` in the unzipped `bin/` folder. -## More elaborate ways for developers: +### More elaborate ways for developers: -### Build from local distribution +#### Build from local distribution Check out the repo to build a certain branch and roll your own local distribution like this: ```bash $ cd metafacture-core; ./gradlew installDist @@ -22,11 +25,11 @@ $ cd metafacture-core; ./gradlew installDist Then go to `metafacture-core/metafacture-runner/build/install/metafacture-core` and execute the `flux.*` there. -### Working with the Java source code +#### Working with the Java source code If you are working with the source code directly, execute the class `org.metafacture.runner.Flux`. -## Run a Flux-File +### Run a Flux-File Just provide the flux file you wish to run as first argument. @@ -34,14 +37,14 @@ Just provide the flux file you wish to run as first argument.
$> flux.sh FILE.flux ``` -## Provide Arguments +### Provide Arguments To provide arguments add variable assignments after the first argument as follows: ```bash $> flux.sh FILE.flux var1=value1 var2=value2 ``` This sets the variable `var1` to the value 'value1' and `var2` to the value 'value2'. -# Writing Flux files +## Writing Flux files The following snippet shows a simple flux file: ```c //declare variables @@ -57,23 +60,13 @@ file | write("stdout") ; ``` -In the first section variables are declared, in the second, we define the flow. -Linebreaks are optional. Semicolons `;` mark the end of a variable assignment or flow definition. +In the first section [variables](#variables) are declared, in the second, we [define the flow](#flow-definitions). +A flow is a combination of different FLUX commands; see the [list of all available Flux commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md). -[List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) +Line breaks are optional but improve readability. Comments start with `//`. +Semicolons `;` mark the end of a variable assignment or flow definition. -## Variables -Variables are always Strings and can be concatenated with the `+` operator. Escape sequences follow the Java String conventions: `\n`=line break, `\t`=tab, `\\`=\, `\u0024`=unicode character, etc. - -The `default` keyword tells Flux to assign the respective value _only_ if the variable has -not yet been set on the command line. Without `default`, previous assignments will be overwritten. - -Paths are always relative to the directory within which the flux command is executed. To address files relative to the location of the executed flux file, use the predefined `FLUX_DIR` variable. - -## Comments -Flux supports single line C/Java-style comments: `//comment`.
- -## Flow Definitions +### Flow Definitions A FLUX contains multiple command-moduls that are doing specific things. E.g.: @@ -102,6 +95,16 @@ Note that unlike shell pipes, the data flowing between Flux commands is _typed_. To lookup the signatures, execute Flux without arguments or see: [[Metafix-User-Guide#parameters-to-metafix-definitions]]). It will list all available commands, including signatures. +### Variables +Variables are always Strings and can be concatenated with the `+` operator. Escape sequences follow the Java String conventions: `\n`=line break, `\t`=tab, `\\`=\, `\u0024`=unicode character, etc. + +The `default` keyword tells Flux to assign the respective value _only_ if the variable has +not yet been set on the command line. Without `default`, previous assignments will be overwritten. + +Paths are always relative to the directory within which the flux command is executed. To address files relative to the location of the executed flux file, use the predefined `FLUX_DIR` variable. + +### Comments +Flux supports single line C/Java-style comments: `//comment`. ## Getting Help and Inspiration (TODO: Ersetzen.) @@ -111,6 +114,9 @@ To lookup the signatures, execute Flux without arguments or see: [[Metafix-User- _________________________ # For developers: +> [!NOTE] +> Coding in JAVA. + ## Adding new Commands Add your class and a descriptive flux shortcut to `flux-commands.properties`. This file acts as a lookup table for flux commands. Use the proper file, i.e. the one residing in the same module where your newly created class resides. If you have e.g. created a class in the module `metafacture-biblio`, you add the flux-command to https://github.com/metafacture/metafacture-core/blob/master/metafacture-biblio/src/main/resources/flux-commands.properties. Recompile. That's all to add a command. 
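
The lookup-table role of `flux-commands.properties` described above can be sketched in plain Java. This is only an illustration and not Metafacture's actual loader: the class `FluxCommandLookupSketch`, the shortcut `my-command` and the class name `org.example.MyCommand` are hypothetical, and the one-entry-per-line layout (shortcut, whitespace, fully qualified class name) is an assumption about the file format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class FluxCommandLookupSketch {

    // Parses properties-style text and returns the class name registered
    // for the given shortcut, or null if the shortcut is not registered.
    public static String resolve(final String propertiesText, final String shortcut) throws IOException {
        final Properties commands = new Properties();
        commands.load(new StringReader(propertiesText));
        return commands.getProperty(shortcut);
    }

    public static void main(final String[] args) throws IOException {
        // Hypothetical entry: shortcut and class name separated by whitespace.
        final String entries = "my-command org.example.MyCommand\n";
        System.out.println(resolve(entries, "my-command"));
        System.out.println(resolve(entries, "unknown-command"));
    }
}
```

Under these assumptions, a shortcut that is missing from the table simply resolves to `null`, which mirrors why a new command is not usable until its entry has been added to the properties file of the module that contains the class.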
diff --git a/Framework User Guid.md b/Framework-User-Guide.md similarity index 88% rename from Framework User Guid.md rename to Framework-User-Guide.md index b9e16b5..84e891d 100644 --- a/Framework User Guid.md +++ b/Framework-User-Guide.md @@ -1,7 +1,14 @@ +![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) + +# Framework User Guide + +> [!NOTE] +>Relevant for JAVA developers. For using metafacture without JAVA Code see the [FLUX user guide](/Flux-User-Guide.md). + This page explains how to create a Metafacture objects and how to assemble them to form a processing pipeline. We use as an example a simple pipeline containing a Metamorph instance. -# Building a Flow +## Building a Flow A Flow consists of a data source, an arbitrary number of pipe elements and finally a data sink. The individual elements are connected by calling the `setReceiver()` method. The following code snipped shows an example. @@ -42,11 +49,11 @@ reader.setReceiver(splitter).setReceiver("Tn", writer1); splitter.setReceiver("Tp", writer2); ``` -# Piping different Objects +## Piping different Objects -# Objects as Eventstream +### Objects as Eventstream ```java public interface StreamSender { @@ -64,7 +71,7 @@ public interface StreamReceiver { } ``` -# Error Handling +### Error Handling If an exception occurs during the processing of a stream of records, it is back propagated to the first element in the chain. This normally means that processing is terminated which may not be the preferred action. Imagine diff --git a/Getting Started.md b/Getting-Started.md similarity index 79% rename from Getting Started.md rename to Getting-Started.md index cf9e55a..fc37e36 100644 --- a/Getting Started.md +++ b/Getting-Started.md @@ -1,33 +1,38 @@ +![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) + + # Getting started! ## Playground The easiest way to get started with Metafacture is the Playground. 
Take a look at the [first example](https://metafacture.org/playground/?flux=PG_DATA%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cfix%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&fix=move_field%28_id%2C+id%29%0Amove_field%28a%2C+title%29%0Apaste%28author%2C+b.v%2C+b.n%2C+%27~aus%27%2C+c%29%0Aretain%28id%2C+title%2C+author%29&data=1%7Ba%3A+Faust%2C+b+%7Bn%3A+Goethe%2C+v%3A+JW%7D%2C+c%3A+Weimar%7D%0A2%7Ba%3A+R%C3%A4uber%2C+b+%7Bn%3A+Schiller%2C+v%3A+F%7D%2C+c%3A+Weimar%7D&active-editor=fix) and run it by pressing the !["Process"](img/process.png) button. Check out the other examples (first button, !["Load Examples"](img/load-exmples.png)) for different input sources, transformations, and output formats. -For commands available in the Flux, see [the Flux commands documentation](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md). +For commands available in the Flux, see [the Flux commands documentation](/flux-commands.md). -For functions and usage of the Fix, see [the Fix functions and cookbook](https://github.com/metafacture/metafacture-fix#functions-and-cookbook). +For functions and usage of the Fix, see [the Fix functions and cookbook](/Fix-function-and-Cookbook.md). -For next steps get familar with FLUX and FIX. And try out some metafacture workflows. +For next steps, get familiar with [FLUX](/Flux-User-Guide.md) and [FIX](/Fix-User-Guide.md), and try out some Metafacture workflows. ## Command line -To use Metafacture as a command-line tool, download the latest metafix-runner from our [releases page](https://github.com/metafacture/metafacture-fix/releases). Extract the downloaded archive and change into the newly created directory (e.g. `cd metafacture-runner-0.4.0`). Run a Flux workflow with: +To use Metafacture as a command-line tool, download the latest metafix-runner from our [releases page](https://github.com/metafacture/metafacture-fix/releases).
Extract the downloaded archive and change into the newly created directory (e.g. `cd metafacture-runner-0.5.1`). Run a Flux workflow with: `$ ./bin/metafix-runner /path/to/your.flux` on Unix/Linux/Mac or `$ ./bin/metafix-runner.bat /path/to/your.flux` on Windows. To get started, you can export a workflow from the Playground (last button, !["Export Workflow"](img/export.png)). -To set up IDE support for editing your Flux and Fix files, see [the IDE extensions page](/ide-extensions/index.html). +To set up IDE support for editing your Flux and Fix files, see [the IDE extensions page](https://metafacture.org/ide-extensions/index.html). -For next steps get familar with FLUX and FIX. And try out some metafacture workflows. +For next steps get familar with FLUX (hyper link) and FIX(hyper link). And try out some metafacture workflows. ## Using Metafacture as a Java library If you want to use Metafacture in your own Java projects all you need is to add some dependencies to your project. As of Metafacture 5, the single metafacture-core package has been replaced with a number of domain-specific packages. You can find the list of packages on [Maven Central](https://search.maven.org/search?q=g:org.metafacture). -Alternatively, you can simply guess the package names from the top-level folders in the source code repository -- they are the same. For instance, if you want to use Metamorph in your project, simply add the following dependency to your `pom.xml`: +Alternatively, you can simply guess the package names from the top-level folders in the source code repository -- they are the same. + +TODO: For instance, if you want to use Metamorph in your project, simply add the following dependency to your `pom.xml`: ```xml @@ -45,7 +50,9 @@ dependencies { } ``` +To use Fix you need to + Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~). The version number is derived from the branch name. 
Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages. -If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Framework%20User%20Guid.md). +If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Framework-User-Guide.md). diff --git a/Home.md b/Home.md index 70de300..ef76d07 100644 --- a/Home.md +++ b/Home.md @@ -1,5 +1,6 @@ ![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png) +# Metafacture Documentation Metafacture is a toolkit for processing semi-structured data with a focus on library metadata. It provides a versatile set of tools for reading, writing and transforming data. Metafacture can be used as a stand-alone application via CLI or as a Java library in other applications. There is also a playground where you can test workflows. @@ -11,12 +12,15 @@ __________________ ## Using Metafacture via playground or CLI -While working with the playground or the command line you only need [Flux](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#flux) and the transformation module [Fix](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#fix). No JAVA-Code is necessary!!! -Have a look here for [Getting started](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Getting%20Started.md). +> [!NOTE] +> No JAVA-Code is necessary!!! + +While working with the playground or the command line you only need [Flux](#flux) and the transformation module [Fix](#fix). +Have a look here for [Getting started](/Getting-Started.md). 

 ## Framework for JAVA-Integration/Development

-If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#framework).
+If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Home.md#framework).

 __________________

@@ -24,19 +28,29 @@ __________________

 Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary, just a command line. To use Flux you may download the binary distribution of Metafacture.

-For more information on how to use Flux, see the [Flux User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Flux%20user%20guide.md).
+For more information on how to use Flux, see the [Flux User Guide](/Flux-User-Guide.md).
+
+See [here for all available FLUX-Commands](/flux-commands.md).

 ## FIX

-Metafix is a domain specific language for metadata transformation based on Catmandu FIX. The FIX object performing the transformation is used as part of a processing pipeline. If you are using the Flux scripting language to build and run pipelines, use the `fix` command. If you are using Metafacture as a Java library, just create a Metafix object and add it to your pipeline (see also the [Framework User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/main/Home.md#framework)).
+Metafix is a domain specific language for metadata transformation based on Catmandu FIX. The FIX object performing the transformation is used as part of a processing pipeline.
+
+If you are using **Metafacture with CLI or Playground** and therefore the Flux scripting language to build and run pipelines, use the `fix` command in your FLUX-Pipeline.

-The transformation itself is declared in a fix-object which can be a file. For more information on how to declare transformations see [Metafix User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fc9eb592bc42c81a141ded694fb81395e55d9675/Fix%20user%20guide.md).
+If you are using **Metafacture as a Java library**, just create a Metafix object and add it to your pipeline (see also the [Framework User Guide](#framework)).

-PS: There is also the transformation modul MORPH but for that have a look at the old documentation and the german cookbook by Swissbib (LINKS).
+The transformation itself is declared in a fix-object which can be a file. For more information on how to declare transformations see [Metafix User Guide](/Fix-User-Guide.md).
+
+> [!NOTE]
+> PS: There is also the transformation modul MORPH but for that have a look at[ the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the german cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/).

 ## Framework

+> [!NOTE]
+>Relevant for developers
+
 The framework includes the interfaces and abstract classes which form the foundation of the data processing pipelines. This part of Metafacture is only relevant for you if you plan to use Metafacture as a Java library or if you wish to add pipe elements to Flux.

-For more information see the [Framework User Guide](https://github.com/TobiasNx/metafacture_documentation_new/blob/fcb2103acc3216dc39de5c6f05f2f481d4ec6126/Framework%20User%20Guid.md).
+For more information see the [Framework User Guide](/Framework-User-Guide.md).
From 4b0f27e4830c5103f8dda180829fab80eb8a2fa0 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:42:45 +0200
Subject: [PATCH 03/69] Update Approaching a transformation with metafacture.md

Co-authored-by: Pascal Christoph
---
 Approaching a transformation with metafacture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index 0a4d34a..eb06533 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -1,4 +1,4 @@
-Every approach transform metadata with metafacture is quite similiar:
+Every approach to transform metadata with metafacture is quite similiar:

 - You need to know what you want:
   e.g. Transform data from Marc21 from a certain folder to some kind of JSON Data.

From 9ca5a1184157df77c3bf083f57e8d048b9fc6885 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:43:17 +0200
Subject: [PATCH 04/69] Update Approaching a transformation with metafacture.md

Co-authored-by: Pascal Christoph
---
 Approaching a transformation with metafacture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index eb06533..480bda5 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -1,6 +1,6 @@
 Every approach to transform metadata with metafacture is quite similiar:

-- You need to know what you want:
+- You need to know the type and source of the input and the type and destination of the output:
   e.g. Transform data from Marc21 from a certain folder to some kind of JSON Data.
 - You have to identify the commands that you need.
 - Combine the commands without the transformation module and test if the workflow goes trough.

From 350309e7abf051af00810272e427e7a80f01138b Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:43:32 +0200
Subject: [PATCH 05/69] Update Approaching a transformation with metafacture.md

Co-authored-by: Pascal Christoph
---
 Approaching a transformation with metafacture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index 480bda5..cb05a58 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -3,7 +3,7 @@ Every approach to transform metadata with metafacture is quite similiar:
 - You need to know the type and source of the input and the type and destination of the output:
   e.g. Transform data from Marc21 from a certain folder to some kind of JSON Data.
 - You have to identify the commands that you need.
-- Combine the commands without the transformation module and test if the workflow goes trough.
+- Combine the commands without the transformation module and test if the workflow goes through.
 - Adjust the workflow until it works.
 - If the general workflow is working, move on to prepare the transformation.
 - Get familar with the incoming data:

From 6b3942ad6fce5c4f521a7ed391cb706926da773e Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:43:56 +0200
Subject: [PATCH 06/69] Update Approaching a transformation with metafacture.md

Co-authored-by: Pascal Christoph
---
 Approaching a transformation with metafacture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index cb05a58..010d484 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -9,7 +9,7 @@ Every approach to transform metadata with metafacture is quite similiar:
 - Get familar with the incoming data:
   - e.g. use `| list-fix-paths| print` to checkout the metadata-element paths that are provided.
   - use `| list-fix-values ("specifiedElementPath")| print` to get all element values of a certain element
-- Start to write your transformation successivly and `write` to a specific or `print` the result.
+- Start to write your transformation successivly and `write` to a specific destination or `print` the result.
   - Start with one element that you want to transform an retain it.
   - If you are happy with the result continue.
 - If you have finalized your transformation include it in your application or transform the data you want.
\ No newline at end of file

From 1fd8f78093fc740d156dd5d7956cc473aa57b34d Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:44:09 +0200
Subject: [PATCH 07/69] Update Approaching a transformation with metafacture.md

Co-authored-by: Pascal Christoph
---
 Approaching a transformation with metafacture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index 010d484..c967518 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -10,6 +10,6 @@ Every approach to transform metadata with metafacture is quite similiar:
   - e.g. use `| list-fix-paths| print` to checkout the metadata-element paths that are provided.
   - use `| list-fix-values ("specifiedElementPath")| print` to get all element values of a certain element
 - Start to write your transformation successivly and `write` to a specific destination or `print` the result.
-  - Start with one element that you want to transform an retain it.
+  - Start with one element that you want to transform and retain it.
   - If you are happy with the result continue.
 - If you have finalized your transformation include it in your application or transform the data you want.
\ No newline at end of file

From 653e34a8af38cc87be8eaafdf832c21443d9f450 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:46:24 +0200
Subject: [PATCH 08/69] Update Fix-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 2c05aff..13a560b 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -23,7 +23,7 @@ Flux-Example:
 - - address the `fix`-module
 - - you can add variables
 - - there are some optiones available
-- - The fix-Transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
+- - The Fix transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
 - - or it can be separated in a external file, that is called in the FLUX-Process as in the code-snipped above
 - when using it in a JAVA-Process, just add the library to your process

From 6d52a6e9aaa1ccfcff682b7342cceae88ad189f8 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:46:42 +0200
Subject: [PATCH 09/69] Update Fix-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 13a560b..40e2f32 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -24,7 +24,7 @@ Flux-Example:
 - - you can add variables
 - - there are some optiones available
 - - The Fix transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
-- - or it can be separated in a external file, that is called in the FLUX-Process as in the code-snipped above
+- - or it can be separated in an external file, that is called in the FLUX-Process as in the code-snipped above
 - when using it in a JAVA-Process, just add the library to your process

 ## Record-based and metadata manipulating approach

From e146ed80fe4c7d870a6815b0d06857694308d569 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:47:17 +0200
Subject: [PATCH 10/69] Update Fix-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 40e2f32..3c6bc2a 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -29,7 +29,7 @@ Flux-Example:

 ## Record-based and metadata manipulating approach
 While Metafature processes the data as a stream, the fix module does not it buffers the incoming stream to distinct records.
-So that you can manipulate all metadata-elements of a record at once and without focussing about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.
+Thus you can manipulate all metadata-elements of a record at once and don't need to think about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.

 The incoming record then can be manipulated, fields can be changed, removed or added. This also differs from the approach in MORPH where you construct a new record and a new data stream, whereas you change stuff in the record in FIX and "only" change the data stream in Metafacture.

From de18548ce0615e1d7902026ec2da37250df43b8f Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:48:50 +0200
Subject: [PATCH 11/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 3c6bc2a..52a932f 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -2,7 +2,7 @@

 # Fix User Guide

-This document provides an introduction to the Metafacture Fix language (short: Metafix or Fix). Fix is a declarative flow oriented language in which transformations of arbitrary metadata/semi-structured data can be defined using the FIX language. The Fix language for Metafacture is introduced as an alternative to configuring data transformations with Metamorph. Inspired by Catmandu Fix, Metafix processes metadata not as a continuous data stream but as discrete records. The basic idea is to rebuild constructs from the (Catmandu) Fix language like functions, selectors and binds in Java and combine with additional functionalities from the Metamorph toolbox.
+This document provides an introduction to the Metafacture Fix language (short: Metafix or Fix). The Fix language for Metafacture is introduced as an alternative to configuring data transformations with Metamorph. Inspired by Catmandu Fix, Metafix processes metadata not as a continuous data stream but as discrete records.

 ## Part of a metafacture worflow
 Metafacture Fix is a transformation module that can be used in a workflow, for this you have to use this in your pipeline:

From 6f0a32fd362cab079bddba4e9bcc866dad237151 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:50:17 +0200
Subject: [PATCH 12/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 52a932f..cd9418d 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -25,7 +25,7 @@ Flux-Example:
 - - there are some optiones available
 - - The Fix transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
 - - or it can be separated in an external file, that is called in the FLUX-Process as in the code-snipped above
-- when using it in a JAVA-Process, just add the library to your process
+- when using it in a Java process, just add the library to your process

 ## Record-based and metadata manipulating approach
 While Metafature processes the data as a stream, the fix module does not it buffers the incoming stream to distinct records.

From 6335bc6a54e37adaa7f9ad1a3849cf8ab506ddec Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:51:03 +0200
Subject: [PATCH 13/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index cd9418d..5dc8f8d 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -74,7 +74,7 @@ end

 **Conditionals** are used to control the processing of function so that they are not process with every workflow but only under certain conditions.

-**Selectors** can be used as hghlevel filter to filter the records you want.
+**Selectors** can be used to filter the records you want.

 **Binds** are wrappers for one or more fixes. They give extra control functionality for fixes such as loops. All binds have the same syntax:

From 6e99a9f9ada416183957a5be27e18fb5023f5570 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:51:31 +0200
Subject: [PATCH 14/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 5dc8f8d..88734d7 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -86,7 +86,7 @@ do Bind(params,…)
 end
 ```

-Find here a [list of all function, selectors, binds and conditionals](/Fix-function-and-Cookbook.md).
+Find here a [list of all functions, selectors, binds and conditionals](/Fix-function-and-Cookbook.md).

 ## Addressing Pieces of Data - of: FIX-Path and the record structure in FIX

From b6cb81fa803cf8b66c0b9b2bd001c1bced063d35 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:51:54 +0200
Subject: [PATCH 15/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 88734d7..6ff821b 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -198,7 +198,7 @@ In this case `target_field` and `source_field` serve as a parameter (the name is

 Parameters are scoped, which means that the ones provided with the `call_macro` function shadow global ones. Macros cannot be nested.

-## Parameters to Metamorph Definitions / Using variables
+## Parameters to Fix definitions / Using variables

 Fix definitions may contain parameters. They follow the pattern `$[NAME]`:

From 7b7ba322c7b8c8a9cd670f1de9c52f9f611bc195 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:52:23 +0200
Subject: [PATCH 16/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 6ff821b..12ecfc1 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -217,7 +217,7 @@ The `` section in the Metamorph definition can be used to set defaults:
 ```

-For Java implementations: Compile-time variable are passed to Fix as a constructor parameter.
+For Java implementations: Compile-time variables are passed to Fix as a constructor parameter.

 ```java
 final Map vars = new HashMap();

From 7294b838e903eef106b241525dfa0aaa7fd2b16a Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:53:43 +0200
Subject: [PATCH 17/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 12ecfc1..a4c4a73 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -232,7 +232,7 @@ final Metafix metafix = new metafix("fixdef.fix", vars);

 In a complex project setting there may be several Fixes in use,
 and it is likely that they share common parts. Imagine for instance a
-transformations from Marc 21 record holding data on books to RDF, and Marc 21
+transformations from Marc 21 records holding data on books to RDF, and Marc 21
 records hodling data on authors to RDF. Both make use of a table assinging
 country names to ISO country codes. Such a table should only exist once.

From e5ad3400e62dbfe3536106e4ae5f98d5771b7e26 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 10:59:16 +0200
Subject: [PATCH 18/69] Update Home.md

Co-authored-by: Pascal Christoph
---
 Home.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Home.md b/Home.md
index ef76d07..74b87ab 100644
--- a/Home.md
+++ b/Home.md
@@ -43,7 +43,7 @@ If you are using **Metafacture as a Java library**, just create a Metafix object
 The transformation itself is declared in a fix-object which can be a file. For more information on how to declare transformations see [Metafix User Guide](/Fix-User-Guide.md).

 > [!NOTE]
-> PS: There is also the transformation modul MORPH but for that have a look at[ the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the german cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/).
+> PS: There is also the transformation modul MORPH. Have a look at[ the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the german cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/).

 ## Framework

From 91e3e532ac1178cdb8d21f0e6681f133b0b51319 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:00:09 +0200
Subject: [PATCH 19/69] Update Home.md

Co-authored-by: Pascal Christoph
---
 Home.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Home.md b/Home.md
index 74b87ab..d708f9a 100644
--- a/Home.md
+++ b/Home.md
@@ -26,7 +26,7 @@ __________________

 ## FLUX

-Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary, just a command line. To use Flux you may download the binary distribution of Metafacture.
+Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary - it's used as a command line. To use Flux you may download the binary distribution of Metafacture.

 For more information on how to use Flux, see the [Flux User Guide](/Flux-User-Guide.md).

From 5278245f674885ba119fb7395e19cd7dd0970344 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:01:43 +0200
Subject: [PATCH 20/69] Update Home.md

Co-authored-by: Pascal Christoph
---
 Home.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Home.md b/Home.md
index d708f9a..fdbddbf 100644
--- a/Home.md
+++ b/Home.md
@@ -18,7 +18,7 @@ __________________
 While working with the playground or the command line you only need [Flux](#flux) and the transformation module [Fix](#fix).
 Have a look here for [Getting started](/Getting-Started.md).

-## Framework for JAVA-Integration/Development
+## Framework for Java integration/development

 If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Home.md#framework).

From 1ad57d9912c383d4b937d48f7b511c4a8368dc02 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:02:21 +0200
Subject: [PATCH 21/69] Update Getting-Started.md

Co-authored-by: Pascal Christoph
---
 Getting-Started.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Getting-Started.md b/Getting-Started.md
index fc37e36..a4b0f08 100644
--- a/Getting-Started.md
+++ b/Getting-Started.md
@@ -52,7 +52,7 @@ dependencies {

 To use Fix you need to

-Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages.
+Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~~kw,versionexpand). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages.

 If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Framework-User-Guide.md).
From deaa566d379cef19a9ac53a187394683dae241d5 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:09:14 +0200
Subject: [PATCH 22/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index a4c4a73..70cb848 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -233,7 +233,7 @@ final Metafix metafix = new metafix("fixdef.fix", vars);
 In a complex project setting there may be several Fixes in use,
 and it is likely that they share common parts. Imagine for instance a
 transformations from Marc 21 records holding data on books to RDF, and Marc 21
-records hodling data on authors to RDF. Both make use of a table assinging
+records holding data on authors to RDF. Both make use of a table assigning
 country names to ISO country codes. Such a table should only exist once.

 Another scenario would be to reduce the size of a single fix file and create several fix files used for different purposes.

From 039a4b061ccf68295470300896f30bc7d2882dcd Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:09:40 +0200
Subject: [PATCH 23/69] Update Fix-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 70cb848..3cf749d 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -249,4 +249,4 @@ do once("setup")
 end
 ```

-For perfomance reason it is useful to integrate macros and maps that are used often in an do once bind.
\ No newline at end of file
+For performance reasons it is useful to integrate macros and maps that are used often in a `do once` bind.
\ No newline at end of file

From 291a2cf669b9209621dc6e99dc31940c468a4a25 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:10:00 +0200
Subject: [PATCH 24/69] Update Flux-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Flux-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md
index 5de39d5..1c28ad2 100644
--- a/Flux-User-Guide.md
+++ b/Flux-User-Guide.md
@@ -88,7 +88,7 @@ Furthermore, some commands have named options. These are set as follows `command
 To learn about the available options of a command, execute Flux without arguments: It will list all available commands, including options.
 or have a look at: [List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md)
 TODO: We need to add FIX to that list!!!
-To some commands, the entire environment can be given as argument. This is done with the `*` character: `fix("tranformation.fix", *)`. In this case Metafix gains access to all variable assignments made in Flux.
+To some commands the entire environment can be given as an argument. This is done with the `*` character: `fix("tranformation.fix", *)`. In this case Metafix gains access to all variable assignments made in Flux.
 (See also [[Metafix-User-Guide#parameters-to-metafix-definitions]]).

 Note that unlike shell pipes, the data flowing between Flux commands is _typed_. This means that only commands with matching signatures can be combined. Commands expect a certain input and provide a certain output like: `StreamReceiver, `Object`, `Reader` and others.
From af3c0ad08f3d37995aa497f4f20bd7239ab5c63d Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:10:50 +0200
Subject: [PATCH 25/69] Update Fix-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 3cf749d..50aaafa 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -91,7 +91,7 @@ Find here a [list of all functions, selectors, binds and conditionals](/Fix-func

 ## Addressing Pieces of Data - of: FIX-Path and the record structure in FIX

-Internally FIX knows arrays, objects/hashes and simple elements. How a format is translated is depending on the `decode-...` command in the MF Workflow. Only one thing is specific to the fix, as in Catmandu a repeated field is translated into a list depending on the real input data of the single record and elements with the suffix `[]` are interpreted as arrays.
+Internally FIX knows arrays, objects/hashes and simple elements. How a format is translated depends on the `decode-...` command in the MF Workflow. Only one thing is specific to the fix, as in Catmandu: a repeated field is translated into a list depending on the real input data of the single record. Elements with the suffix `[]` are interpreted as arrays.

 Since function manipulate, add or remove elements in a record, it is essential to understand the way on can adress source or target elements.
From 9f8c9efb23ee9163c6a629d741ab519d5994140d Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:11:18 +0200 Subject: [PATCH 26/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index a4b0f08..bd647bc 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -55,4 +55,4 @@ To use Fix you need to Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~~kw,versionexpand). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages. -If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Framework-User-Guide.md). +If you plan to use Metafacture as a Java library or if you wish to add commands to Flux you should get familar with the [Framework](/Framework-User-Guide.md). From feb3fa49b1d6ace95c816a4361b3c5ebe3a10aa1 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:12:47 +0200 Subject: [PATCH 27/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 50aaafa..0a7d5e9 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -89,7 +89,7 @@ end Find here a [list of all functions, selectors, binds and conditionals](/Fix-function-and-Cookbook.md). -## Addressing Pieces of Data - of: FIX-Path and the record structure in FIX +## Addressing Pieces of Data: FIX-Path and the record structure in FIX Internally FIX knows arrays, objects/hashes and simple elements. 
How a format is translated depends on the `decode-...` command in the MF Workflow. Only one thing is specific to the fix, as in Catmandu: a repeated field is translated into a list depending on the real input data of the single record. Elements with the suffix `[]` are interpreted as arrays. From 70a252ed727926423f2e651756735a89ebe24c36 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:13:16 +0200 Subject: [PATCH 28/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 0a7d5e9..9b15fc8 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -126,7 +126,7 @@ For the fields with deeper structure you add a dot ‘.’. The path for element In an data set an element sometimes an element can have multiple instances. Different data models solve this possibility differently. XML-Records can have all elements multiple times, element repition is possible and in many schemas (partly) allowed. Repeatable elements also exist in JSON and YAML but are unusual. -To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as list. +To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as a list. Similar you adress the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array also if the list has only on value. Working with nested structures and combination of arrays and objects the path is a combination of element names, dots and index numbers. 
From a8696f1078038572bef3f7976a518cb0a6bb4719 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:13:44 +0200 Subject: [PATCH 29/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 9b15fc8..953f66b 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -100,7 +100,7 @@ e.g.: copy_field("", "") ``` -To adress the source or target element here, you need to provide the path to the element. +To address the source or target element here, you need to provide the path to the element. Metafacture Fix is using a path-syntax that is JSON Path like but not identical. It also uses the dot notation but there are some differences with the path structure of arrays and repeated fields. Especially when working with JSON, YAML or records repeated fields. ``` From 250467d2a11ed9edad6716c0e9caf8c2dc9128e3 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:14:32 +0200 Subject: [PATCH 30/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 953f66b..b976382 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -150,7 +150,7 @@ z : To adress paths you can use wildcards. For instance the star-wildcard: `person*` would match all simple literals with element names starting with 'person': 'person\_name', 'person\_age', etc. -Apart from the star-wildcard, the questionmark-wildcard ('?') is supported. It matches exactly one arbitrary character. +Apart from the `*` wildcard, the `?` wildcard is supported. It matches exactly one arbitrary character. Not fully supported yet is alteration of pathes. 
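Reviewer note on the wildcard paragraph touched by the patch above: the matching rules can be sketched in plain Java. This is a toy illustration only, not Metafix code. The class `WildcardSketch` and its methods are hypothetical, and it assumes that `*` matches zero or more characters (the guide only says that `person*` matches names starting with `person`), while `?` matches exactly one character as the guide states.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Toy model of the `*` and `?` path wildcards; hypothetical helper, not part of Metafacture.
class WildcardSketch {

    // Translate a wildcard expression into a regular expression.
    // Assumption: `*` means "zero or more characters"; `?` means "exactly one character".
    public static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char ch : wildcard.toCharArray()) {
            if (ch == '*') {
                regex.append(".*");
            } else if (ch == '?') {
                regex.append('.');
            } else {
                // Quote everything else so that characters like '[' or '.' are taken literally.
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Return the element names that the wildcard expression matches completely.
    public static List<String> matching(String wildcard, List<String> names) {
        Pattern pattern = toPattern(wildcard);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("person_name", "person_age", "title");
        System.out.println(matching("person*", names)); // prints [person_name, person_age]
        System.out.println(matching("245??", List.of("24510", "2451", "245ab"))); // prints [24510, 245ab]
    }
}
```

Anchoring the pattern to the whole name (`matches()` rather than `find()`) is what makes `245??` select exactly five-character names, which is the behaviour one would expect for MARC-style field names with two indicator positions.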
From 83e14163da42d71ce6184fd8282e4c0c4339543a Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:15:18 +0200 Subject: [PATCH 31/69] Update Flux-User-Guide.md Co-authored-by: Pascal Christoph --- Flux-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md index 1c28ad2..4fd5098 100644 --- a/Flux-User-Guide.md +++ b/Flux-User-Guide.md @@ -127,6 +127,6 @@ There are 4 annotations, see this [example](https://github.com/metafacture/metaf @Description("A MAB XML reader") @In(XmlReceiver.class) @Out(StreamReceiver.class) -@FluxCommand("handle-mabxml")morph +@FluxCommand("handle-mabxml") ``` If you add a command it would be nice if you also add a flux example to the module `metafacture-runner` so that users can easily see how it's used, see e.g. https://github.com/metafacture/metafacture-core/blob/master/metafacture-runner/src/main/dist/examples/read/regexp/regexp.flux. From 0065f867c0db58ce622dd783e6cdf559a0476409 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:15:56 +0200 Subject: [PATCH 32/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index b976382..29baed4 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -133,7 +133,7 @@ Working with nested structures and combination of arrays and objects the path is `listObjectElement2.2` has the path: `h[].2.j` -You do not only need the path name for your source element but also if you want to create a new element. But remember that fix as in catmandu is using repeated fields and arrays as lists so if you want to create a repeated field you have to create an array without suffic []. +You do not only need the path name for your source element but also if you want to create a new element. 
But remember that fix, as in catmandu, is using repeated fields and arrays as lists so if you want to create a repeated field you have to create an array without suffix []. e.g.: ```PERL From 3ebde206c20b7b73c29dea4735e12d6bf9c14c9e Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:16:12 +0200 Subject: [PATCH 33/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 29baed4..7942a21 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -93,7 +93,7 @@ Find here a [list of all functions, selectors, binds and conditionals](/Fix-func Internally FIX knows arrays, objects/hashes and simple elements. How a format is translated depends on the `decode-...` command in the MF Workflow. Only one thing is specific to the fix, as in Catmandu: a repeated field is translated into a list depending on the real input data of the single record. Elements with the suffix `[]` are interpreted as arrays. -Since function manipulate, add or remove elements in a record, it is essential to understand the way on can adress source or target elements. +Since functions manipulate, add or remove elements in a record, it is essential to understand the way you can address source or target elements. 
e.g.: ```PERL From 38a5890ec66d5454f8c9557f7aca22085cd502d7 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:16:31 +0200 Subject: [PATCH 34/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 7942a21..01d6848 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -164,7 +164,7 @@ e.g.: if you transform MARC21 to JSON but you want to keep only certain elements ``` retain("all", - element", + elements", "that", "I", "want") From ac368b05d655037ae2088ff70c1883dccec6bade Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:16:53 +0200 Subject: [PATCH 35/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 01d6848..9bd2afd 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -127,7 +127,7 @@ For the fields with deeper structure you add a dot ‘.’. The path for element In an data set an element sometimes an element can have multiple instances. Different data models solve this possibility differently. XML-Records can have all elements multiple times, element repition is possible and in many schemas (partly) allowed. Repeatable elements also exist in JSON and YAML but are unusual. To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as a list. -Similar you adress the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array also if the list has only on value. +Similarly you address the `listElement3` of the array/list by `g[].3`. 
The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array even if the list has only on value. Working with nested structures and combination of arrays and objects the path is a combination of element names, dots and index numbers. From af1483764bad82516e789ba9d6e43f45aa1e77b4 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:17:16 +0200 Subject: [PATCH 36/69] Update Framework-User-Guide.md Co-authored-by: Pascal Christoph --- Framework-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Framework-User-Guide.md b/Framework-User-Guide.md index 84e891d..3340f10 100644 --- a/Framework-User-Guide.md +++ b/Framework-User-Guide.md @@ -16,7 +16,7 @@ The individual elements are connected by calling the `setReceiver()` method. The ```java // create necessary objects final PicaReader reader = new PicaReader(); -final Metafix metafix = new Metafix("defnition.fix"); +final Metafix metafix = new Metafix("definition.fix"); final ListMapWriter writer = new ListMapWriter(); //connect them From 2dee362306bc7a3a02a0139ffbac8acb625244c9 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:17:57 +0200 Subject: [PATCH 37/69] Update Flux-User-Guide.md Co-authored-by: Pascal Christoph --- Flux-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md index 4fd5098..97176ce 100644 --- a/Flux-User-Guide.md +++ b/Flux-User-Guide.md @@ -108,7 +108,7 @@ Flux supports single line C/Java-style comments: `//comment`. ## Getting Help and Inspiration (TODO: Ersetzen.) -1. Have a look at the [List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) or if the flux executed without arguments, Flux will display a short help text along with a list of all registered commands. 
This is the list of FLUX commands mentioned already above. +1. Have a look at the [List of available FLUX commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) or execute the flux without arguments to get a short help text along with a list of all registered commands. This is the list of FLUX commands mentioned already above. 2. There are several example flux files along with sample data in the folder `examples/`: https://github.com/metafacture/metafacture-core/tree/master/metafacture-runner/src/main/dist/examples _________________________ From 833b42013f56fb523649df984f00d6c97849cf66 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:18:34 +0200 Subject: [PATCH 38/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 9bd2afd..620d6da 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -101,7 +101,7 @@ copy_field("", "") ``` To address the source or target element here, you need to provide the path to the element. -Metafacture Fix is using a path-syntax that is JSON Path like but not identical. It also uses the dot notation but there are some differences with the path structure of arrays and repeated fields. Especially when working with JSON, YAML or records repeated fields. +Metafacture Fix uses a path syntax that is JSON-Path-like but not identical. It also uses the dot notation but there are some differences with the path structure of arrays and repeated fields. Especially when working with JSON, YAML, or records with repeated fields. 
``` a : simpleField From 0677e65d47072770a4a264f3a7ba4844a0defeee Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:19:23 +0200 Subject: [PATCH 39/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 620d6da..2ded194 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -124,7 +124,7 @@ k : l : m : o : deepNestedField The path for a simple string-element is adressed by stating the element name: `a` For the fields with deeper structure you add a dot ‘.’. The path for elements in nested objects is stated by: `b.c` or `k.l.m.o` -In an data set an element sometimes an element can have multiple instances. Different data models solve this possibility differently. XML-Records can have all elements multiple times, element repition is possible and in many schemas (partly) allowed. Repeatable elements also exist in JSON and YAML but are unusual. +Sometimes an element can have multiple instances. Different data models solve this possibility differently. In XML records element repetition is possible and (partly) allowed in many schemas. Repeatable elements also exist in JSON and YAML but are unusual. To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as a list. Similarly you address the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array even if the list has only on value. 
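Reviewer note on the addressing rules in the patch above: the path semantics the guide describes (dot-separated element names, index numbers that start at 1 for repeated fields and arrays, and the `$last` array wildcard) can be mimicked with a small plain-Java resolver. This is a sketch under stated assumptions, not the Metafix implementation. The class `FixPathSketch`, its methods and the second-level values of the sample record (for example `listObjectElement1.2`) are hypothetical and only loosely modelled on the YAML sample in the guide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy resolver that mimics the addressing rules from the guide; it is NOT Metafix code.
class FixPathSketch {

    // Hypothetical record: a simple field, a repeated field (handled as a list),
    // an array marked with the `[]` suffix, an array of objects and a deeply nested object.
    public static Map<String, Object> sampleRecord() {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("a", "simpleField");
        record.put("f", List.of("repeatedField1", "repeatedField2"));
        record.put("g[]", List.of("listElement1", "listElement2", "listElement3"));
        record.put("h[]", List.of(
                Map.of("i", "listObjectElement1.1", "j", "listObjectElement1.2"),
                Map.of("i", "listObjectElement2.1", "j", "listObjectElement2.2")));
        record.put("k", Map.of("l", Map.of("m", Map.of("o", "deepNestedField"))));
        return record;
    }

    // Walk the path segment by segment; returns null when the path does not exist.
    public static Object resolve(Map<String, Object> record, String path) {
        Object current = record;
        for (String segment : path.split("\\.")) {
            if (current instanceof Map) {
                // Objects are addressed by element name.
                current = ((Map<?, ?>) current).get(segment);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                if (segment.equals("$last")) {
                    current = list.isEmpty() ? null : list.get(list.size() - 1);
                } else if (segment.matches("\\d+")) {
                    // Index numbers in a path start at 1, so `f.2` is the second value.
                    int index = Integer.parseInt(segment);
                    current = (index >= 1 && index <= list.size()) ? list.get(index - 1) : null;
                } else {
                    return null;
                }
            } else {
                // A simple value (or a missing element) has no children.
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> record = sampleRecord();
        System.out.println(resolve(record, "a"));         // prints simpleField
        System.out.println(resolve(record, "f.2"));       // prints repeatedField2
        System.out.println(resolve(record, "g[].3"));     // prints listElement3
        System.out.println(resolve(record, "h[].2.j"));   // prints listObjectElement2.2
        System.out.println(resolve(record, "k.l.m.o"));   // prints deepNestedField
        System.out.println(resolve(record, "g[].$last")); // prints listElement3
    }
}
```

Note that the `[]` suffix is treated here as part of the element name, which matches how the guide writes paths such as `g[].3`; whether Metafix stores it that way internally is not something this sketch claims.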
From 4b0e0dfa9a030ef21a6d499dd8ff7c8154fe3db8 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:19:49 +0200 Subject: [PATCH 40/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 2ded194..afe0776 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -129,7 +129,7 @@ Sometimes an element can have multiple instances. Different data models solve th To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as a list. Similarly you address the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array even if the list has only on value. -Working with nested structures and combination of arrays and objects the path is a combination of element names, dots and index numbers. +When working with nested structures and combinations of arrays and objects the path is a combination of element names, dots and index numbers. 
`listObjectElement2.2` has the path: `h[].2.j` From 19b190d6b54a2d977d5ed790967a8a0a9b6835d6 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:24:11 +0200 Subject: [PATCH 41/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index afe0776..456e592 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -121,7 +121,7 @@ h : - i : listObjectElement1.1 k : l : m : o : deepNestedField ``` -The path for a simple string-element is adressed by stating the element name: `a` +The path for a simple string element is addressed by stating the element name: `a` For the fields with deeper structure you add a dot ‘.’. The path for elements in nested objects is stated by: `b.c` or `k.l.m.o` Sometimes an element can have multiple instances. Different data models solve this possibility differently. In XML records element repetition is possible and (partly) allowed in many schemas. Repeatable elements also exist in JSON and YAML but are unusual. From 8a92e39a75100b45c09cee3be99c9c1b02a93244 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:25:05 +0200 Subject: [PATCH 42/69] Update Flux-User-Guide.md Co-authored-by: Pascal Christoph --- Flux-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md index 97176ce..895ea8c 100644 --- a/Flux-User-Guide.md +++ b/Flux-User-Guide.md @@ -85,7 +85,7 @@ The syntax for defining flows takes its cues from bash pipes. Commands are conca Some commands take a constructor argument. It is provided within brackets: `command("arg")`. Furthermore, some commands have named options. These are set as follows `command(optionname="arg1",annotheroption="arg2")` or with constructor argument: `command("arg",option="arg2")`. 
-To learn about the available options of a command, execute Flux without arguments: It will list all available commands, including options. or have a look at: [List of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) TODO: We need to add FIX to that list!!! +To learn about the available options of a command, execute Flux without arguments - it will list all available commands, including options. Or simply have a look at the [list of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) To some commands the entire environment can be given as an argument. This is done with the `*` character: `fix("tranformation.fix", *)`. In this case Metafix gains access to all variable assignments made in Flux. From aeffe64ab35aa840d2b1e06cf45987d8db10ebec Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:27:23 +0200 Subject: [PATCH 43/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index bd647bc..f4a8f9a 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -46,7 +46,7 @@ or if Gradle is your build tool of choice use: ```groovy dependencies { - implementation 'org.metafacture:metamorph:$VERSION' + implementation 'org.metafacture:metafacture-io:$VERSION' } ``` From ce1e95d393adac1b455759325966b27f88791591 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:27:45 +0200 Subject: [PATCH 44/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 456e592..a889cf1 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -154,7 +154,7 @@ Apart from the `*` wildcard, the 
`?` wildcard is supported. It matches exactly o Not fully supported yet is alteration of pathes. -Besides path wildcards there are array/list wildcards that are used to refrence specific elements or all elements in an array. `g[].*` adresses all strings in the array `g[]`. `g[].$append` would refrence a new element in the array at the end of the array. `g[].$last` refrences the last element in an array. +Besides path wildcards there are array/list wildcards that are used to reference specific elements or all elements in an array. `g[].*` addresses all strings in the array `g[]`. `g[].$append` would reference a new element in the array at the end of the array. `g[].$last` references the last element in an array. ## Cleaning up the transformation From 3a3dc5fe77bfa29d388feb075b14a1a4ef12f223 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:28:07 +0200 Subject: [PATCH 45/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index a889cf1..59d7a40 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -158,7 +158,7 @@ Besides path wildcards there are array/list wildcards that are used to reference ## Cleaning up the transformation -Since FIX is not constructing a new record stream but is manipulating the existing record you usually clean up after you transform the data. There are functions to kick out all unnecessary elements an kick out all empty elements. +Since FIX is not constructing a new record stream but is manipulating the existing record you usually clean up after you transform the data. There are functions to remove all unnecessary elements and to remove all empty elements. e.g.: if you transform MARC21 to JSON but you want to keep only certain elements that you created. 
you state them in a retain function: From 9e0ff191fc6dbaf10b1d67bb80dea07b649e3af7 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:28:31 +0200 Subject: [PATCH 46/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 59d7a40..7d5846d 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -160,7 +160,7 @@ Besides path wildcards there are array/list wildcards that are used to reference Since FIX is not constructing a new record stream but is manipulating the existing record you usually clean up after you transform the data. There are functions to remove all unnecessary elements and to remove all empty elements. -e.g.: if you transform MARC21 to JSON but you want to keep only certain elements that you created. you state them in a retain function: +e.g.: if you transform MARC21 to JSON but you want to keep only certain elements that you created, you state them in a `retain` function: ``` retain("all", From 9afa67e09ed6226b7c3d5bf47e8d4324c096c8c0 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:29:05 +0200 Subject: [PATCH 47/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 7d5846d..80ebd00 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -169,7 +169,7 @@ retain("all", "I", "want") ``` -This function only keeps all the elements that I wanted. At the moment this only works with highlevel elements. +This function only keeps all the elements that I wanted. At the moment this only works with top-level elements. `vacuum()` deletes all emtpy elements. 
From 88b0aa955acb258ca507cd0854f35ad251fb049a Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:29:24 +0200 Subject: [PATCH 48/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 80ebd00..ff545c4 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -171,7 +171,7 @@ retain("all", ``` This function only keeps all the elements that I wanted. At the moment this only works with top-level elements. -`vacuum()` deletes all emtpy elements. +`vacuum()` deletes all empty elements. ## Defining Macros From 1b304f76171042da3a4665be917cb00b381c44ce Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:33:28 +0200 Subject: [PATCH 49/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index f4a8f9a..9db6cce 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -37,7 +37,7 @@ TODO: For instance, if you want to use Metamorph in your project, simply add the ```xml org.metafacture - metamorph + metafacture-io $VERSION ``` From 522c3002091fd8e5c7f1cee068199169e8d4cdd8 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:33:57 +0200 Subject: [PATCH 50/69] Update Fix-User-Guide.md Co-authored-by: Fabian Steeg --- Fix-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index ff545c4..5df9794 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -194,7 +194,7 @@ call_macro("concat-up", source_field:"data1", target_field:"Data1") call_macro("concat-up", source_field:"data2", target_field:"Data2") `````` -In this case `target_field` and `source_field` serve as a 
parameter (the name is arbitrary). In the macro definition itsel, the parameters are addressed by `$[target_field]` and `$[source_field]`. +In this case `target_field` and `source_field` serve as a parameter (the name is arbitrary). In the macro definition itself, the parameters are addressed by `$[target_field]` and `$[source_field]`. Parameters are scoped, which means that the ones provided with the `call_macro` function shadow global ones. Macros cannot be nested. From 8f12f4fe51560cc94128d6cfb984feb40fe10359 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:34:29 +0200 Subject: [PATCH 51/69] Update Flux-User-Guide.md Co-authored-by: Pascal Christoph --- Flux-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md index 895ea8c..d8e2f59 100644 --- a/Flux-User-Guide.md +++ b/Flux-User-Guide.md @@ -93,7 +93,7 @@ To some commands the entire environment can be given as an argument. This is don Note that unlike shell pipes, the data flowing between Flux commands is _typed_. This means that only commands with matching signatures can be combined. Commands expect a certain input and provide a certain output like: `StreamReceiver, `Object`, `Reader` and others. -To lookup the signatures, execute Flux without arguments or see: [[Metafix-User-Guide#parameters-to-metafix-definitions]]). It will list all available commands, including signatures. +To lookup the signatures, again: execute Flux without arguments or see: [[Metafix-User-Guide#parameters-to-metafix-definitions]]). It will list all available commands, including signatures. Or simply have a look at the [list of available FLUX commands.](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) ### Variables Variables are always Strings and can be concatenated with the `+` operator. 
Escape sequences follow the Java String conventions: `\n`=line break, `\t`=tab, `\\`=\, `\u0024`=unicode character, etc. From 6e9cf7a4d661297f4faaf7c914039516f8237cb6 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:35:04 +0200 Subject: [PATCH 52/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index 9db6cce..0b6e3da 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -32,7 +32,7 @@ If you want to use Metafacture in your own Java projects all you need is to add Alternatively, you can simply guess the package names from the top-level folders in the source code repository -- they are the same. -TODO: For instance, if you want to use Metamorph in your project, simply add the following dependency to your `pom.xml`: +For instance, if you want to use the `metafacture-io` library in your project, simply add the following dependency to your `pom.xml`: ```xml From 90340ea127caa146d88d84ca658915406c82f73d Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:35:47 +0200 Subject: [PATCH 53/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index 0b6e3da..d43569c 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -24,7 +24,7 @@ To get started, you can export a workflow from the Playground (last button, !["E To set up IDE support for editing your Flux and Fix files, see [the IDE extensions page](https://metafacture.org/ide-extensions/index.html). -For next steps get familar with FLUX (hyper link) and FIX(hyper link). And try out some metafacture workflows. +For next steps get familar with [FLUX](/Flux-User-Guide.md) and [FIX](/Fix-User-Guide.md). 
And try out some Metafacture workflows. ## Using Metafacture as a Java library From d5ecf6a49faaf5af53c270c4bef41e27de3c437b Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:37:51 +0200 Subject: [PATCH 54/69] Update Framework-User-Guide.md Co-authored-by: Pascal Christoph --- Framework-User-Guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Framework-User-Guide.md b/Framework-User-Guide.md index 3340f10..d0048c5 100644 --- a/Framework-User-Guide.md +++ b/Framework-User-Guide.md @@ -3,7 +3,7 @@ # Framework User Guide > [!NOTE] ->Relevant for JAVA developers. For using metafacture without JAVA Code see the [FLUX user guide](/Flux-User-Guide.md). +>Relevant for Java developers. For using metafacture without Java Code see the [FLUX user guide](/Flux-User-Guide.md). This page explains how to create a Metafacture objects and how to assemble them to form a processing pipeline. We use as an example a simple pipeline containing a Metamorph instance. From d7282988264c74285f9214291df7cab42eb27d05 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Wed, 30 Aug 2023 11:58:55 +0200 Subject: [PATCH 55/69] Update Getting-Started.md Co-authored-by: Pascal Christoph --- Getting-Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Getting-Started.md b/Getting-Started.md index d43569c..5462d4c 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -11,7 +11,7 @@ For commands available in the Flux, see [the Flux commands documentation](/flux- For functions and usage of the Fix, see [the Fix functions and cookbook](/Fix-functions-and-cookbook). -For next steps get familar with [FLUX](/Flux-User-Guide.md) and [FIX](/Fix-User-Guide.md). And try out some metafacture workflows. +For next steps get familar with [FLUX](/Flux-User-Guide.md) and [FIX](/Fix-User-Guide.md). And try out some Metafacture workflows. 
 ## Command line
From 01b6f393bc8605f5bf38afe07bf9ba3174b97396 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 11:59:22 +0200
Subject: [PATCH 56/69] Update Framework-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Framework-User-Guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Framework-User-Guide.md b/Framework-User-Guide.md
index d0048c5..54b6a04 100644
--- a/Framework-User-Guide.md
+++ b/Framework-User-Guide.md
@@ -43,6 +43,8 @@ reader.setReceiver(new LogPipe()).setReceiver(metafix).setReceiver(writer);
 //adding a tee junction
 reader.setReceiver(new Tee()).setReceivers(writer1, writer2);
+// create e.g. three threads
+reader.setReceiver(new ObjectThreader<>()).addReceiver(...).addReceiver(...).addReceiver(...);
 //splitting based on a metamorph description
 final Splitter splitter = new Splitter("morph/typeSplitter.xml");
 reader.setReceiver(splitter).setReceiver("Tn", writer1);
From 1c3c8817512d95afa1725c1fde00e3e546033ec0 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 12:00:00 +0200
Subject: [PATCH 57/69] Update Framework-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Framework-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Framework-User-Guide.md b/Framework-User-Guide.md
index 54b6a04..f64a25f 100644
--- a/Framework-User-Guide.md
+++ b/Framework-User-Guide.md
@@ -30,7 +30,7 @@ Note that the call `setReceiver()` returns its argument, preserving the
 respective type. Thus the calls can be chained to build up a pipeline as shown
 in the listing. Finally the processing is started by calling the respective
 method on the data source/reader. The method name
-depends on the reader. In the Metamorph project `read()` is used by
+depends on the reader. In the Metafacture project `read()` is used by
 convention.
 The following code snippet shows a few more sophisticated connection patterns, such
From bd740cbf625a77639c253fe5d15251f500fb2007 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 12:00:28 +0200
Subject: [PATCH 58/69] Update Framework-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Framework-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Framework-User-Guide.md b/Framework-User-Guide.md
index f64a25f..67f5a12 100644
--- a/Framework-User-Guide.md
+++ b/Framework-User-Guide.md
@@ -5,7 +5,7 @@
 > [!NOTE]
 >Relevant for Java developers. For using metafacture without Java Code see the [FLUX user guide](/Flux-User-Guide.md).
-This page explains how to create a Metafacture objects and how to assemble them to form a processing pipeline. We use as an example a simple pipeline containing a Metamorph instance.
+This page explains how to create a Metafacture objects and how to assemble them to form a processing pipeline. We use as an example a simple pipeline containing a Metafix instance.
 ## Building a Flow
From 7332013aa82a5642fbd7e6d005801b922f1c247e Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 12:12:47 +0200
Subject: [PATCH 59/69] Update Getting-Started.md

Co-authored-by: Pascal Christoph
---
 Getting-Started.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Getting-Started.md b/Getting-Started.md
index 5462d4c..6cab33a 100644
--- a/Getting-Started.md
+++ b/Getting-Started.md
@@ -50,7 +50,7 @@ dependencies {
 }
 ```
-To use Fix you need to
+To use Fix you would declare `metafix` instead of `metafacture-io` in the example above. Note that `metafix` is not published to maven central but only to [github releases](https://github.com/metafacture/metafacture-fix/releases).
 Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~~kw,versionexpand). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages.
From 97261a8fd1490e06899f6d8f21566b89669824e1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tobias=20B=C3=BClte?=
Date: Wed, 30 Aug 2023 13:06:10 +0200
Subject: [PATCH 60/69] Update documentation pages

---
 ...ching a transformation with metafacture.md | 2 +-
 Documentation-Maintainer-Guide.md | 48 ++++++++
 Fix-User-Guide.md | 9 +-
 Getting-Started.md | 4 +-
 Home.md | 56 ---------
 README.md | 115 ++++++++----------
 flux-commands.md | 30 ++++-
 7 files changed, 133 insertions(+), 131 deletions(-)
 create mode 100644 Documentation-Maintainer-Guide.md
 delete mode 100644 Home.md

diff --git a/Approaching a transformation with metafacture.md b/Approaching a transformation with metafacture.md
index c967518..cbc326a 100644
--- a/Approaching a transformation with metafacture.md
+++ b/Approaching a transformation with metafacture.md
@@ -12,4 +12,4 @@ Every approach to transform metadata with metafacture is quite similiar:
 - Start to write your transformation successivly and `write` to a specific destination or `print` the result.
  - Start with one element that you want to transform and retain it.
  - If you are happy with the result continue.
-- If you have finalized your transformation include it in your application or transform the data you want.
\ No newline at end of file
+- If you have finalized your transformation include it in your application or transform the data you want for single reuse.
\ No newline at end of file
diff --git a/Documentation-Maintainer-Guide.md b/Documentation-Maintainer-Guide.md
new file mode 100644
index 0000000..735cd63
--- /dev/null
+++ b/Documentation-Maintainer-Guide.md
@@ -0,0 +1,48 @@
+
+## how to change flux-commands.md
+
+The entries in flux-commands.md describe the usage of commands used by flux.
+flux-commands.md is fully automatically generated. To make this happen one has to
+fill in the proper annotations in the correponding java classes. E.g.
+
+```
+reset-object-batch
+------------------
+- description: Resets the downstream modules every batch-size objects
+- options: batchsize (int)
+- signature: Object -> Object
+- java class: org.metafacture.flowcontrol.ObjectBatchResetter
+```
+
+is generated by reading following annotations in [ObjectBatchResetter.java](https://github.com/metafacture/metafacture-core/blob/511b4af8b993c85a33d6a18322258a195684d133/metafacture-flowcontrol/src/main/java/org/metafacture/flowcontrol/ObjectBatchResetter.java):
+
+```
+@Description("Resets the downstream modules every batch-size objects")
+@FluxCommand("reset-object-batch")
+@In(Object.class)
+@Out(Object.class)
+```
+The description of "options" is produced from all "public setter-methods", in this case:
+```
+ public void setBatchSize(final int batchSize) { ...
+```
+The option's name is produced by cutting away the "set" from the methods name, leaving
+"BatchSize" which is then lowercased. The parameter of this option is generated from the
+parameter type of the method - here an "int"eger.
+
+## how to publish flux-commands.md
+
+If you have updated some of these annotations, say "description", and these changes are
+merged into the master branch, generate a new flux-commands.md like this:
+
+Go to metafacture-core, checkout master and build a distribution and start flux.sh:
+```bash
+$ ./gradlew installDist
+$ cd ./metafacture-runner/build/install/metafacture-core/
+$ flux.sh > flux-commands.md
+```
+
+Open the generated flux-commands.md and remove some boilerplate at the beginning of the
+file manually. Save it, copy it here, commit and push.
+
+The [publishing process will be automated with an github action](https://github.com/metafacture/metafacture-core/issues/368).
diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 5df9794..9731477 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -5,7 +5,7 @@
 This document provides an introduction to the Metafacture Fix language (short: Metafix or Fix). The Fix language for Metafacture is introduced as an alternative to configuring data transformations with Metamorph. Inspired by Catmandu Fix, Metafix processes metadata not as a continuous data stream but as discrete records.
 ## Part of a metafacture worflow
-Metafacture Fix is a transformation module that can be used in a workflow, for this you have to use this in your pipeline:
+Metafacture Fix is a transformation module that can be used in a [Flux Workflow](/Flux-User-Guide.md), for this you have to use this in your pipeline:
 Flux-Example:
 ```PERL
@@ -22,15 +22,14 @@ Flux-Example:
 - when using the FLUX:
 - - address the `fix`-module
 - - you can add variables
-- - there are some optiones available
 - - The Fix transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
 - - or it can be separated in an external file, that is called in the FLUX-Process as in the code-snipped above
 - when using it in a Java process, just add the library to your process
 ## Record-based and metadata manipulating approach
-While Metafature processes the data as a stream, the fix module does not it buffers the incoming stream to distinct records.
+While Metafature processes the data as a stream, the fix module does not, it buffers the incoming stream to distinct records.
 Thus you can manipulate all metadata-elements of a record at once and don't need to think about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.
-The incoming record then can be manipulated, fields can be changed, removed or added. This also differs from the approach in MORPH where you construct a new record and a new data stream, whereas you change stuff in the record in FIX and "only" change the data stream in Metafacture.
+The incoming record then can be manipulated, fields can be changed, removed or added. This also differs from the approach in the other Transformation Module MORPH where you construct a new record and a new data stream. With FIX you change stuff in the record and "only" change the data stream in Metafacture.
 ## Basic concepts
@@ -149,7 +148,7 @@ z :
 ```
-To adress paths you can use wildcards. For instance the star-wildcard: `person*` would match all simple literals with element names starting with 'person': 'person\_name', 'person\_age', etc.
+To address paths you can use wildcards. For instance the star-wildcard: `person*` would match all simple literals with element names starting with 'person': 'person\_name', 'person\_age', etc.
 Apart from the `*` wildcard, the `?` wildcard is supported. It matches exactly one arbitrary character. Not fully supported yet is alteration of pathes.
diff --git a/Getting-Started.md b/Getting-Started.md
index 6cab33a..c8aeb18 100644
--- a/Getting-Started.md
+++ b/Getting-Started.md
@@ -15,7 +15,7 @@ For next steps get familar with [FLUX](/Flux-User-Guide.md) and [FIX](/Fix-User-
 ## Command line
-To use Metafacture as a command-line tool, download the latest metafix-runner from our [releases page](https://github.com/metafacture/metafacture-fix/releases). Extract the downloaded archive and change into the newly created directory (e.g. `cd metafacture-runner-0.5.1`). Run a Flux workflow with:
+To use Metafacture as a command-line tool, download the latest metafix-runner from our [releases page](https://github.com/metafacture/metafacture-fix/releases). Extract the downloaded archive and change into the newly created directory (e.g. `cd metafacture-runner-0.6.1`). Run a Flux workflow with:
 `$ ./bin/metafix-runner /path/to/your.flux` on Unix/Linux/Mac or `$ ./bin/metafix-runner.bat /path/to/your.flux` on Windows.
@@ -50,7 +50,7 @@ dependencies {
 }
 ```
-To use Fix you would declare `metafix` instead of `metafacture-io` in the example above. Note that `metafix` is not published to maven central but only to [github releases](https://github.com/metafacture/metafacture-fix/releases).
+To use Fix you would declare `metafix` instead of `metafacture-io` as in the example above. Note that `metafix` is not published to maven central but only to [github releases](https://github.com/metafacture/metafacture-fix/releases).
 Occasionally, we publish snapshot builds on [Sonatype OSS Repository](https://oss.sonatype.org/index.html#nexus-search;gav~org.metafacture~~~~~kw,versionexpand). The version number is derived from the branch name. Snapshot builds from the master branch always have the version `master-SNAPSHOT`. We also provide sometimes pre releases as github packages.
diff --git a/Home.md b/Home.md
deleted file mode 100644
index fdbddbf..0000000
--- a/Home.md
+++ /dev/null
@@ -1,56 +0,0 @@
-![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png)
-
-# Metafacture Documentation
-
-Metafacture is a toolkit for processing semi-structured data with a focus on library metadata. It provides a versatile set of tools for reading, writing and transforming data. Metafacture can be used as a stand-alone application via CLI or as a Java library in other applications. There is also a playground where you can test workflows.
-
-The name Metafacture is a portmanteau of the words metadata and manufacture.
-
-Metafacture comprises three main parts: Framework, Flux and the Transformation-Module Fix. It can be extended with modules.
-
-__________________
-
-## Using Metafacture via playground or CLI
-
-> [!NOTE]
-> No JAVA-Code is necessary!!!
-
-While working with the playground or the command line you only need [Flux](#flux) and the transformation module [Fix](#fix).
-Have a look here for [Getting started](/Getting-Started.md).
-
-## Framework for Java integration/development
-
-If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Home.md#framework).
-
-__________________
-
-## FLUX
-
-Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary - it's used as a command line. To use Flux you may download the binary distribution of Metafacture.
-
-For more information on how to use Flux, see the [Flux User Guide](/Flux-User-Guide.md).
-
-See [here for all available FLUX-Commands](/flux-commands.md).
-
-## FIX
-
-Metafix is a domain specific language for metadata transformation based on Catmandu FIX. The FIX object performing the transformation is used as part of a processing pipeline.
-
-If you are using **Metafacture with CLI or Playground** and therefore the Flux scripting language to build and run pipelines, use the `fix` command in your FLUX-Pipeline.
-
-If you are using **Metafacture as a Java library**, just create a Metafix object and add it to your pipeline (see also the [Framework User Guide](#framework)).
-
-The transformation itself is declared in a fix-object which can be a file. For more information on how to declare transformations see [Metafix User Guide](/Fix-User-Guide.md).
-
-> [!NOTE]
-> PS: There is also the transformation modul MORPH. Have a look at[ the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the german cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/).
-
-## Framework
-
-> [!NOTE]
->Relevant for developers
-
-The framework includes the interfaces and abstract classes which form the foundation of the data processing pipelines. This part of Metafacture is only relevant for you if you plan to use Metafacture as a Java library or if you wish to add pipe elements to Flux.
-
-For more information see the [Framework User Guide](/Framework-User-Guide.md).
-
diff --git a/README.md b/README.md
index aab1ed0..590572d 100644
--- a/README.md
+++ b/README.md
@@ -1,66 +1,57 @@
-# metafacture-documentation
+![logo](https://github.com/culturegraph/metafacture-core/wiki/img/metafacture_small.png)
-The central place for documentation about Metafacture.
+# Metafacture Documentation
+
+Metafacture is a toolkit for processing semi-structured data with a focus on library metadata. It provides a versatile set of tools for reading, writing and transforming data. Metafacture can be used as a stand-alone application via CLI or as a Java library in other applications. There is also a playground where you can test workflows.
+
+The name Metafacture is a portmanteau of the words metadata and manufacture.
+
+Metafacture comprises three main parts: Framework, Flux and the Transformation-Module Fix. It can be extended with modules.
 Our goal with this repo is to collaboratively create comprehensive documentation on Metafacture in the [issue tracker](https://github.com/culturegraph/metafacture-documentation/issues?q=). Feel free to open issues not only for bugs or enhancements, but also questions about Metafacture usage, or to share your experiences. We hope that over time, in that way we can create useful tutorials, how-tos, and collect good practices for using Metafacture.
+__________________
+
+## Using Metafacture via playground or CLI
+
+> [!NOTE]
+> No JAVA-Code is necessary!!!
+
+While working with the playground or the command line you only need [Flux](#flux) and the transformation module [Fix](#fix).
+Have a look here for [Getting started](/Getting-Started.md).
+
+## Framework for Java integration/development
+
+If you plan to use Metafacture as a Java library or if you wish to add commands to Flux. You should get familar with the [Framework](/Home.md#framework).
+
+__________________
+
+## FLUX
+
+Flux is a scripting language to easily build and run processing pipelines. No Java programming is necessary - it's used as a command line. To use Flux you may download the binary distribution of Metafacture.
+
+For more information on how to use Flux, see the [Flux User Guide](/Flux-User-Guide.md).
+
+See [here for all available FLUX-Commands](/flux-commands.md).
+
+## FIX
+
+Metafix is a domain specific language for metadata transformation based on Catmandu FIX. The FIX object performing the transformation is used as part of a processing pipeline.
+
+If you are using **Metafacture with CLI or Playground** and therefore the Flux scripting language to build and run pipelines, use the `fix` command in your FLUX-Pipeline.
+
+If you are using **Metafacture as a Java library**, just create a Metafix object and add it to your pipeline (see also the [Framework User Guide](#framework)).
+
+The transformation itself is declared in a fix-object which can be a file. For more information on how to declare transformations see [Metafix User Guide](/Fix-User-Guide.md).
+
+> [!NOTE]
+> PS: There is also the transformation modul MORPH. Have a look at[ the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the german cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/).
+
+## Framework
+
+> [!NOTE]
+>Relevant for developers
+
+The framework includes the interfaces and abstract classes which form the foundation of the data processing pipelines. This part of Metafacture is only relevant for you if you plan to use Metafacture as a Java library or if you wish to add pipe elements to Flux.
+
+For more information see the [Framework User Guide](/Framework-User-Guide.md).
-Here are some links to existing documentation:
-
-- [metafacture-core README](https://github.com/culturegraph/metafacture-core/blob/master/README.md)
-- [metafacture-core Wiki](https://github.com/culturegraph/metafacture-core/wiki)
-- [metafacture-core flux-commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md)
-- [metafacture-core fix functions and cookbook](https://github.com/metafacture/metafacture-fix#functions-and-cookbook)
-- [metafacture-examples](https://github.com/culturegraph/metafacture-examples)
-- [metafacture-java-examples](https://github.com/hbz/metafacture-java-examples)
-- [metafacture-flux-examples](https://github.com/hbz/metafacture-flux-examples)
-- [Introduction to Metafacture (workshop slides)](http://slides.lobid.org/metafacture-2020)
-- [Metamorph Book (work in progress, very early version)](http://b3e.net/metamorph-book/latest/)
-- [Metamorph-Dokumentation (entstanden im Projekt linked-swissbib)](https://swissbib.gitlab.io/metamorph-doku)
-
-## how to change flux-commands.md
-
-The entries in flux-commands.md describe the usage of commands used by flux.
-flux-commands.md is fully automatically generated. To make this happen one has to
-fill in the proper annotations in the correponding java classes. E.g.
-
-```
-reset-object-batch
-------------------
-- description: Resets the downstream modules every batch-size objects
-- options: batchsize (int)
-- signature: Object -> Object
-- java class: org.metafacture.flowcontrol.ObjectBatchResetter
-```
-
-is generated by reading following annotations in [ObjectBatchResetter.java](https://github.com/metafacture/metafacture-core/blob/511b4af8b993c85a33d6a18322258a195684d133/metafacture-flowcontrol/src/main/java/org/metafacture/flowcontrol/ObjectBatchResetter.java):
-
-```
-@Description("Resets the downstream modules every batch-size objects")
-@FluxCommand("reset-object-batch")
-@In(Object.class)
-@Out(Object.class)
-```
-The description of "options" is produced from all "public setter-methods", in this case:
-```
- public void setBatchSize(final int batchSize) { ...
-```
-The option's name is produced by cutting away the "set" from the methods name, leaving
-"BatchSize" which is then lowercased. The parameter of this option is generated from the
-parameter type of the method - here an "int"eger.
-
-## how to publish flux-commands.md
-
-If you have updated some of these annotations, say "description", and these changes are
-merged into the master branch, generate a new flux-commands.md like this:
-
-Go to metafacture-core, checkout master and build a distribution and start flux.sh:
-```bash
-$ ./gradlew installDist
-$ cd ./metafacture-runner/build/install/metafacture-core/
-$ flux.sh > flux-commands.md
-```
-
-Open the generated flux-commands.md and remove some boilerplate at the beginning of the
-file manually. Save it, copy it here, commit and push.
-
-The [publishing process will be automated with an github action](https://github.com/metafacture/metafacture-core/issues/368).
diff --git a/flux-commands.md b/flux-commands.md
index 1ecc489..f681249 100644
--- a/flux-commands.md
+++ b/flux-commands.md
@@ -115,7 +115,7 @@ decode-html
 decode-json
 -----------
 - description: Decodes JSON to metadata events. The 'recordPath' option can be used to set a JsonPath to extract a path as JSON - or to split the data into multiple JSON documents.
-- options: recordid (String), recordcount (int), booleanmarker (String), arraymarker (String), arrayname (String), recordpath (String), allowcomments (boolean), numbermarker (String)
+- options: recordid (String), recordcount (int), booleanmarker (String), arraymarker (String), arrayname (String), recordpath (String), numbermarker (String), allowcomments (boolean)
 - signature: String -> StreamReceiver
 - java class: org.metafacture.json.JsonDecoder
@@ -299,6 +299,13 @@ filter-triples
 - signature: Triple -> Triple
 - java class: org.metafacture.triples.TripleFilter
+fix (only available with Metafix)
+---
+- description: Applies a fix transformation to the event stream.
+- options: repeatedfieldstoentities (boolean), strictness [PROCESS, RECORD, EXPRESSION], entitymembername (String), strictnesshandlesprocessexceptions (boolean)
+- signature: StreamReceiver -> StreamReceiver
+- java class: org.metafacture.metafix.Metafix
+
 flatten
 -------
 - description: flattens out entities in a stream by introducing dots in literal names
@@ -369,6 +376,20 @@ lines-to-records
 - signature: String -> String
 - java class: org.metafacture.strings.LineRecorder
+list-fix-paths (only available with Metafix)
+--------------
+- description: Lists all paths found in the input records. These paths can be used in a Fix to address fields. Options: `count` (output occurence frequency of each path, sorted by highest frequency first; default: `true`), `template` (for formatting the internal triple structure; default: `${o} | ${s}` if count is true, else `${s}`)`index` (output individual repeated subfields and array elements with index numbers instead of '*'; default: `false`)
+- options: template (String), count (boolean), index (boolean)
+- signature: StreamReceiver -> String
+- java class: org.metafacture.metafix.ListFixPaths
+
+list-fix-values (only available with Metafix)
+---------------
+- description: Lists all values found for the given path. The paths can be found using fix-list-paths. Options: `count` (output occurence frequency of each value, sorted by highest frequency first; default: `true`)`template` (for formatting the internal triple structure; default: `${o} | ${s}` if count is true, else `${s}`)
+- options: template (String), count (boolean)
+- signature: StreamReceiver -> String
+- java class: org.metafacture.metafix.ListFixValues
+
 literal-to-object
 -----------------
 - description: Emits literal values as objects.
@@ -466,7 +487,7 @@ object-to-literal
 open-file
 ---------
 - description: Opens a file.
-- options: decompressconcatenated (boolean), encoding (String), compression [NONE, AUTO, BZIP2, GZIP, PACK200, XZ]
+- options: decompressconcatenated (boolean), encoding (String), compression (String)
 - signature: String -> Reader
 - java class: org.metafacture.io.FileOpener
@@ -506,7 +527,7 @@ pass-through
 print
 -----
 - description: Writes objects to stdout
-- options: footer (String), header (String), encoding (String), compression [NONE, AUTO, BZIP2, GZIP, PACK200, XZ], separator (String)
+- options: footer (String), header (String), encoding (String), compression (String), separator (String)
 - signature: Object ->
 - java class: org.metafacture.io.ObjectStdoutWriter
@@ -618,7 +639,7 @@ stream-tee
 stream-to-triples
 -----------------
 - description: Emits the literals which are received as triples such that the name and value become the predicate and the object of the triple. The record id containing the literal becomes the subject. If 'redirect' is true, the value of the subject is determined by using either the value of a literal named '_id', or for individual literals by prefixing their name with '{to:ID}'. Set 'recordPredicate' to encode a complete record in one triple. The value of 'recordPredicate' is used as the predicate of the triple. If 'recordPredicate' is set, no {to:ID}NAME-style redirects are possible.
-- options: redirect (boolean), recordpredicate (String)
+- options: recordpredicate (String), redirect (boolean)
 - signature: StreamReceiver -> Triple
 - java class: org.metafacture.triples.StreamToTriples
@@ -716,4 +737,3 @@ xml-tee
 - description: Sends an object to more than one receiver.
 - signature: XmlReceiver -> XmlReceiver
 - java class: org.metafacture.plumbing.XmlTee
-
From d74448956b4f6b8ec60776471537e00e10b5fa4c Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Wed, 30 Aug 2023 13:11:36 +0200
Subject: [PATCH 61/69] Update Flux-User-Guide.md

Co-authored-by: Pascal Christoph
---
 Flux-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md
index d8e2f59..d332bc6 100644
--- a/Flux-User-Guide.md
+++ b/Flux-User-Guide.md
@@ -107,7 +107,7 @@ Paths are always relative to the directory within which the flux command is exec
 Flux supports single line C/Java-style comments: `//comment`.
-## Getting Help and Inspiration (TODO: Ersetzen.)
+## Overview of the commands and some examples
 1. Have a look at the [List of available FLUX commands](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) or execute the flux without arguments to get a short help text along with a list of all registered commands. This is the list of FLUX commands mentioned already above.
 2. There are several example flux files along with sample data in the folder `examples/`: https://github.com/metafacture/metafacture-core/tree/master/metafacture-runner/src/main/dist/examples
From f4155266e5892060e0645e5c8f1f2bc380224b5c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tobias=20B=C3=BClte?=
Date: Wed, 30 Aug 2023 13:18:07 +0200
Subject: [PATCH 62/69] Update Fix-User-Guide.md

---
 Fix-User-Guide.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index 9731477..b61e430 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -125,8 +125,8 @@ For the fields with deeper structure you add a dot ‘.’. The path for element
 Sometimes an element can have multiple instances. Different data models solve this possibility differently. In XML records element repetition is possible and (partly) allowed in many schemas. Repeatable elements also exist in JSON and YAML but are unusual.
-To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2`` since the repeated field is handled as a list.
-Similarly you address the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command decode-yaml. It helps to interpret an element as array even if the list has only on value.
+To point to a specific element you state the index number. To adress the value `repeatedField2` the path would be `f.2` since the repeated field is handled as a list.
+Similarly you address the `listElement3` of the array/list by `g[].3`. The brackets are an array indicator created by the flux command `decode-yaml`(or by `decode-json`). It helps to interpret an repeatable element as an array even if the list has only one value.
 When working with nested structures and combinations of arrays and objects the path is a combination of element names, dots and index numbers.
From 6dcb20d38cdb88cfa6bb52ae5e4f6f5500cda731 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tobias=20B=C3=BClte?=
Date: Wed, 30 Aug 2023 13:19:41 +0200
Subject: [PATCH 63/69] Update Fix-User-Guide.md

---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index b61e430..f75d537 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -23,7 +23,7 @@ Flux-Example:
 - - address the `fix`-module
 - - you can add variables
 - - The Fix transformation can be part of the FLUX `|fix("retain(`245??`)")` - usually useful for short fixes
-- - or it can be separated in an external file, that is called in the FLUX-Process as in the code-snipped above
+- - or it can be separated in an external file, that is called in the FLUX-Process as in the code snippet above
 - when using it in a Java process, just add the library to your process
 ## Record-based and metadata manipulating approach
From 106c2e13bc7d3f582c8100e5a272c34649761333 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tobias=20B=C3=BClte?=
Date: Wed, 30 Aug 2023 13:20:21 +0200
Subject: [PATCH 64/69] Update Fix-User-Guide.md

---
 Fix-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md
index f75d537..287527e 100644
--- a/Fix-User-Guide.md
+++ b/Fix-User-Guide.md
@@ -28,7 +28,7 @@ Flux-Example:
 ## Record-based and metadata manipulating approach
 While Metafature processes the data as a stream, the fix module does not, it buffers the incoming stream to distinct records.
-Thus you can manipulate all metadata-elements of a record at once and don't need to think about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.
+Thus you can manipulate all metadata elements of a record at once and don't need to think about the order of the incoming stream - which was a really big hassle in the stream-based MORPH.
 The incoming record then can be manipulated, fields can be changed, removed or added. This also differs from the approach in the other Transformation Module MORPH where you construct a new record and a new data stream. With FIX you change stuff in the record and "only" change the data stream in Metafacture.
From 595eb8bf1c46cf38da3de4ebc1ed759caa5f3925 Mon Sep 17 00:00:00 2001
From: TobiasNx <61879957+TobiasNx@users.noreply.github.com>
Date: Thu, 31 Aug 2023 09:16:32 +0200
Subject: [PATCH 65/69] Update Flux-User-Guide.md

Co-authored-by: Fabian Steeg
---
 Flux-User-Guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Flux-User-Guide.md b/Flux-User-Guide.md
index d332bc6..eb13f3f 100644
--- a/Flux-User-Guide.md
+++ b/Flux-User-Guide.md
@@ -99,7 +99,7 @@ To lookup the signatures, again: execute Flux without arguments or see: [[Metafi
 Variables are always Strings and can be concatenated with the `+` operator. Escape sequences follow the Java String conventions: `\n`=line break, `\t`=tab, `\\`=\, `\u0024`=unicode character, etc. The `default` keyword tells Flux to assign the respective value _only_ if the variable has
-not yet been set on the command line. Without `default`, previous assignments will be overwritten.
+not yet been set on the command line. Without `default`, previous assignments, e.g. from command line variables, will be overridden by the explicitly assigned value.
 Paths are always relative to the directory within which the flux command is executed. To address files relative to the location of the executed flux file, use the predefined `FLUX_DIR` variable.
From 587fbd3d0871419a0e190df855a06e82545acdff Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Thu, 31 Aug 2023 15:37:49 +0200 Subject: [PATCH 66/69] Update README.md --- README.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 590572d..f31190c 100644 --- a/README.md +++ b/README.md @@ -6,11 +6,17 @@ Metafacture is a toolkit for processing semi-structured data with a focus on lib The name Metafacture is a portmanteau of the words metadata and manufacture. -Metafacture comprises three main parts: Framework, Flux and the Transformation-Module Fix. It can be extended with modules. +Metafacture comprises three main parts: **Framework**, **Flux** and one of the **Transformation-Modules Fix and Morph**. It can be extended with modules. + +> [!NOTE] +> With regard to the Transformation-Modules, this documentation focusses on Fix instead of MORPH. If you want to find out more about MORPH, have a look at [the old documentation](https://github.com/metafacture/metafacture-core/wiki/Metamorph-User-Guide) and the German cookbook by [Swissbib](https://swissbib.gitlab.io/metamorph-doku/). + Our goal with this repo is to collaboratively create comprehensive documentation on Metafacture in the [issue tracker](https://github.com/culturegraph/metafacture-documentation/issues?q=). Feel free to open issues not only for bugs or enhancements, but also questions about Metafacture usage, or to share your experiences. We hope that over time, in that way we can create useful tutorials, how-tos, and collect good practices for using Metafacture. 
__________________ +Deciding which parts are relevant to you depends on the way you are using Metafacture: + ## Using Metafacture via playground or CLI > [!NOTE] From f3c248e295acdd969b0a66fd9bc47529b2310d53 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Tue, 5 Sep 2023 11:01:39 +0200 Subject: [PATCH 67/69] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f31190c..fca77ec 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ Metafacture is a toolkit for processing semi-structured data with a focus on library metadata. It provides a versatile set of tools for reading, writing and transforming data. Metafacture can be used as a stand-alone application via CLI or as a Java library in other applications. There is also a playground where you can test workflows. -The name Metafacture is a portmanteau of the words metadata and manufacture. +This is the central place for documentation about Metafacture. Metafacture comprises three main parts: **Framework**, **Flux** and one of the **Transformation-Modules Fix and Morph**. It can be extended with modules. From ff688104a52db6aa6523357789d2d1234abaaec9 Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Tue, 5 Sep 2023 11:10:17 +0200 Subject: [PATCH 68/69] Update Getting-Started.md --- Getting-Started.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Getting-Started.md b/Getting-Started.md index c8aeb18..e473475 100644 --- a/Getting-Started.md +++ b/Getting-Started.md @@ -5,7 +5,7 @@ ## Playground -The easiest way to get started with Metafacture is the Playground. 
Take a look at the [first example](https://metafacture.org/playground/?flux=PG_DATA%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cfix%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&fix=move_field%28_id%2C+id%29%0Amove_field%28a%2C+title%29%0Apaste%28author%2C+b.v%2C+b.n%2C+%27~aus%27%2C+c%29%0Aretain%28id%2C+title%2C+author%29&data=1%7Ba%3A+Faust%2C+b+%7Bn%3A+Goethe%2C+v%3A+JW%7D%2C+c%3A+Weimar%7D%0A2%7Ba%3A+R%C3%A4uber%2C+b+%7Bn%3A+Schiller%2C+v%3A+F%7D%2C+c%3A+Weimar%7D&active-editor=fix) and run it by pressing the !["Process"](img/process.png) button. Check out the other examples (first button, !["Load Examples"](img/load-exmples.png)) for different input sources, transformations, and output formats. +The easiest way to get started with Metafacture is the Playground. Take a look at the [first example](https://metafacture.org/playground/?flux=PG_DATA%0A%7Cas-lines%0A%7Cdecode-formeta%0A%7Cfix%0A%7Cencode-xml%28rootTag%3D%22collection%22%29%0A%7Cprint%0A%3B&fix=move_field%28_id%2C+id%29%0Amove_field%28a%2C+title%29%0Apaste%28author%2C+b.v%2C+b.n%2C+%27~aus%27%2C+c%29%0Aretain%28id%2C+title%2C+author%29&data=1%7Ba%3A+Faust%2C+b+%7Bn%3A+Goethe%2C+v%3A+JW%7D%2C+c%3A+Weimar%7D%0A2%7Ba%3A+R%C3%A4uber%2C+b+%7Bn%3A+Schiller%2C+v%3A+F%7D%2C+c%3A+Weimar%7D&active-editor=fix) and run it by pressing the !["Process"](https://metafacture.org/img/process.png) button. Check out the other examples (first button, !["Load Examples"](https://metafacture.org/img/load-exmples.png)) for different input sources, transformations, and output formats. For commands available in the Flux, see [the Flux commands documentation](/flux-commands.md). @@ -20,7 +20,7 @@ To use Metafacture as a command-line tool, download the latest metafix-runner fr `$ ./bin/metafix-runner /path/to/your.flux` on Unix/Linux/Mac or `$ ./bin/metafix-runner.bat /path/to/your.flux` on Windows. -To get started, you can export a workflow from the Playground (last button, !["Export Workflow"](img/export.png)). 
+To get started, you can export a workflow from the Playground (last button, !["Export Workflow"](https://metafacture.org/img/export.png)). To set up IDE support for editing your Flux and Fix files, see [the IDE extensions page](https://metafacture.org/ide-extensions/index.html). From cfe5862ba9495f9131a01447f4df70949acb927e Mon Sep 17 00:00:00 2001 From: TobiasNx <61879957+TobiasNx@users.noreply.github.com> Date: Tue, 5 Sep 2023 11:20:46 +0200 Subject: [PATCH 69/69] Update Fix-User-Guide.md --- Fix-User-Guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Fix-User-Guide.md b/Fix-User-Guide.md index 287527e..5e2c33a 100644 --- a/Fix-User-Guide.md +++ b/Fix-User-Guide.md @@ -27,7 +27,7 @@ Flux-Example: - when using it in a Java process, just add the library to your process ## Record-based and metadata manipulating approach -While Metafature processes the data as a stream, the fix module does not, it buffers the incoming stream to distinct records. +While Metafacture processes the data as a stream, the `fix` module does not. It buffers the incoming stream to distinct records. Thus you can manipulate all metadata elements of a record at once and don't need to think about the order of the incoming stream - which was a really big hassle in the stream-based MORPH. The incoming record then can be manipulated, fields can be changed, removed or added. This also differs from the approach in the other Transformation Module MORPH where you construct a new record and a new data stream. With FIX you change stuff in the record and "only" change the data stream in Metafacture. @@ -71,7 +71,7 @@ end **Functions** are used to add, change, remove or otherwise manipulate elements. -**Conditionals** are used to control the processing of function so that they are not process with every workflow but only under certain conditions. +**Conditionals** are used to control the processing of fix functions. 
The included fix functions are not processed with every workflow but only under certain conditions. **Selectors** can be used to filter the records you want. @@ -248,4 +248,4 @@ do once("setup") end ``` -For performance reasons it is useful to integrate macros and maps that are used often in a `do once` bind. \ No newline at end of file +For performance reasons it is useful to integrate macros and maps that are used often in a `do once` bind.