Update documentation on I/O #267

Merged (6 commits) on Dec 20, 2024
35 changes: 26 additions & 9 deletions README.md
@@ -1,25 +1,40 @@
# k4FWCore (key4hep FrameWork Core)

k4FWCore is a Gaudi package that provides the PodioDataService, which allows the
use of podio-based event data models like EDM4hep in Gaudi workflows.
k4FWCore is a Gaudi package that provides the IOSvc, which allows the
use of EDM4hep in Gaudi workflows.

k4FWCore also provides the `k4run` script used to run Gaudi steering files.
k4FWCore also provides the `k4run` script used to run Gaudi steering files. See the [documentation](doc/k4run-args.md) for more information.

## Components

### Basic I/O

#### k4DataSvc
| Current | Legacy | Description |
|---------|--------|-|
| IOSvc | k4DataSvc | Service handling the PODIO types and collections. |
| Reader | PodioInput | Algorithm to read data from input files on disk. |
| Writer | PodioOutput | Algorithm to write data to an output file on disk. |
| MetadataSvc | MetaDataHandle | Service/Handle handling user-defined metadata. |

Component wrapping the PodioDataService to handle PODIO types and collections.
See the [documentation](doc/PodioInputOutput.md) for more information.

#### PodioInput
### Auxiliary

Algorithm to read data from one or multiple input file(s) on disk.
### Collection Merger

#### PodioOutput
Algorithm merging multiple collections of the same type into a single collection.

Algorithm to write data to an output file on disk.
### EventHeaderCreator

Algorithm creating a new `edm4hep::EventHeaderCollection` data object.

### EventCounter

Algorithm counting processed events and printing a heartbeat.

### UniqueIDGenSvc

Service generating unique, reproducible numbers used to seed the random number generators (RNGs) of the algorithms. See the [documentation](doc/uniqueIDGen.md) for more information.

## k4run
```
@@ -57,6 +72,8 @@ print(my_opts[0].foo)

* Gaudi

* EDM4hep

## Installation and downstream usage.

k4FWCore is a CMake project. After setting up the dependencies (use for example `source /cvmfs/sw.hsf.org/key4hep/setup.sh`)
136 changes: 136 additions & 0 deletions doc/LegacyPodioInputOutput.md
@@ -0,0 +1,136 @@
<!--
Copyright (c) 2014-2024 Key4hep-Project.

This file is part of Key4hep.
See https://key4hep.github.io/key4hep-doc/ for further info.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Legacy reading and writing of EDM4hep files in Gaudi with the k4DataSvc

:::{caution}
`k4DataSvc` is a legacy service previously used in k4FWCore for reading and writing data in EDM4hep or other data models based on PODIO.

The currently used service is `IOSvc`, which offers improved, streamlined functionality and better support for modern workflows. For detailed documentation on `IOSvc`, refer to [this documentation](PodioInputOutput.md).
:::

This page describes how to use the legacy [k4FWCore](https://github.com/key4hep/k4FWCore)
facilities to read and write EDM4hep files. It assumes some familiarity with
Gaudi: most of the snippets show only a minimal configuration fragment, not a
complete runnable example.

## The `k4DataSvc`

Whenever you want to work with EDM4hep in the Gaudi-based framework of Key4hep,
you will need to use the `k4DataSvc` as *EventDataSvc*. You can instantiate and
configure this service as follows:

```python
from Gaudi.Configuration import *
from Configurables import k4DataSvc

evtSvc = k4DataSvc("EventDataSvc")
```

**It is important that the name is `EventDataSvc` in this case, because Gaudi
assumes this name for the event data service.** Once you have the `k4DataSvc`
instantiated, you still have to make the `ApplicationMgr` aware of it by making
sure that `evtSvc` is in the list of *external services* (`ExtSvc`):

```python
from Configurables import ApplicationMgr
ApplicationMgr(
    # other args
    ExtSvc = [evtSvc]
)
```

## Reading events

To read events you will need to use the `PodioInput` algorithm in addition to
the [`k4DataSvc`](#the-k4datasvc). Currently, you need to pass the input
file to the `k4DataSvc` via the `input` option, but pass the collections that you
want to read to the `PodioInput`. We are working on making this more consistent
(the discussion happens in this [issue](https://github.com/key4hep/k4FWCore/issues/105)).
The parts of your options file related to reading EDM4hep files will look
something like this:

```python
from Configurables import PodioInput, k4DataSvc

evtSvc = k4DataSvc("EventDataSvc")
evtSvc.input = "/path/to/your/input-file.root"

podioInput = PodioInput()
```

The input file can be changed from the command line via:
```bash
k4run <your-options-file> --EventDataSvc.input=<input-file>
```

By default, the `PodioInput` reads all collections that are available in
the input file. The collections that become available can be limited via the
`collections` option:

```python
podioInput.collections = [
    # List of collection names that should be made available
]
```

## Writing events

To write events you will need to use the `PodioOutput` algorithm in addition to
the [`k4DataSvc`](#the-k4datasvc):

```python
from Configurables import PodioOutput

podioOutput = PodioOutput("PodioOutput", filename="my_output.root")
```

By default this will write the complete event contents to the output file.
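
The reading and writing fragments above can be combined into a single options file. The following is a minimal sketch, not a tested configuration: it assumes a Key4hep environment in which `Configurables` is importable, and the `ApplicationMgr` properties `TopAlg`, `EvtSel`, `EvtMax` and `OutputLevel` do not appear in the text above and are taken here as the usual Gaudi application settings. The file paths are placeholders.

```python
# Sketch of a legacy options file that reads an EDM4hep file and writes it
# back out. Intended to be run with `k4run` inside a Key4hep environment.
from Gaudi.Configuration import INFO
from Configurables import ApplicationMgr, PodioInput, PodioOutput, k4DataSvc

# The service must be named "EventDataSvc" (see above)
evtSvc = k4DataSvc("EventDataSvc")
evtSvc.input = "/path/to/your/input-file.root"  # placeholder path

# Reads all collections by default
podioInput = PodioInput("PodioInput")

# Writes the complete event contents by default
podioOutput = PodioOutput("PodioOutput", filename="my_output.root")

ApplicationMgr(
    TopAlg=[podioInput, podioOutput],  # assumption: algorithms run in this order
    EvtSel="NONE",                     # assumption: no Gaudi event selector is used
    EvtMax=10,                         # assumption: process only a few events
    ExtSvc=[evtSvc],
    OutputLevel=INFO,
)
```

Any algorithms that process the events would go between `podioInput` and `podioOutput` in the `TopAlg` list.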

### Writing only a subset of collections

Sometimes it is desirable to write only a subset of the collections that are
available from the EventStore. The `PodioOutput` supports this via the
`outputCommands` option, which takes a list of `keep` or `drop` commands. Each
command must consist of the `keep`/`drop` keyword and a target. The target is a
collection name that may include the `?` or `*` wildcard patterns. This might
look like the following

```python
podioOutput.outputCommands = ["keep *"]
```

which will keep everything (the default), while

```python
podioOutput.outputCommands = ["drop *"]
```

will simply drop all collections and effectively write an empty file (apart from
some metadata). A common pattern is to `"drop *"` first and then selectively add
`keep` commands for the collections to retain, e.g. to keep only the
highest-level MC and reco information:

```python
podioOutput.outputCommands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]
```
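
To make the effect of a command list concrete, here is a small standalone sketch that mimics the `keep`/`drop` selection in plain Python. It is not the `PodioOutput` implementation. It assumes that commands are applied in order, that the last matching command decides, and that unmatched collections are kept. The function `select_collections` and the list of available collections are hypothetical. Note that `fnmatchcase` also understands `[...]` character sets, which go beyond the `?` and `*` wildcards described above.

```python
from fnmatch import fnmatchcase


def select_collections(available, output_commands):
    """Return the collection names that survive a list of keep/drop commands.

    Assumption: commands are applied in order and the last command whose
    pattern matches a collection decides; unmatched collections are kept.
    """
    selected = []
    for name in available:
        keep = True  # default: keep everything
        for command in output_commands:
            action, pattern = command.split()
            if action not in ("keep", "drop"):
                raise ValueError(f"unknown command: {command!r}")
            # fnmatchcase handles the `*` and `?` wildcards case-sensitively
            if fnmatchcase(name, pattern):
                keep = action == "keep"
        if keep:
            selected.append(name)
    return selected


# Hypothetical event content, used for illustration only
available = ["MCParticles", "MCParticlesSkimmed", "PandoraPFOs", "RecoMCTruthLink"]
commands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]
print(select_collections(available, commands))
```

With these inputs only the three explicitly kept collections remain, while `MCParticles` is removed by the leading `"drop *"`.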