Targeting analysis at 200 Gbps with ATLAS PHYSLITE. This repository is very much a work in progress.
ATLAS has not released Open Data, so there is no Analysis Grand Challenge (AGC) implementation we can copy and run. As a result, this repository's main purpose is as a facilities test:
- Run from PHYSLITE
- Load 200 Gbps off of the PHYSLITE samples
- Push all that data downstream to Dask (or similar) workers.
We keep a loosely tracked set of lessons learned.
`materialize_branches.ipynb`: read a list of branches, distributable with Dask (use for benchmarking).
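For orientation, the core of such a benchmark might look like the following minimal sketch, assuming `uproot`'s Dask backend together with `dask-awkward`; the file path, xcache host, and branch names are placeholders, not this repo's actual configuration.

```python
import dask
import dask_awkward as dak
import uproot

# Lazily map the PHYSLITE CollectionTree to dask-awkward arrays.
# Path, host, and branch filter below are illustrative placeholders.
events = uproot.dask(
    {"root://xcache.example.org//store/file.DAOD_PHYSLITE.root": "CollectionTree"},
    filter_name=["AnalysisElectronsAuxDyn.pt", "AnalysisJetsAuxDyn.pt"],
)

# One tiny reduction per branch: every byte gets read and decompressed,
# but only a scalar per branch travels back to the client.
tasks = {name: dak.count(events[name], axis=None) for name in events.fields}

# Executing the graph on the Dask cluster materializes all requested branches.
print(dask.compute(tasks)[0])
```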
When run on the UChicago AF Jupyter Notebook, no package installs are required.
There is a `requirements.txt` which should allow this to be run on a bare-bones machine (modulo the location of files, etc.).
If you are going to use the `servicex` version, you have to pin `dask_awkward==2024.2.0`; later versions have a bug which hasn't been fixed yet.
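In a fresh environment, the pin amounts to:

```
pip install "dask_awkward==2024.2.0"
```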
The folder `input_files` contains the list of input containers / files and related metadata, plus the scripts to produce these.
In total:
- number of files: 219,029
- size: 191.073 TB
- number of events: 23,347,787,104
The files in `input_files`:
- `input_files/find_containers.py`: query Rucio for a list of containers, given a list of (hardcoded) DSIDs
- `input_files/container_list.txt`: list of containers to run over
- `input_files/produce_container_metadata.py`: query metadata for the containers: number of files / events, size
- `input_files/container_metadata.json`: output of `input_files/produce_container_metadata.py` with container metadata
- `input_files/get_file_list.py`: for a given dataset, creates a txt file listing file access paths that include the appropriate xcache prefix. The same kind of output can be obtained with:

  ```
  export SITE_NAME=AF_200
  rucio list-file-replicas mc20_13TeV:mc20_13TeV.364126.Sherpa_221_NNPDF30NNLO_Zee_MAXHTPTV500_1000.deriv.DAOD_PHYSLITE.e5299_s3681_r13145_p6026 --protocol root --pfns --rses MWT2_UC_LOCALGROUPDISK
  ```

- `input_files/containers_to_files.py`: processes the list of containers into a list of files per container, with hardcoded xcache instances, and writes the result to `input_files/file_lists/*` (see the sketch after this list).
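The xcache redirection in these scripts boils down to prefixing each replica's PFN with a cache endpoint. A minimal sketch of that rewriting, with made-up hostnames and paths (the real instances are hardcoded in `input_files/containers_to_files.py`):

```python
# Illustrative only: hostnames, ports, and the example PFN are made up.
XCACHE_INSTANCES = [
    "root://xcache-01.example.org:1094//",
    "root://xcache-02.example.org:1094//",
]

def with_xcache(pfn: str, index: int) -> str:
    """Prefix a PFN with an xcache endpoint, round-robining over instances."""
    cache = XCACHE_INSTANCES[index % len(XCACHE_INSTANCES)]
    return cache + pfn

pfn = "root://host.example.org:1094//pnfs/example.org/rucio/mc20_13TeV/DAOD_PHYSLITE.example.pool.root.1"
print(with_xcache(pfn, 0))
```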
Branches to be read are determined with a 2018 data file.
- `input_files/size_per_branch.ipynb`: produce a breakdown of branch sizes for a given file
- `input_files/branch_sizes.json`: output of the notebook above
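A minimal sketch of how such a breakdown can be produced with `uproot` (the input path here is a placeholder, and the notebook's actual logic may differ):

```python
import json
import uproot

# Placeholder path; the notebook reads a 2018 data PHYSLITE file.
with uproot.open("data18.DAOD_PHYSLITE.example.root") as f:
    tree = f["CollectionTree"]
    # Compressed on-disk bytes per top-level branch.
    sizes = {b.name: b.compressed_bytes for b in tree.branches}

# Write the per-branch sizes as JSON, largest first.
with open("branch_sizes.json", "w") as out:
    json.dump(dict(sorted(sizes.items(), key=lambda kv: -kv[1])), out, indent=2)
```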
This work was supported by the U.S. National Science Foundation (NSF) cooperative agreements OAC-1836650 and PHY-2323298 (IRIS-HEP).