diff --git a/content/en/users/data/management/_index.md b/content/en/users/data/management/_index.md
index d28e5f1023..0c06319bba 100644
--- a/content/en/users/data/management/_index.md
+++ b/content/en/users/data/management/_index.md
@@ -26,37 +26,29 @@ and can be accessed from most [compute services](../../compute).
 
 ## Generic data management
 
-These higher-level data management services are available to researchers:
+This higher-level data management service is available to researchers:
 
-- [EGI Rucio](rucio) is tailored to medium/big scientific collaborations, allowing
-  users to organise, manage, and access their data at scale. Data can be distributed
-  across heterogeneous data centers at widely distributed locations.
 - [EGI DataHub](datahub) is a high-performance data management solution that offers
   unified data access across multiple types of underlying storage, allowing users to
   share, collaborate and easily perform computations on the stored data.
 
 ## Specialized data management
 
-The following specialized data management services are also available:
+The following specialized data management service is also available:
 
 - [EGI Data Transfer](data-transfer) is a low-level service to move data from
   one [Grid](../storage/grid-storage) or [Object](../storage/object-storage) storage
-  to another. It is used internally by [Rucio](rucio) to schedule transfers based
-  on the data policies defined by the users.
-- [openRDM](open-rdm) is a combined [FAIR](https://en.wikipedia.org/wiki/FAIR_data)
-  data management platform, Electronic Laboratory Notebook (ELN) and Inventory Management System
-  allowing a complete overview of workflows and information, from initial data generation
-  to data analysis and publication.
+  to another.
 
-## Use-cases for storing and managing research data
+## Use-cases for storing research data
 
 Depending on the type of the employed compute services and the use-cases being addressed,
 users might need to choose different data service to store, access, and manage data.
 
-| User           | Data storage                                             | Data management (optional) |
-| -------------- | -------------------------------------------------------- | -------------------------- |
-| **Cloud user** | Block and Object storage                                 | DataHub                    |
-| **HTC user**   | Grid storage                                             | Rucio                      |
-| **HPC user**   | High-performance parallel file systems or Object storage | DataHub or Rucio           |
+| User           | Data storage                                             |
+| -------------- | -------------------------------------------------------- |
+| **Cloud user** | Block and Object storage                                 |
+| **HTC user**   | Grid storage                                             |
+| **HPC user**   | High-performance parallel file systems or Object storage |
 
 The following sections offer detailed descriptions for each data management service.
diff --git a/content/en/users/data/management/open-rdm/_index.md b/content/en/users/data/management/open-rdm/_index.md
deleted file mode 100644
index 2f27511082..0000000000
--- a/content/en/users/data/management/open-rdm/_index.md
+++ /dev/null
@@ -1,33 +0,0 @@
----
-title: openRDM
-linkTitle: Research Data Management
-type: docs
-weight: 50
-weight:
-description: >
-  Organise data in research projects with openRDM
----
-
-While most other [data management services](..) are available in the EGI infrastructure,
-there are specialized services in the EGI portfolio that are offered for in-house
-installations by research communities, with support for customization and
-configuration from EGI.
-
-## What is it?
-
-The [openRDM](https://marketplace.eosc-portal.eu/services/openrdm-eu) service,
-an Research Data Management (RDM) tool,
-offers **advanced organisation of data during ongoing research projects,
-as an integrated environment with data management and digital lab notebook**.
-
-openRDM combines a data management platform with a digital lab notebook and a sample and
-protocol management system. It enables scientists to meet the ever-increasing requirements
-from funding agencies, journals, and academic institutions to publish data according to the
-[FAIR](https://en.wikipedia.org/wiki/FAIR_data) data principles – according to which data
-should be _Findable_, _Accessible_, _Interoperable_ and _Reusable_.
-
-{{% alert title="Note" color="info" %}} The openRDM service is based around the
-active research data management (ARDM) platform [openBIS](https://openbis.ch/), see the
-[documentation](https://openbis.ch/index.php/docs/user-documentation/)
-for more details.
-{{% /alert %}}
diff --git a/content/en/users/data/management/rucio/_index.md b/content/en/users/data/management/rucio/_index.md
deleted file mode 100644
index 29a9430bf3..0000000000
--- a/content/en/users/data/management/rucio/_index.md
+++ /dev/null
@@ -1,82 +0,0 @@
----
-title: Rucio
-linkTitle: Data Orchestrator
-type: docs
-weight: 40
-description: >-
-  Organise and access data at scale with Rucio
----
-
-## What is it?
-
-Built on more than a decade of experience in LHC experiments, Rucio serves the
-data needs of any modern scientific experiments. Rucio can manage large amounts
-of data, countless numbers of files, heterogeneous storage systems, globally
-distributed data centres, with monitoring and analytics.
-
-Rucio **allows management of data with expressive statements**. You to say what
-you want, and Rucio will figure out the details of how to do it. For example,
-three copies of my file on different continents with a backup on tape. You can
-also automatically remove copies of data after a set period or once its access
-popularity drops.
-
-While Rucio is extremely scalable, the STFC Rucio Data Management Service is
-designed for smaller communities, with expected data needs up to tens of
-Petabytes.
The fact that the underlying Rucio infrastructure is managed by STFC,
-allows communities to easily start using and/or test Rucio with little setup
-cost.
-
-## Requirements to consider to use Rucio
-
-For Rucio to manage your data in this setup, Rucio will need X.509 certificate
-access, or soon, through
-[EGI Check-in](../../../../providers/check-in/_index.md) to:
-
-- Your experiments Storage Element that is X.509 capable
-- Your experiments Tape Archive that is X.509 capable
-- Your experiments VOMS server information to check Users credentials against
-
-Rucio is a system that sits on top of already established storage elements to
-unify users access for data management, and retrieval. Rucio consists of a
-database of the storage element's details, users and their access credential
-information, access levels, the data, and its location, Rucio is not a direct
-data storage solution.
-
-## Rucio Use Cases
-
-Rucio is a data management software, that integrates with your experiments
-currently provisioned storage. This section will highlight some use cases that
-Rucio will fill for your experiment.
-
-### Management of data for archive when no longer used actively
-
-A simple use case for Rucio is to manage data between 'hot' storage, made of
-HDDs or SSD, and 'cold' storage made up of tape. When your experiment generates
-data that data will be accessed much more than older data as your colleagues work
-with the data. Within Rucio the data will be registered to be on the 'hot' and
-'cold' storage. This ensures the integrity of the data by providing multiple
-copies of the data, one on the slower tape archive, and one copy on the more
-easy to access HDDs and SSDs. Then as the usage of this data declines, the data
-on the 'hot' storage can be removed to make way for newer data that is more
-frequently accessed. Should the archived data be requested again Rucio can stage
-the data from tape back to disk making it available for users.
-
-### Management of data between sites for jobs
-
-Another useful use case for Rucio is to manage the data between different sites
-within your experiment. This can provide users with better access to the data
-that they want to work on. Another option is to have Rucio integrated with your
-workflow management software (Panda and DIRAC both have integrated with Rucio),
-so data can be moved to sites as job slots are available, streamlining the
-data flow for the user.
-
-## Official Rucio Pages
-
-- [Rucio Homepage](https://rucio.cern.ch/)
-- [Rucio Documentation](https://rucio.cern.ch/documentation/)
-
-## Multi-VO specific pages
-
-- [Multi-VO Rucio at RAL as a service](https://www.scd.stfc.ac.uk/Pages/SCD-STFC-Rucio-Data-Management-Service.aspx)
-- [Privacy Policy](https://www.scd.stfc.ac.uk/Pages/STFC-Rucio-Privacy-Notice.aspx)
-- [Acceptable Use Policy](https://www.scd.stfc.ac.uk/Pages/STFC-Rucio-Acceptable-Use-Policy.aspx)
diff --git a/content/en/users/data/management/rucio/admin/_index.md b/content/en/users/data/management/rucio/admin/_index.md
deleted file mode 100644
index 74b7500ef4..0000000000
--- a/content/en/users/data/management/rucio/admin/_index.md
+++ /dev/null
@@ -1,424 +0,0 @@
----
-title: Administering Rucio
-linkTitle: Administration
-type: docs
-weight: 30
-description: >-
-  Help Rucio admins understand and perform actions for their VO
----
-
-Within Rucio there are several levels of administrators. There are the **super
-admins**, which are the staff that runs Multi-VO Rucio. Then there are virtual
-organisation (VO) specific admins that will look after the day-to-day operations
-of their VO. Below are some of the tasks that VO admins will need to do to set
-up and maintain their VO.
-
-## Creating Accounts, Identities, and Quotas
-
-To add new users within your VO, you will need to communicate with Rucio as the
-VO admin. Then using the rucio-admin commands, create a new account and add
-identities to the account.
The account is a username with no permissions, or
-authentication methods. The identities bind authentication methods and
-permissions to the account. The account you want to create identities for is
-input as an argument. Accounts will have different permissions and access (such
-as how much data they can store on a particular
-[RSE](https://rucio.cern.ch/documentation/started/concepts/rucio_storage_element/)).
-
-### CLI Example
-
-```shell
-$ rucio-admin account add \
-  --type USER \
-  --email jdoe@email.com jdoe
-
-Added new account: jdoe
-
-$ rucio-admin identity add \
-  --account jdoe \
-  --type USER \
-  --id userjdoe \
-  --email jdoe@email.com \
-  --password jdoepass
-
-Added new identity to account: userjdoe-jdoe
-
-$ rucio-admin account set-limits jdoe storagesite1 100GB
-
-Set account limit for account jdoe on RSE storagesite1: 100.000 GB
-```
-
-### Python Client Example
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.add_account('jdoe', 'USER', 'jdoe@email.com')
-True
->>> CLIENT.add_identity('jdoe', 'USER', 'jdoe@email.com')
-True
->>> CLIENT.set_account_limit('jdoe', 'storagesite1', 107374182400, 'global')
-True
-```
-
-## Creating RSE(s)
-
-[Rucio Storage Elements](https://rucio.cern.ch/documentation/started/concepts/rucio_storage_element/)
-(RSEs) are how Rucio represents the physical storage available to your VO. As
-with many aspects of Rucio there are a lot of optional attributes that can be
-set for an RSE, but as a minimum a protocol for transfers need to be added
-before it can be used.
-
-### Creating RSE(s) CLI Example
-
-```shell
-$ rucio-admin rse add NEW_RSE
-
-Added new deterministic RSE: NEW_RSE
-
-$ rucio-admin rse add-protocol \
-  --hostname test.org \
-  --scheme gsiftp \
-  --prefix '/filepath/rucio/' \
-  --port 8443 NEW_RSE \
-  --domain-json '{
-    "wan": {
-      "read": 1,
-      "write": 1,
-      "third_party_copy": 0,
-      "delete": 1
-    },
-    "lan": {
-      "read": 1,
-      "write": 1,
-      "third_party_copy": 0,
-      "delete": 1
-    }
-  }'
-```
-
-### Creating RSE(s) Python Client Example
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.add_rse('NEW_RSE')
-True
->>> CLIENT.add_protocol(
-    'NEW_RSE', {
-        'hostname': 'test.org',
-        'scheme': 'gsiftp',
-        'prefix': '/filepath/rucio/',
-        'port': 8443,
-        'impl': 'rucio.rse.protocols.gfalv2.Default',
-        'domain': {
-            "wan": {
-                "read": 1,
-                "write": 1,
-                "third_party_copy": 0,
-                "delete": 1
-            },
-            "lan": {
-                "read": 1,
-                "write": 1,
-                "third_party_copy": 0,
-                "delete": 1
-            }
-        }
-    }
-    )
-True
-```
-
-## Updating RSE Protocols
-
-On occasion, it may be necessary to change or update an RSE protocol. Unlike
-settings (`rucio-admin rse update`) or attributes
-(`rucio-admin rse set-attribute`), there isn't a direct CLI function for
-changing a protocol. It would therefore be necessary to remove
-(`rucio-admin rse delete-protocol`) and then add
-(`rucio-admin rse add-protocol`) it again using different information.
-Alternatively, the Python client has additional functionality to directly update
-or swap the priority of RSE protocols.
For example to update the `impl` without
-changing anything else (the `data` argument is used to update the protocol, with
-the other settings used to specify the protocol to change):
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.update_protocols(
-    rse='NEW_RSE',
-    scheme='gsiftp',
-    data={
-        'impl': rucio.rse.protocols.gfal.Default'
-    },
-    hostname='test.org',
-    port=8433
-    )
-True
-```
-
-To swap the priority of two protocols for the third party copy operation:
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.swap_protocols(
-    rse='NEW_RSE',
-    domain='wan',
-    operation='third_party_copy',
-    scheme_a='gsiftp',
-    scheme_b='root'
-    )
-True
-```
-
-It's also worth noting that when an RSE is deleted using
-`rucio-admin rse delete`, the entry remains in the database. This "soft"
-deletion means that attempting to add a new RSE with the same name as a deleted
-RSE will fail. This is due to the RSE not having a unique name/VO combination.
-In practice, it is therefore better to update a badly configured RSE rather than
-attempting to delete and re-add it. However, if the latter method is preferred,
-it is possible manually rename the deleted RSE in the database (as there are no
-foreign key constraints on its name, just the ID and VO) so that the old name
-can be re-used.
-
-## Basic Usage
-
-This section covers some of the basic Rucio functions that can be run once the
-VO has accounts and RSEs set up. As with the setup, there are many options that
-won't be covered here. For more information refer to either the main
-documentation or the help for the function in question.
-
-### Daemons
-
-Most operations in Rucio (such as transfers, deletions, rule evaluation) require
-one or more of the
-[daemons](https://rucio.cern.ch/documentation/started/main_components#daemons)
-to be running in order to take effect. For a multi-VO instance, these should be
-running for all VOs already.
However, on new VO's joining Rucio some updating of
-the daemons will be necessary.
-
-If it seems like it is not quite right please contact the Rucio team through
-[GGUS](https://ggus.eu/?mode=ticket_submit).
-
-## Uploading Data
-
-In Rucio files and their replicas are represented by Data Identifiers
-([DIDs](https://rucio.cern.ch/documentation/started/concepts/file_dataset_container)),
-which are composed of a scope and name. Furthermore, multiple files can be
-attached to a dataset, which in turn can be attached to a container (which can
-be attached to another container and so on). Datasets and containers are also
-represented by DIDs.
-
-Scopes are always associated with a particular Rucio account, and must be added
-to Rucio using an admin account. If no scope is provided when uploading, Rucio
-will default to `user.`, but this still needs to have been added by an
-admin.
-
-Once a file has been uploaded via the CLI or Python client, it can then be
-attached to a dataset. It's worth noting that by default, some Rucio commands
-will not list files, only datasets.
-
-### Uploading Data CLI Example
-
-Assuming the file `test.txt` exists locally:
-
-```shell
-$ rucio-admin scope add --account root --scope user.root
-
-Added new scope to account: user.root-root
-
-$ rucio upload --rse NEW_RSE test.txt
-
-2020-08-14 15:28:15,059 INFO Preparing upload for file test.txt
-2020-08-14 15:28:15,235 INFO Successfully added replica in Rucio catalogue at NEW_RSE
-2020-08-14 15:28:15,334 INFO Successfully added replication rule at NEW_RSE
-2020-08-14 15:28:15,579 INFO Trying upload with mock to NEW_RSE
-2020-08-14 15:28:15,579 295 INFO Trying upload with mock to NEW_RSE
-2020-08-14 15:28:15,580 INFO Successful upload of temporary file. mock://test.org:123/filepath/rucio/user/root/46/6b/test.txt.rucio.upload
-2020-08-14 15:28:15,580 295 INFO Successful upload of temporary file.
mock://test.org:123/filepath/rucio/user/root/46/6b/test.txt.rucio.upload
-2020-08-14 15:28:15,580 INFO Successfully uploaded file test.txt
-2020-08-14 15:28:15,580 295 INFO Successfully uploaded file test.txt
-2020-08-14 15:28:15,583 295 DEBUG Starting new HTTPS connection (1): rucio:443
-2020-08-14 15:28:15,598 295 DEBUG https://rucio:443 "POST /traces/ HTTP/1.1" 201 7
-2020-08-14 15:28:15,662 295 DEBUG https://rucio:443 "PUT /replicas HTTP/1.1" 200 0
-
-$ rucio list-dids user.root:test.txt --filter type=ALL
-+--------------------+--------------+
-| SCOPE:NAME         | [DID TYPE]   |
-|--------------------+--------------|
-| user.root:test.txt | FILE         |
-+--------------------+--------------+
-
-$ rucio add-dataset user.root:test_dataset
-
-Added user.root:test_dataset
-
-$ rucio attach user.root:test_dataset user.root:test.txt
-
-DIDs successfully attached to user.root:test_dataset
-
-$ rucio list-content user.root:test_dataset
-+--------------------+--------------+
-| SCOPE:NAME         | [DID TYPE]   |
-|--------------------+--------------|
-| user.root:test.txt | FILE         |
-+--------------------+--------------+
-```
-
-### Uploading Data Python Client Example
-
-Assuming the file `test.txt` exists locally:
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.add_scope('root', 'user.root')
-
-True
-
->>> from rucio.client.uploadclient import UploadClient
->>> UPLOAD_CLIENT = UploadClient()
->>> UPLOAD_CLIENT.upload([{'path': 'test.txt', 'rse': 'NEW_RSE'}])
-
-2020-08-14 14:47:31,147 8431 DEBUG Starting new HTTPS connection (1): rucio:443
-2020-08-14 14:47:31,166 8431 DEBUG https://rucio:443 "POST /traces/ HTTP/1.1" 201 7
-2020-08-14 14:47:31,224 8431 DEBUG https://rucio:443 "PUT /replicas HTTP/1.1" 200 None
-0
-
->>> list(CLIENT.list_dids('user.root', {}, type='all'))
-
-[u'test.txt']
->>> CLIENT.add_dataset('user.root', 'test_dataset')
-
-True
-
->>> CLIENT.attach_dids('user.root', 'test_dataset', [{'scope': 'user.root', 'name': 'test.txt'}])
->>>
list(CLIENT.list_content('user.root', 'test_dataset'))
-
-[{u'adler32': u'00000001', u'name': u'test.txt', u'bytes': 0, u'scope': u'user.root', u'type': u'FILE', u'md5': u'd41d8cd98f00b204e9800998ecf8427e'}]
-```
-
-## Adding Replication Rules
-
-Once a DID exists within the Rucio catalogue, replicas of that file, dataset or
-collection are created and maintained by
-[Replication rules](https://rucio.cern.ch/documentation/started/concepts/replica_management).
-By uploading a file to a particular RSE, a replication rule is created for that
-file, however rules can also be added for existing DIDs. As a minimum an RSE and
-number of copies must be specified, but further options such as lifetime of the
-rule and selecting RSEs based on user set attributes are also possible.
-
-### Adding Replication Rules CLI Example
-
-```shell
- $ rucio list-rules --account root
-
-ID                                ACCOUNT    SCOPE:NAME          STATE[OK/REPL/STUCK]    RSE_EXPRESSION      COPIES  EXPIRES (UTC)    CREATED (UTC)
---------------------------------  ---------  ------------------  ----------------------  ----------------  --------  ---------------  -------------------
-991f9ace7ed74cad989efde90b6a23c5  root       user.root:test.txt  OK[1/0/0]               NEW_RSE                  1                   2020-08-14 15:28:15
-$ rucio add-rule user.root:test_dataset 1 NEW_RSE
-bd51b767ef524878bb3cc68db16d2374
-
- $ rucio list-rules --account root
-
-ID                                ACCOUNT    SCOPE:NAME              STATE[OK/REPL/STUCK]    RSE_EXPRESSION      COPIES  EXPIRES (UTC)    CREATED (UTC)
---------------------------------  ---------  ----------------------  ----------------------  ----------------  --------  ---------------  -------------------
-991f9ace7ed74cad989efde90b6a23c5  root       user.root:test.txt      OK[1/0/0]               NEW_RSE                  1                   2020-08-14 15:28:15
-bd51b767ef524878bb3cc68db16d2374  root       user.root:test_dataset  OK[1/0/0]               NEW_RSE                  1                   2020-08-14 15:47:15
-```
-
-### Adding Replication Rules Python Client Example
-
-
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> list(CLIENT.list_account_rules('root'))
-
-[{u'locks_ok_cnt': 1,
u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'meta': None,
-u'child_rule_id': None, u'id': u'991f9ace7ed74cad989efde90b6a23c5', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None,
-u'split_container': False, u'priority': 3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'FILE',
-u'account': u'root', u'locked': False, u'name': u'test.txt', u'created_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}]
-
->>> CLIENT.add_replication_rule([{'scope': 'user.root', 'name': 'test_dataset'}], 1, 'NEW_RSE')
-
-[u'76b262b45dca4e769221224e1ccf5c7a']
-
->>> list(CLIENT.list_account_rules('root'))
-
-[{u'locks_ok_cnt': 1, u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'meta': None,
-u'child_rule_id': None, u'id': u'991f9ace7ed74cad989efde90b6a23c5', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None,
-u'split_container': False, u'priority': 3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'FILE',
-u'account': u'root', u'locked': False, u'name': u'test.txt', u'created_at': datetime.datetime(2020, 8, 14, 15, 28, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}, {u'locks_ok_cnt': 1,
-u'source_replica_expression': None, u'locks_stuck_cnt': 0, u'purge_replicas': False, u'rse_expression': u'NEW_RSE', u'updated_at':
datetime.datetime(2020, 8, 14, 15, 47, 15), u'meta': None, u'child_rule_id': None, u'id':
-u'bd51b767ef524878bb3cc68db16d2374', u'ignore_account_limit': False, u'error': None, u'weight': None, u'locks_replicating_cnt': 0, u'notification': u'NO', u'copies': 1, u'comments': None, u'split_container': False, u'priority':
-3, u'state': u'OK', u'scope': u'user.root', u'subscription_id': None, u'stuck_at': None, u'ignore_availability': False, u'eol_at': None, u'expires_at': None, u'did_type': u'DATASET', u'account': u'root', u'locked': False,
-u'name': u'test_dataset', u'created_at': datetime.datetime(2020, 8, 14, 15, 47, 15), u'activity': u'User Subscriptions', u'grouping': u'DATASET'}]
-```
-
-
-
-## Multi-VO Features
-
-From a users perspective, whether the instance is multi or single VO should not
-change any functionality. Furthermore, depending on the client setup, VO does
-not need to be provided. There are however, some occasions when an optional
-argument for the VO can be given in a multi-VO instance.
-
-### Swapping VOs
-
-Just like how an identity can be associated with (and used to authenticate
-against) multiple accounts, the same identity can be used for accounts at more
-than one VO. Account and identity can be retrieved from the config file if
-present, and the VO set there will be used (unless the environment variable
-`RUCIO_VO` is also set, in which case the latter takes precedent). Both will be
-ignored however if the VO is passed as an optional argument in the CLI or Python
-client. Using this optional argument allows a user to quickly run commands on a
-different VO they have access to.
-
-#### Swapping VOs CLI Example
-
-```shell
-$ rucio whoami
-
-status       : ACTIVE
-account      : jdoe_abc_account
-account_type : USER
-created_at   : 2020-08-07T08:27:29
-updated_at   : 2020-08-07T08:27:29
-suspended_at : None
-deleted_at   : None
-email        : N/A
-
-$ rucio --vo xyz whoami
-
-status       : ACTIVE
-account      : jdoe_xyz_account
-account_type : USER
-created_at   : 2020-08-11T12:13:58
-updated_at   : 2020-08-11T12:13:58
-suspended_at : None
-deleted_at   : None
-email        : N/A
-```
-
-#### Swapping VOs Python Client Example
-
-```python
->>> from rucio.client import Client
->>> CLIENT = Client()
->>> CLIENT.whoami()
-
-{u'status': u'ACTIVE', u'account': u'jdoe_abc_account', u'account_type': u'USER', u'created_at': u'2020-08-07T08:27:29', u'updated_at': u'2020-08-07T08:27:29', u'suspended_at': None, u'deleted_at': None, u'email': u'N/A'}
-
->>> CLIENT_XYZ = Client(vo='xyz')
->>> CLIENT_XYZ.whoami()
-
-{u'status': u'ACTIVE', u'account': u'jdoe_xyz_account', u'account_type': u'USER', u'created_at': u'2020-08-11T12:13:58', u'updated_at': u'2020-08-11T12:13:58', u'suspended_at': None, u'deleted_at': None, u'email': u'N/A'}
-```
diff --git a/content/en/users/data/management/rucio/commands/_index.md b/content/en/users/data/management/rucio/commands/_index.md
deleted file mode 100644
index e694c200f6..0000000000
--- a/content/en/users/data/management/rucio/commands/_index.md
+++ /dev/null
@@ -1,198 +0,0 @@
----
-title: Rucio Command-Line Interface
-linkTitle: Command Line
-type: docs
-weight: 20
-description: >-
-  The most common Rucio commands
----
-
-## Introduction to Rucio commands
-
-There are many commands found within Rucio CLI that you may want to become
-familiar with. In this guide I will provide a few of the common commands wanted
-by new users.
-
-To find more of the commands that you may want to use type `rucio` into the
-containerised client will provide all of the arguments for Rucio.
Typing in the
-command followed by `-h`, or `--help` will provide you with all the options that
-are available as well as some explanation for each.
-
-### ping
-
-Is the simplest command that a user can use to ask the Rucio server which
-version it is using.
-
-```shell
-$ rucio ping
-```
-
-This checks that there is a connection between the containerised client and the
-server.
-
-### whoami
-
-Another simple command, which asks the server for the information Rucio has on
-the current user.
-
-```shell
-$ rucio whoami
-```
-
-This will return output like the following:
-
-```shell
-status       : ACTIVE
-account      : user1
-account_type : USER
-created_at   : YYYY-MM-DDTHH:MM:SS
-updated_at   : YYYY-MM-DDTHH:MM:SS
-suspended_at : None
-deleted_at   : None
-email        : myemail@domail.country
-```
-
-This ensures that you know which user you are interacting with Rucio as, this is
-very important if you get multiple accounts. But also verifies that the client
-is set up correctly.
-
-### upload
-
-A Rucio command that allows you to upload files from your current environment to
-any RSE within your VO.
-
-```shell
-$ rucio upload [-h] --rse RSE [--lifetime LIFETIME] [--scope SCOPE]
-               [--impl IMPL] [--register-after-upload] [--summary]
-               [--guid GUID] [--protocol PROTOCOL] [--pfn PFN]
-               [--name NAME] [--transfer-timeout TRANSFER_TIMEOUT]
-               [--recursive]
-               args [args ...]
-```
-
-Several of the options you will not need to use as they will be set by the Rucio
-VO Admins when they set up the RSEs. Below are a list of options that you may
-find useful:
-
-- `RSE` is the Rucio Storage Element or site that you wish to store the data at,
-  the list of available RSEs can be seen for your VO with the command
-  `rucio list-rses`.
-- `Lifetime` is how long you wish the file to exist, not specifying will make
-  the file permanent until rucio is told to delete it.
-- `Scope` can be used in many ways, but often can be an experiment name, or a
-  user space, all users have their own scope assigned to them `user.`.
-- `Register-after-upload` allows for files to be uploaded to the destination,
-  and then registered with Rucio, rather than the other way around. This can be
-  useful if your connection is intermittent.
-- `Name` is the name of the file that it will be registered to Rucio with, if
-  this is not set it will be the name of the file or files provided.
-- `Recursive` Allows you to set the argument to a directory, and all files within
-  that directory and any subdirectories will be uploaded.
-- `Args` is the path to the file, or files you wish to upload, this can be a
-  single file, directory (with recursive set), or a list of files separated with
-  a space e.g.
-
-```shell
-rucio upload --rse main-rse file1 file2 file3 file4
-```
-
-### get
-
-A Rucio command to download files from any RSE in your VO to your local
-environment.
-
-```shell
-$ rucio get [-h] [--dir DIR] [--allow-tape] [--rse RSE] [--rses RSES]
-            [--impl IMPL] [--protocol PROTOCOL] [--nrandom NRANDOM]
-            [--ndownloader NDOWNLOADER] [--no-subdir] [--pfn PFN]
-            [--archive-did ARCHIVE_DID] [--no-resolve-archives]
-            [--ignore-checksum] [--transfer-timeout TRANSFER_TIMEOUT]
-            [--transfer-speed-timeout TRANSFER_SPEED_TIMEOUT] [--aria]
-            [--filter FILTER] [--scope SCOPE] [--metalink METALINK_FILE]
-            [--deactivate-file-download-exceptions]
-            [dids [dids ...]]
-```
-
-- `Dir` is the location within the container you wish for the files to be
-  downloaded (if you wish to move these files outside of the container, you may
-  want to mount a volume in the container to allow the files to persist).
-- `RSE(s)` specifying which RSE(s) you wish to download the files from, leaving
-  this blank will allow Rucio to decide which RSE(s) are best.
-- `nrandom` allows you to specify a number and if the target is a dataset or
-  container will download n files from that DID.
This allows you to check are
-  correct before committing to download the entire dataset or container.
-- `dids` is the data identifier for the file, dataset or container you wish to
-  download.
-
-### add-rule
-
-Create a rule which Rucio will work to make true. These are often how files are
-moved from site to site. Creating a rule that says file x (which is currently at
-storagesite1), needs to be at storagesite2. Upon creation of the rule, Rucio
-will ensure that the file is moved from where is closest to the new site.
-
-```shell
-$ rucio add-rule [-h] [--weight WEIGHT] [--lifetime LIFETIME]
-                 [--grouping {DATASET,ALL,NONE}] [--locked]
-                 [--source-replica-expression SOURCE_REPLICA_EXPRESSION]
-                 [--notify NOTIFY] [--activity ACTIVITY]
-                 [--comment COMMENT] [--ask-approval] [--asynchronous]
-                 [--delay-injection DELAY_INJECTION]
-                 [--account RULE_ACCOUNT] [--skip-duplicates]
-                 dids [dids ...] copies rse_expression
-```
-
-- `lifetime` How long you want the file to persist before it can be deleted by
-  Rucio.
-- `locked` sets the dataset or container to a locked state, that prevents other
-  files from being added or removed.
-- `dids` the files within Rucio you wish to be replicates.
-- `copies` How many copies of the data you want to make.
-- `rse_expression` can either be a specific RSE, or can be a filter. Expression,
-  such as `tape=True` or `country=UK` and Rucio will place as many copies as was
-  requested in different sites (when possible), to fulfil the rule.
-
-### delete-rule
-
-A command to delete a rule which you have created. Just because you have deleted
-a rule does not mean the file will be deleted. But it will adjust your quota
-accordingly. Other people within your VO may also have a rule that states the
-file needs to be at the same site.
-
-```shell
-$ rucio delete-rule [-h] [--purge-replicas] [--all]
-                    [--rse_expression RSE_EXPRESSION] [--rses RSES]
-                    [--account RULE_ACCOUNT]
-                    rule_id
-```
-
-- `all` should not be used by users it it will attempt to delete all rules.
-- `rse_expression` which RSE expression encapsulates the rules you wish to
-  delete, either rse_expression, or RSE needs to be specified.
-- `RSES` exactly which RSE is the target of the rule deletion.
-- `account` which account requires the rule to be deleted, this is generally only
-  needed by Rucio Admins and does not need to be specified if you are deleting
-  your own rules.
-- `rule_id` is a Rucio specific ID for the file that you wish to be deleted, a
-  list of the rules, and their rule_ids that are within your account can be
-  retrieved by running `rucio list-rules --account youraccountname`.
-
-### list-rules
-
-A command to list all the rules related to an account, a DID, or a file.
-
-```shell
-$ rucio list-rules [-h] [--id RULE_ID] [--traverse] [--csv] [--file FILE]
-                   [--account RULE_ACCOUNT]
-                   [--subscription ACCOUNT SUBSCRIPTION]
-                   [did]
-```
-
-Provide a full list of the files IDs, account, scope, state, RSE/expression
-copies and expiry.
-
-- `account` specify which account you wish to see the replication rules.
-- `file` If you know the name of a specific file, this allows you to see all the
-  rules are associated with the file.
-- `did` If a dataset or container are listed, all rules associated with the
-  specific DID will be displayed.
diff --git a/content/en/users/data/management/rucio/dteam-vo/_index.md b/content/en/users/data/management/rucio/dteam-vo/_index.md
deleted file mode 100644
index 54fa226c8a..0000000000
--- a/content/en/users/data/management/rucio/dteam-vo/_index.md
+++ /dev/null
@@ -1,108 +0,0 @@
----
-title: "Dteam Specific Documentation"
-type: docs
-linkTitle: "Rucio Dteam"
-weight: 40
-description: >-
-  How to get set up with the dteam VO
----
-
-## Rucio-client setup
-
-The setup for the container is the same as that found in the [Getting Started](../getting-started/)
-section. But is repeated here for ease.
-
-To get the Rucio client that is set up for dteam please use this
-[Rucio Client](https://hub.docker.com/repository/docker/egifedcloud/rucioclient).
-This would be done by running the command:
-
-```shell
-$ docker run \
-  -v :/opt/rucio/etc/rucio.cfg \
-  -v :/opt/rucio/etc/usercert \
-  -v :/opt/rucio/etc/usercert \
-  -e RUCIO_CFG_CLIENT_CERT=/opt/rucio/etc/usercert.pem \
-  -e RUCIO_CFG_CLIENT_KEY=/opt/rucio/etc/userkey.pem \
-  -e RUCIO_CFG_CA_CERT=/opt/rucio/etc/web/ca-first.pem \
-  --name=rucio-client \
-  -it \
-  -d egifedcloud/rucioclient:1.23.17
-```
-
-Once the container is running you will need to copy some files, to have them
-owned by the container user, rather then root, and then change the permissions
-on those files so that they are appropriate for voms-proxy creation. To start
-with step into the container by running:
-
-```shell
-$ docker exec -it rucio-client bash
-```
-
-Once inside the container you can then copy and edit file permissions with the
-following:
-
-```shell
-$ cp /opt/rucio/etc/usercert /opt/rucio/etc/usercert.pem
-$ cp /opt/rucio/etc/userkey /opt/rucio/etc/userkey.pem
-$ chmod 600 /opt/rucio/etc/usercert.pem
-$ chmod 400 /opt/rucio/etc/userkey.pem
-```
-
-You should now be able to generate a VOMS proxy using the credentials loaded
-into the container, this is done by running the following command within the
-container:
-
-```shell
-$ voms-proxy-init --voms dteam
-```
-
-## Rucio configuration setup
-
-Inside your docker container edit the `rucio.cfg` file to include your 3
-character VO name, and account name. This will then be loaded into the Rucio
-client.
-
-```ini
-[common]
-logdir = /var/log/rucio
-multi_vo = True
-loglevel = INFO
-[client]
-rucio_host = https://rucio-server.gridpp.rl.ac.uk:443
-auth_host = https://rucio-server.gridpp.rl.ac.uk:443
-vo = dtm
-account =
-ca_cert = /opt/rucio/etc/web/ca-first.pem
-auth_type = x509_proxy
-client_cert = /opt/rucio/etc/usercert.pem
-client_key = /opt/rucio/etc/userkey.pem
-client_x509_proxy = /tmp/x509up_u1000
-request_retries = 5
-```
-
-## Confirmation of Client setup
-
-Once this is complete you should now have access to Rucio. This can be confirmed
-with a ping and a whoami commands to verify one, the connection to the Rucio
-host and two, that you are authenticating successfully as your user.
-
-```shell
-$ rucio ping
-1.23.17
-$ rucio whoami
-status : ACTIVE
-account : user
-account_type : USER
-created_at : YYYY-MM-DDThh:mm:ss
-suspended_at : None
-updated_at : YYYY-MM-DDThh:mm:ss
-deleted_at : None
-email : user@email.co.uk
-```
-
-Once these messages have been displayed with the relevent information, as a user
-you should now have access to the Dteam VO, and can create rules, upload and
-download files from the various RSEs.
-
-If you have any issues please do contact the
-[Multi-VO admin / dteam VO admins](mailto:rucio-support@stfc365.onmicrosoft.com).
diff --git a/content/en/users/data/management/rucio/getting-started/_index.md b/content/en/users/data/management/rucio/getting-started/_index.md
deleted file mode 100644
index c06045d13a..0000000000
--- a/content/en/users/data/management/rucio/getting-started/_index.md
+++ /dev/null
@@ -1,158 +0,0 @@
----
-title: Getting Started with Rucio
-linkTitle: Getting Started
-type: docs
-weight: 10
-description: >-
-  How to get started with Rucio
----
-
-## Rucio terms
-
-- [**Rucio Storage Element**](https://rucio.cern.ch/documentation/started/concepts/rucio_storage_element/)
-  (RSE) is another name for an endpoint, or storage solution.
-- [**Rules**](https://rucio.cern.ch/documentation/started/concepts/replica_management) are an
-  instruction to Rucio to do a certain thing. This can be to ensure file _x_ has
-  at least 1 copy at _storagesite1_, or ensure file _y_ is on tape, or even on
-  tape at more than one location, or even file _z_ has 2 copies at any site
-  within a selection of sites. How you set up the RSE and the attributes you
-  give them allows for many different strategies to transfer and organise data.
-  Once a rule is created, Rucio will get to work to ensure that the rule is
-  satisfied at all times.
-- **File** is single file within Rucio.
-- **Dataset** is a collection of files, which may be a collection or related
-  results, or data.
-- **Container** is a collection of Datasets which may build a larger subset of a
-  whole experiment.
-- **Scope** is a collection in which files, datasets, and containers are placed.
-  Users will have their own scope, often user.username. But also experiments,
-  sub-experiments, or however you wish to organise the data can also have scopes.
-  Accounts can be given access to scopes by VO admins.
-- **Data Identifier** (DID) uniquely identifies data in Rucio. It is made up
-  from the scope and the filename, separated by a colon (e.g.
-  _experiment1:file1_).
-
-## Getting started as a new user
-
-### Account creation
-
-1. To get set up with a Rucio account please create a ticket on
-   [GGUS](https://ggus.eu/?mode=ticket_submit). Please fill in the form with a
-   subject, description, ticket category - service request, priority - less
-   urgent, and under routing information please select Assign to support unit -
-   Rucio). Within the ticket description please include:
-
-   - Desired Username (usually initials and surname e.g. John Doe would have
-     jdoe)
-   - Your email
-   - Name of the experiment / VO you are part of
-   - The subject of your eScience certificate
-
-**If you want password access we can organise a video call to explain or take
-sensitive information if you prefer**
-In Terms of testing you can join the test VO (dteam) to try Rucio as a service
-and its capabilities.
-**Please note that we are working on allowing Rucio accounts to be created and
-accessed with IAM services,** **and
-[EGI Check-in](https://docs.egi.eu/users/check-in/), but currently only support
-x509 and password access.**
-
-1. Once our team has this information we will create you a Rucio account.
-
-### Docker container setup
-
-1. You will then need to install a containerised client on your computer.
-
-   - Install Docker to run the container
-     - (for Windows users I would recommend
-       using WSL2)
-   - Follow the docker instructions to ensure it is running correctly.
-   - Using OpenSSL you will need to split your grid certificate bundle into the
-     certificate and key:
-
-   ```shell
-   $ openssl pkcs12 -in <*.pfx> -out /sensible/path/usercert.pem -clcerts -nokeys
-   $ openssl pkcs12 -in <*.pfx> -out /sensible/path/userkey.pem -nocerts -nodes
-   ```
-
-1. Run the Docker container using the following command:
-
-When running the block of code below, please replace all items within `<>` with
-the relevant information. This uses a Rucio container that was set up for the EGI
-communities.
-
-```shell
-$ run \
-  -e RUCIO_CFG_RUCIO_HOST=https://rucio-server.gridpp.rl.ac.uk:443 \
-  -e RUCIO_CFG_AUTH_HOST=https://rucio-server.gridpp.rl.ac.uk:443 \
-  -e RUCIO_CFG_AUTH_TYPE=x509_proxy \
-  -e RUCIO_CFG_CLIENT_VO=<3 CHAR VO NAME LOWERCASE> \
-  -e RUCIO_CFG_CLIENT_CERT=/opt/rucio/etc/usercert.pem \
-  -e RUCIO_CFG_CLIENT_KEY=/opt/rucio/etc/userkey.pem \
-  -e RUCIO_CFG_ACCOUNT= \
-  -e RUCIO_CFG_CA_CERT=/opt/rucio/etc/web/ca-first.pem \
-  -v :/opt/rucio/etc/web/ca-first.pem \
-  -v :/opt/rucio/etc/usercert \
-  -v :/opt/rucio/etc/userkey \
-  --name=rucio-client \
-  -it \
-  -d egifedcloud/rucioclient:1.23.17
-```
-
-This block of code may look large but it is configuring Rucio to connect to the
-Multi-VO Rucio at RAL, your account and VO details, where you are loading them
-into the container, and mounting the authentication details into the container.
-
-**The UK eScience CA 2B can be [obtained here](https://ca.grid-support.ac.uk/).
-The 3 characters VO name will be provided to you when you sign up for a Rucio
-account.**
-
-1. Run the following commands inside the docker container to finalise set up:
-
-```shell
-$ cp /opt/rucio/etc/usercert /opt/rucio/etc/usercert.pem
-$ cp /opt/rucio/etc/userkey /opt/rucio/etc/userkey.pem
-$ chmod 600 /opt/rucio/etc/usercert.pem
-$ chmod 400 /opt/rucio/etc/userkey.pem
-```
-
-### Rucio configuration setup
-
-You need to edit the `/opt/rucio/etc/rucio.cfg` file, this then needs to be
-lightly edited to add your account name. This will then be loaded into the Rucio
-client.
-
-```ini
-[common]
-logdir = /var/log/rucio
-multi_vo = True
-loglevel = INFO
-[client]
-rucio_host = https://rucio-server.gridpp.rl.ac.uk:443
-auth_host = https://rucio-server.gridpp.rl.ac.uk:443
-vo = <3 character VO name>
-account =
-ca_cert = /opt/rucio/etc/web/ca-first.pem
-auth_type = x509_proxy
-client_cert = /opt/rucio/etc/usercert.pem
-client_key = /opt/rucio/etc/userkey.pem
-client_x509_proxy = /tmp/x509up_u1000
-request_retries = 5
-```
-
-**You should now have a fully set up Containerised Client for your Rucio
-Account** **and VO which you can start in docker and use whenever you need it.**
-
-- If not please contact Rucio support
-
-## Getting started as a new VO
-
-- To get set up with a new VO on Multi-VO Rucio account please create a ticket
-  on [GGUS](https://ggus.eu/?mode=ticket_submit). Please fill in the form with a
-  subject, description, ticket category - service request, priority - less
-  urgent, and under routing information please select 'assign to support unit' -
-  Rucio.
-
-- We will set up a meeting to discuss Rucio, your needs, sites, and current set
-  up to ensure that Rucio can work for you, and will track progress with the
-  ticket.