irods · alanking · Dec 20, 2023 · Dec 15, 2023 · Dec 15, 2023 · Dec 15, 2023
diff --git a/README.md b/README.md
@@ -34,12 +34,12 @@ In iRODS terminology, the `attribute` is defined by a **plugin_specific_configur
 
 For example:
 ```
-imeta add -R fast_resc irods::storage_tiering::group example_group 0
-imeta add -R medium_resc irods::storage_tiering::group example_group 1
-imeta add -R slow_resc irods::storage_tiering::group example_group 2
+imeta add -R fast_resc irods::storage_tiering::group example_group_1 0
+imeta add -R medium_resc irods::storage_tiering::group example_group_1 1
+imeta add -R slow_resc irods::storage_tiering::group example_group_1 2
 ```
 
-This example defines three tiers of the group `example_group` where data will flow from tier 0 to tier 2 as it ages.  In this example `fast_resc` is a single resource, but it could have been set to `fast_tier_root_resc` and represent the root of a resource hierarchy consisting of many resources.
+This example defines three tiers of the group `example_group_1` where data will flow from tier 0 to tier 2 as it ages.  In this example `fast_resc` is a single resource, but it could have been set to the root of a resource hierarchy consisting of many resources.
 
 ### Setting Tiering Policy
 
@@ -112,7 +112,7 @@ For a default installation the following values are used:
     "minimum_restage_tier" : "irods::storage_tiering::minimum_restage_tier",
     "preserve_replicas" : "irods::storage_tiering::preserve_replicas",
     "object_limit" : "irods::storage_tiering::object_limit",
-    "default_data_movement_parameters" : "<EF>60s DOUBLE UNTIL SUCCESS OR 5 TIMES</EF>",
+    "default_data_movement_parameters" : "<EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF>",
     "minumum_delay_time" : "irods::storage_tiering::minimum_delay_time_in_seconds",
     "maximum_delay_time" : "irods::storage_tiering::maximum_delay_time_in_seconds",
     "time_check_string" : "TIME_CHECK_STRING",
@@ -162,6 +162,8 @@ A tier within a tier group may identify data objects which are in violation by a
 
 Data objects which have been labeled via particular metadata, or within a specific collection, owned by a particular user, or belonging to a particular project may be identified through a custom query.  The default attribute **irods::storage_tiering::query** is used to hold this custom query.  To configure the custom query, attach the query to the root resource of the tier within the tier group.  This query will be used in place of the default time-based query for that tier.  For efficiency this example query checks for the existence in the root resource's list of leaves by resource ID.  Please note that any custom query must return DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM in that order as it is a convention of this rule engine plugin.
 
+**Checking for resources in violating queries is required to prevent erroneous data migrations for replicas on other resources which may represent other tiers in the storage tiering group.** This can be done in the manner shown below (`DATA_RESC_ID IN ('10068', '10069')`) or via resource hierarchy (e.g. `DATA_RESC_HIER like 'root_resc;%'`), but the query must filter on resources to correctly identify violating objects.
+
 ```
 imeta set -R fast_resc irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10068', '10069')"
 ```
@@ -216,3 +218,49 @@ In order to log the transfer of data objects from one tier to the next, set `dat
 },
 ```
 
+## Limitations
+
+The storage tiering plugin has a few known limitations. They are listed here so that users can recognize the failure modes they may encounter.
+
+### A data object should only have replicas in one tiering group
+
+Any given data object should only have replicas in a single tiering group. Stated negatively, a data object should NOT have replicas in multiple tiering groups. The tiering group for a data object is tracked by an AVU that looks like this:
+```
+attribute: irods::storage_tiering::group
+value: example_group_1
+units: 1
+```
+
+The value is the tiering group to which this object belongs. Only one AVU with the attribute `irods::storage_tiering::group` should be associated with a given data object. Here is an example of the AVUs an object with multiple tiering groups would have:
+```
+attribute: irods::storage_tiering::group
+value: example_group_1
+units: 1
+---
+attribute: irods::storage_tiering::group
+value: example_group_2
+units: 2
+```
+
+Notice the different values indicating that the object has replicas in two different tiering groups.
+
+Support for multi-group data objects may become available in the future, but it should be avoided at this time.
+
+### Managing replicas and metadata outside of storage tiering should be done with caution
+
+Replicating, trimming, and manipulating `irods::storage_tiering`-namespaced metadata on data objects which are under management in a tiering group using the storage tiering plugin should be done only when necessary. If any metadata is not in the expected state, or replicas are not found on their expected resources, storage tiering operations can and will behave unexpectedly.
+
+### A resource should only belong to one tier in a given group
+
+Resources may belong to multiple tiering groups (e.g. a common archive). It is not recommended for a resource to be tagged with multiple tiers for a given group as the storage tiering plugin assumes that a resource only represents a single tier in any given group. In other words, a resource should not have multiple AVUs like this:
+```
+attribute: irods::storage_tiering::group
+value: example_group_1
+units: 0
+---
+attribute: irods::storage_tiering::group
+value: example_group_1
+units: 1
+```
+
+The above AVUs indicate that the resource represents tier 0 AND tier 1 in `example_group_1`. This should not be done.
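
Reviewer note (not part of the patch): the AVU constraints described in the new Limitations section could be checked mechanically before they cause trouble. The following is a minimal, self-contained sketch of that check in plain C++17. It is not part of the plugin and calls no iRODS API: the `avu` struct and both function names are hypothetical, the AVUs are assumed to have been fetched already (for example from `imeta ls` output), and the attribute name is assumed to be the default `irods::storage_tiering::group`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical plain representation of one AVU; this is not an iRODS type.
struct avu {
    std::string attribute;
    std::string value; // tiering group name
    std::string units; // tier index within the group
};

// Assumption: the default group attribute name is in use.
const std::string group_attribute = "irods::storage_tiering::group";

// For a resource: returns the groups in which the resource is tagged
// with more than one distinct tier index.
std::vector<std::string> groups_with_multiple_tiers(const std::vector<avu>& resource_avus)
{
    std::map<std::string, std::set<std::string>> tiers_by_group;
    for (const auto& a : resource_avus) {
        if (a.attribute == group_attribute) {
            tiers_by_group[a.value].insert(a.units);
        }
    }

    std::vector<std::string> offenders;
    for (const auto& [group, tiers] : tiers_by_group) {
        if (tiers.size() > 1) {
            offenders.push_back(group);
        }
    }
    return offenders;
}

// For a data object: returns the distinct tiering groups named by its AVUs.
// More than one entry means the object has replicas in multiple groups.
std::set<std::string> distinct_tiering_groups(const std::vector<avu>& object_avus)
{
    std::set<std::string> groups;
    for (const auto& a : object_avus) {
        if (a.attribute == group_attribute) {
            groups.insert(a.value);
        }
    }
    return groups;
}
```

Given the two resource AVUs shown in the last README example, `groups_with_multiple_tiers` would return `example_group_1`. Given the two data object AVUs from the first limitation, `distinct_tiering_groups` would return two groups, which signals the unsupported multi-group case.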
diff --git a/libirods_rule_engine_plugin-unified_storage_tiering.cpp b/libirods_rule_engine_plugin-unified_storage_tiering.cpp
@@ -20,6 +20,9 @@
 #include <irods/irods_server_api_call.hpp>
 #include "exec_as_user.hpp"
 
+#include <irods/filesystem.hpp>
+#include <irods/irods_query.hpp>
+
 #undef LIST
 
 // =-=-=-=-=-=-=-
@@ -84,12 +87,35 @@ namespace {
         return std::make_tuple(l1_idx, resource_name);
     } // get_index_and_resource
 
+    auto resource_hierarchy_has_good_replica(RcComm* _comm,
+                                             const std::string& _object_path,
+                                             const std::string& _root_resource) -> bool
+    {
+        namespace fs = irods::experimental::filesystem;
+
+        const auto object_path = fs::path{_object_path};
+
+        const auto query_string = fmt::format("select DATA_ID where DATA_NAME = '{0}' and COLL_NAME = '{1}' and "
+                                              "DATA_RESC_HIER like '{2};%' || = '{2}' and DATA_REPL_STATUS = '1'",
+                                              object_path.object_name().c_str(),
+                                              object_path.parent_path().c_str(),
+                                              _root_resource);
+
+        const auto query = irods::query{_comm, query_string};
+
+        return query.size() > 0;
+    } // resource_hierarchy_has_good_replica
+
     void replicate_object_to_resource(
         rcComm_t*          _comm,
         const std::string& _instance_name,
         const std::string& _source_resource,
         const std::string& _destination_resource,
         const std::string& _object_path) {
+        // If the destination resource has a good replica of the data object, skip replication.
+        if (resource_hierarchy_has_good_replica(_comm, _object_path, _destination_resource)) {
+            return;
+        }
 
         dataObjInp_t data_obj_inp{};
         rstrcpy(data_obj_inp.objPath, _object_path.c_str(), MAX_NAME_LEN);