Enforce uniqueness for all GUID fields
#6259
Comments
I still believe that a true GUID field should not be user-enterable, even if uniqueness is enforced. Otherwise, someone could enter 12345 as a GUID, and as long as it is unique for that table it would be valid. However, that entry is by no means globally unique. If we really want a user-enterable unique identifier, then we should call it that rather than a GUID. I recommend having both a GUID field that is populated by the system with a UUID, as in other tables, and a user-enterable unique identifier field, to satisfy both requirements if needed.
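To illustrate the distinction drawn in this comment, here is a minimal Python sketch that separates "unique within a table" from "shaped like a UUID". The helper names `is_canonical_uuid` and `new_system_guid` are hypothetical and are not part of Specify; the sketch only uses Python's standard `uuid` module.

```python
import uuid


def is_canonical_uuid(value):
    """Return True if value is a UUID in canonical 8-4-4-4-12 hex form.

    Hypothetical helper for illustration; not an existing Specify function.
    """
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # Not parseable as a UUID at all (for example "12345" or None).
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and unhyphenated
    # hex, so compare against the canonical string to be strict.
    return str(parsed) == value.strip().lower()


def new_system_guid():
    """Generate a random (version 4) UUID, as a system-assigned GUID might be."""
    return str(uuid.uuid4())
```

Under this check, a value such as 12345 could still pass a per-table uniqueness rule while failing `is_canonical_uuid`, whereas existing UUID v1 values and newly generated UUID v4 values would both pass.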
I suspect, given the push by DiSSCo and others for DOIs, that we may need to support those at some point too.
@acbentley, I think I agree with you in principle, and don't really have a horse in this race. However, I do not think the field should be completely off limits to users, as old data may come with existing GUIDs that do follow the UUID standard. For instance, when migrating existing data, 5 of our collections used the UUID v1 standard, and those values needed to be assigned to the GUID field on import. We use these fields as the dwc::occurrenceID, so although technically they could be changed, that requires coordination with GBIF to construct the redirects. Under the current system, old GUIDs can be kept while new records get a newly generated UUID v4. Having both a system-generated field and a user-entered field would work, but adds complexity, especially because in the case above it would fracture the field in two (old values would be in one field, new values in another). I am of the opinion that the current setup for the collection object GUID, in which new collection objects have a UUID v4 generated by default and the field is read-only by default but can still be accessed by the user, makes sense, and would make sense for other tables as well. While an institution that chooses to use
@mpitblado Thanks. We just discussed that exact scenario in a meeting. I agree that there are some scenarios where you may want to copy and paste a GUID into a field.
Is your feature request related to a problem? Please describe.
We have challenges when dealing with GUID (Globally Unique Identifier) fields in Specify that are not guaranteed to be globally unique. This leads to potential data integrity issues, as duplicate GUIDs can theoretically exist across different tables or collections, causing confusion and errors in data retrieval and management. If we allow duplication, we are not following the very definition of a GUID; duplication complicates data handling and can lead to incorrect associations between records.
https://guid.one/guid
Describe the solution you'd like
I propose that all GUID fields in Specify be made globally unique as a strict requirement. Per @melton-jason, this could be implemented by modifying the existing guid_rules business rule to enforce an implicit uniqueness rule for all GUID fields across the application. This rule should be non-configurable to ensure consistency and reliability in data management.
We need to implement a mechanism to identify and manage cases of duplicated GUIDs, raising appropriate business rule exceptions when necessary. This could involve using a save blocker when editing affected records or generating a report for an administrator that indicates instances of duplication, similar to the process when configuring a new uniqueness rule for the first time.
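As a rough illustration of the duplicate check and save blocker described above, here is a minimal Python sketch. It assumes GUID values have already been gathered as `(table, record_id, guid)` tuples; the names `find_duplicate_guids`, `assert_guid_unique`, and `DuplicateGuidError` are hypothetical and do not reflect Specify's actual guid_rules code. Matching is exact after trimming whitespace, which is also an assumption.

```python
from collections import defaultdict


class DuplicateGuidError(Exception):
    """Hypothetical stand-in for a business rule exception (save blocker)."""


def find_duplicate_guids(records):
    """Group records by GUID across all tables and keep only the clashes.

    records: iterable of (table_name, record_id, guid) tuples.
    Returns {guid: [(table_name, record_id), ...]} for GUIDs used more than once.
    """
    seen = defaultdict(list)
    for table, record_id, guid in records:
        if guid is None or not guid.strip():
            continue  # empty GUIDs are ignored in this sketch
        seen[guid.strip()].append((table, record_id))
    return {guid: locs for guid, locs in seen.items() if len(locs) > 1}


def assert_guid_unique(records, table, record_id, guid):
    """Raise if any other record, in any table, already uses this GUID."""
    key = guid.strip()
    clashes = [
        (t, r)
        for t, r, g in records
        if g is not None and g.strip() == key and (t, r) != (table, record_id)
    ]
    if clashes:
        raise DuplicateGuidError(f"GUID {key!r} is already used by {clashes}")
```

In this sketch, the output of `find_duplicate_guids` could feed the administrator report, while `assert_guid_unique` plays the role of the save blocker when an affected record is edited.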
Describe alternatives you've considered
One alternative is to keep the current system unchanged and rely on users to manage GUID uniqueness manually, but this approach is prone to human error and does not provide any checks to ensure data integrity. Another alternative is to only enforce uniqueness for certain critical tables, but this could lead to inconsistent data management practices across the application. We should really follow the definition as written.
Reported By
Andy Bentley at KU Ichthyology & Specify
Additional context
The current uniqueness rule only applies to CollectionObject -> guid and Storage -> guid, which are modifiable by the user. This inconsistency can lead to issues, as highlighted in the comment by @melton-jason. If we establish a comprehensive uniqueness requirement for all GUID fields, we can improve data integrity. For reference, please see the discussion here between @melton-jason and @acbentley: GitHub Comment.
List of Tables with GUIDs