[FEATURE REQUEST] - Extract Unique Records by External ID #931

Closed
PotSeht opened this issue Nov 18, 2024 · 4 comments
Labels
completed: The issue was successfully resolved/Feature is completed
feature-request: New feature request or a change in the existing functionality
not-supported

Comments

@PotSeht

PotSeht commented Nov 18, 2024

Sometimes we work with an org that has poor data quality, primarily duplicate records. In such cases we may want to seed a sandbox with production data while disregarding the duplicated records.

Take the example below of three accounts, each with the same name. Importantly, there are no other writeable fields that can be used to distinguish one from another.

image

If I run an Upsert using externalId : Name from this environment to a sandbox, the three records migrate across. However, upon running the same command again, the result is five records.

I believe this is because, at the point of the second run, the source and target match on only one of the accounts and not on the other two. So it updates one record and creates two others, which at least seems to be true in some form from the isolated test I ran below.

image
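To make the hypothesis above concrete, here is a toy model in Python. It is only a sketch: the matching rule (the target is indexed by key, and each indexed target record can be claimed by at most one source record) is an assumption chosen to reproduce the reported 3 then 5 record counts, not SFDMU's actual logic. The function name `upsert_by_key` and the account name "Acme" are hypothetical.

```python
def upsert_by_key(source, target, key):
    """Toy upsert: returns the new target list plus (updated, inserted) counts."""
    # Assumption: indexing by key keeps only one target record per key value.
    by_key = {rec[key]: rec for rec in target}
    result = list(target)
    updated = inserted = 0
    for rec in source:
        # Assumption: an indexed target record can be claimed only once.
        match = by_key.pop(rec[key], None)
        if match is not None:
            match.update(rec)
            updated += 1
        else:
            result.append(dict(rec))
            inserted += 1
    return result, updated, inserted


source = [{"Name": "Acme"} for _ in range(3)]

# First run against an empty sandbox: nothing matches, all three are inserted.
sandbox, updated, inserted = upsert_by_key(source, [], "Name")
print(len(sandbox), updated, inserted)  # 3 0 3

# Second run: one record is matched and updated, the other two are inserted.
sandbox, updated, inserted = upsert_by_key(source, sandbox, "Name")
print(len(sandbox), updated, inserted)  # 5 1 2
```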

It would be helpful to extract only unique values based on the external ID supplied, so that only one record is migrated across and all child records relate to it through that single key. E.g., 3 accounts with 1 contact each would result in the contacts all consolidating under the one unique account upon migration.

My line of thinking was a parameter against ScriptObject that basically retrieves the first hit of each key and disregards the rest when querying.
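The "first hit per key" idea, together with the child consolidation described above, could be sketched roughly as follows. This is an illustration of the requested behaviour on plain Python dicts, not an existing SFDMU parameter; the helper `first_hit_per_key` and the sample Ids are hypothetical.

```python
def first_hit_per_key(parents, key):
    """Keep the first parent seen for each key; map every parent Id to its survivor."""
    survivors = {}  # key value -> first parent record with that key
    remap = {}      # original parent Id -> surviving parent Id
    for rec in parents:
        kept = survivors.setdefault(rec[key], rec)
        remap[rec["Id"]] = kept["Id"]
    return list(survivors.values()), remap


accounts = [
    {"Id": "A1", "Name": "Acme"},
    {"Id": "A2", "Name": "Acme"},
    {"Id": "A3", "Name": "Acme"},
]
contacts = [
    {"Id": "C1", "AccountId": "A1"},
    {"Id": "C2", "AccountId": "A2"},
    {"Id": "C3", "AccountId": "A3"},
]

unique_accounts, remap = first_hit_per_key(accounts, "Name")
# Re-point each child at the surviving parent for its key.
relinked = [{**c, "AccountId": remap[c["AccountId"]]} for c in contacts]

print([a["Id"] for a in unique_accounts])  # ['A1']
print({c["AccountId"] for c in relinked})  # {'A1'}
```

Which duplicate survives depends purely on query order here, which is one reason such a feature could behave unpredictably without an explicit ORDER BY.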

That being said, I'm unsure if my use case is common enough to warrant this feature request. Please consider it though! 🙂

@PotSeht PotSeht added the feature-request New feature request or a change in the existing functionality label Nov 18, 2024
@hknokh
Collaborator

hknokh commented Nov 18, 2024

Hello, @PotSeht

Thank you for your feature request.
I will review it as soon as possible and provide updates as they become available.

Cheers


This case has been marked as 'to-be-closed', since it has had no activity for 3 days.
It will be automatically closed after another 3 days of inactivity.

@github-actions github-actions bot added the to-be-closed The issue is about to be closed label Nov 22, 2024
@hknokh2
Contributor

hknokh2 commented Nov 22, 2024

Hello!

Thank you for your suggestion.

While this could be a useful improvement in some cases, I have a couple of concerns:

  1. The main purpose of an external ID is to uniquely distinguish between records. If that’s not sufficient, you can use Composite External ID Keys to ensure uniqueness. Therefore, maintaining the uniqueness of external IDs is a responsibility to address at your end.

  2. This enhancement is quite complex and could lead to unexpected regressions, which I would prefer to avoid.

For these reasons, I will not be implementing this feature request.

Best regards.

@github-actions github-actions bot removed the to-be-closed The issue is about to be closed label Nov 23, 2024
@PotSeht
Author

PotSeht commented Nov 23, 2024

Hello @hknokh

I appreciate you considering the feature and understand your stance. I have been using composite external keys, which are definitely helpful, although ultimately it does require looking at the data (we have clients where multiple external references may be duplicated for some bizarre reason).

I have started performing a check such as SELECT COUNT(ID), EXTID FROM SOBJECT GROUP BY EXTID HAVING COUNT(ID) > 1, which is a good indicator of whether the dataset contains duplicates across one or more keys.
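For records that have already been exported (for example to CSV), the same GROUP BY ... HAVING COUNT > 1 check can be approximated locally. The sketch below assumes the rows are loaded as Python dicts; the helper `duplicate_keys` and the field names `Name` and `BillingCity` are illustrative only.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return {key tuple: count} for every key that occurs more than once."""
    counts = Counter(tuple(rec.get(f) for f in key_fields) for rec in records)
    return {k: n for k, n in counts.items() if n > 1}


records = [
    {"Id": "1", "Name": "Acme", "BillingCity": "Leeds"},
    {"Id": "2", "Name": "Acme", "BillingCity": "Leeds"},
    {"Id": "3", "Name": "Acme", "BillingCity": "York"},
]

# A single key: all three records collide on Name.
print(duplicate_keys(records, ["Name"]))                 # {('Acme',): 3}
# A composite key narrows the collisions but does not remove them.
print(duplicate_keys(records, ["Name", "BillingCity"]))  # {('Acme', 'Leeds'): 2}
```

An empty result for a given combination of fields suggests that combination is safe to use as a composite external ID for that dataset.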

@PotSeht PotSeht closed this as completed Nov 23, 2024
@hknokh hknokh added completed The issue was successfully resolved/Feature is completed not-supported labels Nov 23, 2024

3 participants