Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make locations system field #34852

Merged
merged 19 commits into from
Jul 30, 2024

Conversation

jingcheng16
Copy link
Contributor

@jingcheng16 jingcheng16 commented Jul 3, 2024

Product Description

Should have no visible changes, except for user cannot set commcare_location_id, commcare_location_ids, and commcare_primary_case_sharing_id in UserData directly

Technical Summary

Ticket: https://dimagi.atlassian.net/browse/SAAS-15716

Please review by commit.

Safety Assurance

Safety story

Write new tests for it.
This PR only changes where we pull data from, UserData or the location_id and assigned_location_ids attribute on CouchUser.

Automated test coverage

corehq.apps.user_importer.tests.test_importer:TestMobileUserBulkUpload ensures location in user data will be updated when use bulk upload to update user's location.
New tests added in corehq.apps.users.tests.test_user_data:TestUserDataModel ensure location in user data will be updated when user's location changed.

QA Plan

QA Ticket: https://dimagi.atlassian.net/browse/QA-6781

QA plan:

  • make changes to a mobile worker's location, and use User API to list all mobile workers, make sure what is returned in user data is expected. (If no location assigned, then no location_id, location_ids in user data)
  • try to update location_id and location_ids in user data through User API and get error.

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the Risk: Medium Change affects files that have been flagged as medium risk. label Jul 3, 2024
@jingcheng16 jingcheng16 force-pushed the jc/userdata-pull-location-from-system-fields branch 6 times, most recently from e543199 to c689862 Compare July 8, 2024 19:36
- Ensured that the system correctly handles cases where location information is not available in CommCareUser objects.

- Updated the _provided_by_system property to conditionally include COMMCARE_LOCATION_ID, COMMCARE_LOCATION_IDS and COMMCARE_PRIMARY_CASE_SHARING_ID.
If location data (location_id, location_ids) for a CouchUser has never been set before, it won't be included in _provided_by_system. So I add _keys_provided_by_system
@jingcheng16 jingcheng16 force-pushed the jc/userdata-pull-location-from-system-fields branch from c689862 to bee93c5 Compare July 8, 2024 19:38
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added new tests first, and make sure previous implementation passed the new tests, then make changes to the code.

@jingcheng16 jingcheng16 added the product/invisible Change has no end-user visible impact label Jul 8, 2024
@jingcheng16 jingcheng16 marked this pull request as ready for review July 8, 2024 19:45
@jingcheng16 jingcheng16 requested a review from esoergel as a code owner July 8, 2024 19:45
'commcare_primary_case_sharing_id': self.loc1.location_id,
})

def test_reset_locations_and_user_data_updated(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these test names (not this one) are pretty verbose. Since this set of tests is for user_data do they all need the suffix and_user_data_updated or could that be implied and left out of the function names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Evan, my goal is for people to know what I'm trying to test by the name of the test without reading the code. So the action: "reset locations" and the expected result: "user data updated". Some are verbose might because the action is complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that line of reasoning. Since it's testing the effect of the action on the user data, what would you think about phrasing it as updates_user_data instead? To me, that reads much more clearly, like "performing x action has y result", where the result is that it updates user_data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

udpates_user_data sounds more natural! Thank you! Updated!


# Set primary location to loc2
self.user.set_location(self.loc2)
self.assertEqual(user_data.to_dict(), {
Copy link
Contributor

@nospame nospame Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised we don't need to fetch user_data here again and in other tests with a second set of assertions, like the initial user_data = self.user.get_user_data(self.domain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good answer. 😂 I just try, if it doesn't get updated, I refetch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user.get_user_data is memoized on the user model, so anything accessing it on the same user instance ( (including set_location) without clearing will receive the same UserData object. If you fetched a separate instance of the user and updated the location there, then you would need to refresh, though calling user.get_user_data wouldn't do the trick (similar to how user.get_location_id wouldn't update in that scenario).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @esoergel ... I still don't understand. If it is receiving the same UserData object, why we don't need to call refresh_from_db on that object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pass-by-reference. set_location and this test both operate on not just the same user, but the same user instance and by extension the same UserData instance. That is, if you called id(user_data) in both places, you'd get the same value. This means it's functionally the same as doing:

user_data = <get user data>
user_data['commcare_location_id'] = location.location_id
assertEqual(user_data.to_dict(), {...})

It doesn't matter that setting commcare_location_id happens in a different method, it's still mutating the same instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see!!! Thank you so much!!!

@@ -430,3 +430,9 @@ def test_reset_locations_and_user_data_updated(self):
'commcare_location_ids': f"{self.loc1.location_id} {self.loc2.location_id}",
'commcare_primary_case_sharing_id': self.loc1.location_id,
})

def test_change_data_provided_by_system_will_raise_user_data_error(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: less verbosity along the lines of test_changing_system_provided_data_raises_user_data_error

@biyeun
Copy link
Member

biyeun commented Jul 9, 2024

I think it would be a good idea for @esoergel to review a lot of these changes, as he's most familiar with the locations code and can raise any issues that might have been missed

@@ -140,8 +143,18 @@ def _get_profile(self, profile_id):


@patch('corehq.apps.users.user_data.UserData._get_profile', new=_get_profile)
class TestUserDataModel(SimpleTestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think of putting these tests in TestUserData above instead? That has DB tests and a real user already, whereas this test class only deals with the UserData wrapper model

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I also find test_location_assignment covers it! I think it is no harm to add more test? Should I remove the new tests or move it to TestUserData ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say remove them if they do test the same stuff, just less to maintain, shorter build time. test_change_data_provided_by_system_will_raise_user_data_error I don't think is covered already though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll remove them except for test_change_data_provided_by_system_will_raise_user_data_error also rename it to test_changing_system_provided_data_raises_user_data_error as suggested by Evan.


# Set primary location to loc2
self.user.set_location(self.loc2)
self.assertEqual(user_data.to_dict(), {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user.get_user_data is memoized on the user model, so anything accessing it on the same user instance ( (including set_location) without clearing will receive the same UserData object. If you fetched a separate instance of the user and updated the location there, then you would need to refresh, though calling user.get_user_data wouldn't do the trick (similar to how user.get_location_id wouldn't update in that scenario).

}
if getattr(self._couch_user, 'location_id', None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use self._couch_user.get_location_id(domain). The reason this errored is that web users don't have a assigned_location_ids attribute, since for them location information is stored on the domain membership (as it's scoped to the domain). If you use get_location_id(domain), it should work for both user types.

That said, web users don't currently have locations in user data - it's only added in later:

def get_user_session_data(self, domain):
# TODO can we do this for both types of users and remove the fields from user data?
session_data = super(WebUser, self).get_user_session_data(domain)
session_data['commcare_location_id'] = self.get_location_id(domain)
session_data['commcare_location_ids'] = user_location_data(self.get_location_ids(domain))
session_data['commcare_primary_case_sharing_id'] = self.get_location_id(domain)
return session_data

if user.is_web_user():
fields['commcare_location_id'] = user.get_location_id(domain)
fields['commcare_location_ids'] = user_location_data(user.get_location_ids(domain))
fields['commcare_primary_case_sharing_id'] = user.get_location_id(domain)

So you could probably either switch these if statements to if user.is_commcare_user, or reconcile the two approaches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that this is a difference between web and mobile workers - for web, the fields are always present, but for mobile, they're only present if non-empty. That is a gap - I didn't consider the significance of that distinction when implementing the web users bit, but I think we should bring it in to line. If that makes sense to do as part of this PR, great, otherwise I can follow up with it afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ethan, what do you mean by "bring it in to line"? I saw CouchUser has get_location_id , and both CommCareUser and WebUser have get_location_ids. I will just call this two methods instead of access the fields directly. What else do you want me to include in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be correct, thank you. I mean that right now web users always have commcare_location_id in their restores, even when not assigned a location, because of that first block of code I linked above. This I now think is an inconsistency, and I'd like to give web users the same behavior as mobile workers. Happy to do that in another PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Ethan!

  • Will make comcare_location_id only present if non-empty have an unexpected impact for web user?
  • "web users don't have a assigned_location_ids attribute" > I see we don't store location_id and assigned_location_ids for WebUser, but we can still access them on WebUser because they are attributes for CouchUser, it is just empty. Would move them to CommCareUser make sense?
  • I made changes in 911410a and 5193c76. But haven't fix the inconsistency. I would prefer having that fix in a separate PR and make sure it is safe to make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ethan, my change failed on test_user_data_ignores_location_fields (https://github.com/dimagi/commcare-hq/actions/runs/9912924492/job/27388787564?pr=34852). Seems like previous behavior is, WebUser don't have commcare_location_id and commcare_location_ids in their user data! Now after my change, it will has... I'm concerned of change the existing behavior. Also I don't know if I by accident achieved the "bring it in to line"... I might still not fully understand what you mean and what you're trying to achieve. 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for web users, the location fields are only available in the restore and user cases, not in user data directly. I think for this PR it should be sufficient to add a check to only insert them in user data for mobile workers, but I'd still go through the get_location_id type accessors.

@@ -2099,7 +2097,6 @@ def set_location(self, location, commit=True):
raise AssertionError("You can't set an unsaved location")

user_data = self.get_user_data(self.domain)
user_data['commcare_location_id'] = location.location_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful stuff

fall back to object() so nothing will equal it if key doesn't exist, better than fall back to None or ''
Comment on lines 85 to 90
if settings.UNIT_TESTING:
# Some test don't have an actual user existed
if self._couch_user:
_add_location_data()
else:
_add_location_data()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can combine these conditionals and avoid needing to pull the body out like so:

Suggested change
if settings.UNIT_TESTING:
# Some test don't have an actual user existed
if self._couch_user:
_add_location_data()
else:
_add_location_data()
# Some test don't have an actual user existed
if self._couch_user or not settings.UNIT_TESTING:
_add_location_data()

@jingcheng16
Copy link
Contributor Author

@esoergel Hi Ethan, do you mind giving a final round of review?
I removed most tests I added, but kept test_no_location_info_in_user_data_when_no_location_assigned and test_change_data_provided_by_system_will_raise_user_data_error
I added comment to test_user_data_ignores_location_fields because when it failed and find it confusing. I also realized test in TestWebUserBulkUpload might have some issue, because when I remove @patch('corehq.apps.user_importer.importer.domain_has_privilege', lambda x, y: True), test test_web_user_location_add, test_web_user_location_remove and test_invite_location_add won't fail. I expect it to fail because in importer.py, we check if self.domain_info.can_assign_locations before we update locations. But I don't plan to fix this in my PR, as I have a tight timeline for User API work.

@@ -1839,7 +1839,12 @@ def test_user_data_profile_removal(self):
def test_uncategorized_data(self):
self._test_uncategorized_data(is_web_upload=True)

@patch('corehq.apps.user_importer.importer.domain_has_privilege', lambda x, y: True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a @privilege_enabled test decorator that you might be interested in

Co-authored-by: Ethan Soergel <[email protected]>
@jingcheng16
Copy link
Contributor Author

QA find out, when we remove all location from one mobile worker, the api call will still return those old location_id and location_ids of that mobile worker. Because when remove all locations, the mobile worker's location_id and assigned_location_id would be empty, then it won't be returned in _provided_by_system to overwrite the existing location fields in UserData. I'll follow up with the migration to delete all location fields in UserData for mobile workers, (because web user don't store location fields in their user data). I'll run migration manually after the PR is merged. ( I'll run migration for india and swiss on Wednesday afternoon, and for prod on Thursday morning.)

FYI @esoergel

@@ -162,8 +195,8 @@ def get(self, key, default=None):
return self.to_dict().get(key, default)

def __setitem__(self, key, value):
if key in self._provided_by_system:
if value == self._provided_by_system[key]:
if key in self._system_keys:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this same change be applied to the other places _provided_by_system is referenced, in update, __delitem__, and pop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ethan, that's my oversight. I thought no location fields will be in userdata after migration, so we will never encounter situation that people want to update or delete or pop location fields in userdata. But I think if I consider the time after the code is deployed and before the migration, I should use _system_keys in all places. Fixed in e28058f

# Ensure location keys are only from _provided_by_system
for key in LOCATION_KEYS:
if key in combined_dict and key not in self._provided_by_system:
del combined_dict[key]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that system fields shouldn't come from _local_to_user, so it'd make sense to me to remove them only from there, rather than the combined dict, something like this:

**{k: v for k, v in local_to_user.items() if k not in self._system_keys}

You could do that here, or even in __init__, which would also mean these would be removed from other places, like raw and the places that modify _local_to_user. It'd also mean that these fields would be removed from the DB whenever user data models are saved (which might be helpful with the migration).

As-is, I think doing del user_data['commcare_location_id'] won't raise an error, though it probably should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ethan, I love you advice and implement in bdba44e. However I didn't add it in init, I think it is an overkill, as we stop writing location info into UserData in hq, and also forbid this behavior in __set__, update.

@jingcheng16
Copy link
Contributor Author

jingcheng16 commented Jul 26, 2024

I've reverted this migration on staging, so I can test the new commit I made to this PR. QA-6846 still pass.

@jingcheng16 jingcheng16 merged commit 773b108 into master Jul 30, 2024
13 checks passed
@jingcheng16 jingcheng16 deleted the jc/userdata-pull-location-from-system-fields branch July 30, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product/invisible Change has no end-user visible impact Risk: Medium Change affects files that have been flagged as medium risk.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants