Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(#9238): add functionality of getting people as an AsyncGenerator in cht-datasource #9281

Conversation

sugat009
Copy link
Member

@sugat009 sugat009 commented Jul 24, 2024

Description

Add functionality of getting people as an AsyncGenerator in cht-datasource.

Ticket: #9238

Usage:

const getAllIterator = Person.v1.getAll(ctx)(Qualifier.byContactType('person'));
for await (const page of getAllIterator) {
  for (const doc of page) {
    ...
  }
}

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@sugat009 sugat009 requested review from jkuester and m5r July 24, 2024 07:42
@sugat009
Copy link
Member Author

@jkuester @m5r I've created a draft PR for discussion.
Currently, the state of the code is that the natural setup of the functionality has been done according to the pattern in cht-datasource.
The current plan is to write the iterable logic in this function.
In contrast, the remote logic has been written in this function. The thing that I'm thinking about right now is that if a new endpoint different than the one in which the getPage(remote) would be requesting for i.e. /api/v1/person is necessary like a /api/v1/personAsync or something.
What do you think about this?

@sugat009 sugat009 self-assigned this Jul 24, 2024
@sugat009
Copy link
Member Author

sugat009 commented Jul 24, 2024

Additionally, the plan to implement the iterable for the local function Person.v1.getAll is to first query the view medic-client/contacts_by_type?<personType> for total row count and then, subsequently query page by page using the already existing Person.v1.getPage function until the total row count is reached.

Update:
Would it be better to continue paginating through the results until we encounter a page that returns no data? At that point, the Person.v1.getAll function would terminate its operation.

@jkuester
Copy link
Contributor

So, my thinking here was that we might be able to get away without having any separate remote/local logic at all. Instead we implement getAll in src/person.ts in such a way that it just calls the getPage function in src/person.ts. I am not sure if there is some extra complexity that would prevent this, but at first glace it seems possible. With this approach, we should be able to tell when we have reached the end of the data and can stop yielding when we get back an empty page, or a page with a length < 100.

Should not need any separate REST endpoint for this functionality. In reality, all we are implementing here is a TS convenience wrapper over top of getPage.

@sugat009
Copy link
Member Author

sugat009 commented Jul 24, 2024

So, my thinking here was that we might be able to get away without having any separate remote/local logic at all.

That is an interesting one and I agree that the whole implementation would be simpler and is also feasible. I might be a bit out of context here but why not the remote logic?

@sugat009 sugat009 force-pushed the 9238-add-functionality-of-getting-people-as-an-iterator-in-cht-datasource branch from 03e19f1 to c34716a Compare July 24, 2024 15:58
@jkuester
Copy link
Contributor

but why not the remote logic?

If the src/person.ts/getAll function is just calling the src/person.ts/getPage, then all of the remote vs local behavior will be handled by src/person.ts/getPage and can be transparent to src/person.ts/getAll. getAll is just getting arrays of Persons back. It should not even need to care if they are coming from the remote or the local adapter. At least that is my thinking anyway... 🤔

@sugat009 sugat009 changed the title feat(#9238): add functionality of getting people as an iterator in cht-datasource feat(#9238): add functionality of getting people as an AsyncGenerator in cht-datasource Jul 25, 2024
@sugat009 sugat009 added the Type: Feature Add something new label Jul 25, 2024
@sugat009 sugat009 marked this pull request as ready for review July 25, 2024 15:00
@sugat009 sugat009 requested review from jkuester and m5r August 7, 2024 11:33
shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/libs/core.ts Outdated Show resolved Hide resolved
/** @internal */
export interface Page<T> {
readonly data: T[];
readonly cursor: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know that we have reached the end of the results, we should not return a cursor value.

  readonly cursor?: string;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I believe cursor's type should be

interface Page {
  readonly cursor: string | null;
}

That's more explicit to express "you've reached the end"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be more explicit and return "-1" and not nothing to indicate the end of iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m5r and I commented simultaneously, null should do as well.

shared-libs/cht-datasource/src/person.ts Outdated Show resolved Hide resolved
shared-libs/cht-datasource/src/local/person.ts Outdated Show resolved Hide resolved
const docs = await getDocsByPage([personType.contactType], limit, skip);
const getDocsByPage = queryDocsByKey(medicDb, 'medic-client/contacts_by_type');

const fetchAndFilter = async (
Copy link
Contributor

@jkuester jkuester Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid logic here! 👍 I think we can tighten up the cursor calculation a bit (at least reduce the nesting if-statements):

const fetchAndFilter = async (
  currentLimit: number,
  currentSkip: number,
  currentPersons: Person.v1.Person[] = []
): Promise<Page<Person.v1.Person>> => {
  const docs = await getDocsByPage([personType.contactType], currentLimit, currentSkip);
  const noMoreResults = docs.length < currentLimit;
  const newPersons = docs.filter((doc): doc is Person.v1.Person => isPerson(settings, doc, doc?._id));

  // Need to slice here because we retrieve extra docs on re-fetches and may end up with too many.
  const overFetchCount = currentPersons.length + newPersons.length - limit || 0;
  const totalPersons = [...currentPersons, ...newPersons].slice(0, limit);
  if (noMoreResults) {
    return { data: totalPersons };
  }
  if (totalPersons.length === limit) {
    const nextSkip =  currentSkip + currentLimit - overFetchCount;
    return { data: totalPersons, cursor: (nextSkip).toString() };
  }

  // Re-fetch twice as many docs as we need to limit number of recursions
  const missingCount = currentLimit - newPersons.length;
  logger.debug(`Found [${missingCount}] invalid persons. Re-fetching additional records.`);
  const nextLimit = missingCount * 2;
  const nextSkip = currentSkip + currentLimit;
  return fetchAndFilter(nextLimit, nextSkip, totalPersons);
};

return fetchAndFilter(limit, skip);

shared-libs/cht-datasource/src/remote/person.ts Outdated Show resolved Hide resolved
Copy link
Member

@m5r m5r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to add to Josh's comments, I'll approve to not block merging after Josh's approval

@sugat009 sugat009 requested a review from jkuester August 8, 2024 14:00
Base automatically changed from 9237-add-functionality-of-getting-people-with-pagination-in-cht-datasource to 9193-api-endpoints-for-getting-contacts-by-type August 9, 2024 13:33
@sugat009 sugat009 changed the base branch from 9193-api-endpoints-for-getting-contacts-by-type to master August 12, 2024 09:09
@sugat009 sugat009 changed the base branch from master to 9193-api-endpoints-for-getting-contacts-by-type August 12, 2024 09:09
@sugat009
Copy link
Member Author

sugat009 commented Aug 12, 2024

I am closing this PR, as I might have messed up somewhere during conflict resolution, creating many duplicate commits.
Reverted the commits before conflict resolution instead.

@sugat009 sugat009 closed this Aug 12, 2024
@sugat009 sugat009 reopened this Aug 12, 2024
@sugat009 sugat009 force-pushed the 9238-add-functionality-of-getting-people-as-an-iterator-in-cht-datasource branch from de5f786 to 3eeb049 Compare August 12, 2024 11:25
…238-add-functionality-of-getting-people-as-an-iterator-in-cht-datasource
@sugat009 sugat009 merged commit bf8a77d into 9193-api-endpoints-for-getting-contacts-by-type Aug 12, 2024
18 of 20 checks passed
@sugat009 sugat009 deleted the 9238-add-functionality-of-getting-people-as-an-iterator-in-cht-datasource branch August 12, 2024 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Add something new
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants