Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/incremental document sync #674

Open
wants to merge 38 commits into
base: develop
Choose a base branch
from

Conversation

JGiter
Copy link
Contributor

@JGiter JGiter commented Nov 6, 2024

  • Documents will be synchronized starting from the block of latest synchronization
  • Only documents updated since the latest synchronization will be synchronized
  • Fixed using of DID synchronization job

}

@OnQueueWaiting()
async OnQueueWaiting(job: Job) {
this.logger.debug(`Waiting ${job.name} document ${job.data}`);
async OnQueueWaiting(jobId: number) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument to this hook differs from others https://docs.nestjs.com/techniques/queues#event-listeners-1

Copy link
Collaborator

@jrhender jrhender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start to the PR but I think the issue with the jobs disappearing should be addressed

Comment on lines 380 to 389
).filter((doc) => {
const identity = doc.id.split(':')[3];
return changedIdentities.includes(identity);
});
didsToSynchronize.forEach(async (did) => {
this.logger.debug(`Synchronizing DID ${did.id}`);
await this.pinDocument(did.id);
});

await this.latestDidSyncRepository.save({ block: topBlock });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @JGiter, thanks for the PR. Really exciting that this code is being improved.

I think this approach is flawed however. The issue that I see is that jobs are persisted in separate storage (Redis) and then processed asynchronously, but the latestDidSync topBlock is persisted synchronously. So we could have a situation like this:

  1. A bunch of DID update jobs are queued and lastestDidSync.block is updated
  2. The Redis storage fails or is wiped (it's in-memory, so definitely could happen) -> then the system won't update the DIDs in the lost jobs because lastestDidSync.block already ahead of these events.

I'm thinking an alternative approach to get around this problem would be to:

  1. Mark updated DIDs as "stale":
  2. Query all of the "changedIdentities" since the last check (as you've done in this PR)
  3. Mark all of these DIDs as "invalid" (in the Postgres Entities). I think something like this could be used to bulk update entities.
  4. Update latestDidSync.block
  5. Query all of DID entities with invalid status and add a job to the queue (if it doesn't exist already)
  6. When done processing the job, update the DID to be "valid"

In this above, even if the job queue is reset/wiped, the DID entities will still have invalid status, so job for them can be re-added.

Also, we can change the GET Did Doc endpoint to synchronously query the RPC if the cached DID is invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This is really good improvement

src/modules/did/did.service.ts Outdated Show resolved Hide resolved
@@ -486,7 +511,7 @@ export class DIDService implements OnModuleInit, OnModuleDestroy {

private async pinDocument(did: string): Promise<void> {
try {
await this.didQueue.add(UPDATE_DID_DOC_JOB_NAME, did);
await this.didQueue.add(UPDATE_DID_DOC_JOB_NAME, { did });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JGiter does this change relate to this comment you made in the PR description?

Fixed using of DID synchronization job

Does the data need to be in an object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not say why, but when passing job as a string, I observed multiple errors error parsing JSON though there are no seemingly JSON parsing in DidProcessor. So I decided to make job as an object, same as in PinProcessor.

@@ -486,7 +508,7 @@ export class DIDService implements OnModuleInit, OnModuleDestroy {

private async pinDocument(did: string): Promise<void> {
try {
await this.didQueue.add(UPDATE_DID_DOC_JOB_NAME, did);
await this.didQueue.add(UPDATE_DID_DOC_JOB_NAME, { did }, { jobId: did });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying job id to avoid adding job for the same document

@PrimaryGeneratedColumn()
id: number;

@OneToOne(() => DIDDocumentEntity, (document) => document.id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addinig relation only on DidSyncStatusEntity side. The DidDocumentEntity will not be changed

@JGiter JGiter requested a review from jrhender November 8, 2024 10:00
Copy link
Collaborator

@jrhender jrhender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some further comments but I think it is continuing to move in a good direction 👍

Comment on lines 17 to 19
@OneToOne(() => DIDDocumentEntity, (document) => document.id)
@JoinColumn()
document: string;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems wrong to me that this property is string type. I think that typically in a one-to-one join column, the join property has the same type as the joined entity.
See the Profile example at the start of the TypeORM documentation: https://orkhan.gitbook.io/typeorm/docs/one-to-one-relations

Suggested change
@OneToOne(() => DIDDocumentEntity, (document) => document.id)
@JoinColumn()
document: string;
@OneToOne(() => DIDDocumentEntity, (document) => document.id)
@JoinColumn()
document: DIDDocumentEntity;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I overlooked that

Comment on lines 228 to 232
const updated = await this.didRepository.save(updatedEntity);
await this.didSyncStatusRepository.save({
document: did,
status: DidSyncStatus.Synced,
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be in a transaction as I think an invariant should be that there is always a didSyncStatus entity if there is a did entity.

Comment on lines 571 to 583
const staleDIDs = (
await this.didRepository.find({ select: ['id'] })
).filter((doc) => {
const identity = addressOf(doc.id);
return changedIdentities.includes(identity);
});
await this.didSyncStatusRepository
.createQueryBuilder()
.useTransaction(true)
.update(DidSyncStatusEntity)
.set({ status: DidSyncStatus.Stale })
.where({ document: In(staleDIDs) })
.execute();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seems to me like a better approach would be to rely on the database to do the filtering in the where clause, rather than reading all dids into application memory and then filtering in the application. I think that, if the number of DIDs became large enough, this this.didRepository.find({ select: ['id'] query would probably fail.

I suppose the challenge might be how to use the addresses returned from changedIdentities to find the didSyncStatus entities to update. I think a couple options are:

  1. Store the address of the identity on the DidSyncStatusEntity
  2. Join to DIDDocumentEntity in the query and filter on the id property. Of course the addresses would need to be transformed to did:ethr DID ids.

Note that I think that option 2 might be slightly less complicated if we got rid of the one-to-one join entity and just added a new column to the DidDocumentEntity.
I think we only really need a single isStale boolean column (or could also keep the enum), so the impact on the performance of the database query when reading the DID Document would be negligible, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestions. Option 2 seems more safe to me, because it excludes case when two status entities connected to the same document. I would also like to avoid changing document entity. It is used everywhere in ssi-hub and I am bit afraid to cause some unexpected changes. And besides property isState is not inherently part of the document.

Comment on lines 577 to 583
await this.didSyncStatusRepository
.createQueryBuilder()
.useTransaction(true)
.update(DidSyncStatusEntity)
.set({ status: DidSyncStatus.Stale })
.where({ document: In(staleDIDs) })
.execute();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this query work for existing cached documents (already existing in the cache I mean)? Maybe I missed it, but I don't see a migration that adds a DidSyncStatusEntity for all existing cached DID Documents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed that

With digits redactor, the logs looked like:

debug [DIDService] : 2024-11-21T22:32:25.999Z - Fetched 0 DID events from interval [DIGITS, DIGITS]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants