Feature: data letter queue #73

Merged: 16 commits into main from feat/data-letter-queue on Aug 13, 2024
Conversation

@heemankv (Contributor) commented on Aug 8, 2024:

This PR resolves issue #55.

  • Implements a Dead Letter Queue (DLQ) handler (a rough sketch follows this list).
  • Adds last_job_status to job metadata.
  • Updates runs and tests to spin up the appropriate queues.
  • Adds a test for handle_job_failure.
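A rough sketch of the handler described above, for orientation only; the exact signatures, the `config()` accessor, and the `JobError` type are assumptions rather than this repo's API:

```rust
use uuid::Uuid;

// Hypothetical shape of the DLQ handler; names are illustrative.
async fn handle_job_failure(job_id: Uuid) -> Result<(), JobError> {
    let mut job = config().database().get_job(job_id).await?;

    // A Completed job should never be redelivered to the DLQ; log and skip
    // instead of stopping the orchestrator (see the review thread below).
    if job.status == JobStatus::Completed {
        log::error!("Invalid state exists on DL queue: {}", job.status.to_string());
        return Ok(());
    }

    // Preserve the status the job failed from, then mark the job Failed.
    job.metadata.insert("last_job_status".to_string(), job.status.to_string());
    job.status = JobStatus::Failed;
    config().database().update_job(&job).await
}
```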

@heemankv self-assigned this on Aug 8, 2024.
@apoorvsadana (Contributor) left a comment:

Original PR was #56. Created this one because of merge conflicts, as explained on Slack.

@EvolveArt (Contributor) left a comment:

We should probably add some tests for the edge cases where it would fail.
Also, am I correct that in this PR the DLQ is implemented but never actually used by the orchestrator?

@@ -127,6 +127,8 @@ impl TestConfigBuilder {
self.storage.unwrap(),
);

drop_database().await.unwrap();
Reviewer (Contributor):

?

@heemankv (Author):

This would need more explanation. Ideally it should have been a separate PR; I'll keep that in mind.

But if your question is either of these:

  1. Why is drop_database implemented in TestConfigBuilder?
  2. Why are we using unwrap() instead of ??

Then:

  1. This ensures that each test case has a fresh database to work with, so no state overlaps between test cases.
  2. Our assumption is that there is no benefit to a test case returning an Error; since it's a checking procedure, we are fine throwing the error there directly.

Reviewer (Contributor):

Yeah, the question was more: why do we drop in the middle of the code?

@heemankv (Author):

Since this is an initialization function for all the clients, which runs before each test, we can drop the database at any point in this code.
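A condensed sketch of that initialization path; the field names and `drop_database` helper are stand-ins for the repo's own code, and the real builder wires up more clients:

```rust
impl TestConfigBuilder {
    // Runs before every test; fields here are illustrative.
    pub async fn build(self) -> Config {
        let config = Config::new(
            self.starknet_client.unwrap(),
            self.storage.unwrap(),
        );

        // Each test starts from an empty database, so the drop can happen at
        // any point during setup. unwrap() is acceptable in test setup: a
        // failure should abort the test run immediately.
        drop_database().await.unwrap();

        config
    }
}
```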

#[case(JobType::SnosRun, JobStatus::Failed)]
#[tokio::test]
async fn handle_job_failure_with_failed_job_status_works(#[case] job_type: JobType, #[case] job_status: JobStatus) {
TestConfigBuilder::new().build().await;
Reviewer (Contributor):

move this to a fixture

@heemankv (Author):

As mentioned here:

TestConfigBuilder allows customisation of any external client of our choice; moving it to a global fixture is not feasible, since different test cases have different customisation requirements.

We can make a fixture for tests under the same scope if they require the same customised external clients.

For example: if all tests under da_job require the same config customisation, they can implement a fixture just for themselves, and similarly for other scopes (see the sketch below).

We can create a separate issue for this and resolve it there.
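A sketch of such a per-scope fixture using rstest's async fixtures; the fixture name and the assumption that all da_job tests share one default configuration are illustrative, not the repo's actual layout:

```rust
use rstest::{fixture, rstest};

// Hypothetical fixture shared by the tests in the da_job scope.
#[fixture]
async fn da_job_config() {
    TestConfigBuilder::new().build().await;
}

#[rstest]
#[tokio::test]
async fn some_da_job_test(#[future] da_job_config: ()) {
    // Resolve the async fixture before running the test body.
    da_job_config.await;
    // ... test body ...
}
```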

Reviewer (Contributor):

Sounds good to me.


// A Completed job should never reach the DLQ; skip it rather than stop the orchestrator.
if job.status == JobStatus::Completed {
    log::error!("Invalid state exists on DL queue: {}", job.status.to_string());
    return Ok(());
}
Reviewer (Contributor):

Why do we fail silently here?

@heemankv (Author):

The DL queue is supposed to handle actual failed cases. If a JobStatus::Completed job is pushed to the DL queue multiple times by the queuing agent, we prefer failing silently over stopping the orchestrator.

crates/orchestrator/src/jobs/types.rs (resolved; outdated)

#[rstest]
#[case::pending_verification(JobType::SnosRun, JobStatus::PendingVerification)]
#[case::verification_timeout(JobType::SnosRun, JobStatus::VerificationTimeout)]
Reviewer (Contributor):

Why are the other statuses not covered?

@heemankv (Author) commented on Aug 9, 2024:

We initially covered all the statuses, as shown here, but it felt redundant to test them all, so they were removed.

What do you suggest, @EvolveArt?

Reviewer (Contributor):

Well, even if it feels redundant, I think it's important to have them; you want to avoid having jobs in an unexpected state in the DLQ.

@heemankv (Author):

Valid point; implemented.
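For reference, fuller coverage might look like the following; only PendingVerification and VerificationTimeout appear in the excerpt above, so the remaining JobStatus variant names are illustrative assumptions:

```rust
#[rstest]
#[case::created(JobType::SnosRun, JobStatus::Created)]
#[case::locked_for_processing(JobType::SnosRun, JobStatus::LockedForProcessing)]
#[case::pending_verification(JobType::SnosRun, JobStatus::PendingVerification)]
#[case::verification_failed(JobType::SnosRun, JobStatus::VerificationFailed)]
#[case::verification_timeout(JobType::SnosRun, JobStatus::VerificationTimeout)]
#[tokio::test]
async fn handle_job_failure_works(#[case] job_type: JobType, #[case] job_status: JobStatus) {
    TestConfigBuilder::new().build().await;
    // ... create a job with `job_status`, invoke handle_job_failure, then
    // assert the job is Failed and `last_job_status` is preserved in metadata.
}
```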

@heemankv changed the title from "Feat/data letter queue" to "Feature: data letter queue" on Aug 9, 2024.
@heemankv (Author) commented on Aug 9, 2024:

> We should probably add some tests for the edge cases where it would fail. Also, am I correct that in this PR the DLQ is implemented but never actually used by the orchestrator?

Please feel free to mention the edge cases to run the tests on.
And yes, this PR adds support for the DLQ, after which we would set up the DL queue to be used in the future.
@apoorvsadana feel free to add more!
Thanks

@apoorvsadana (Contributor) commented:

@EvolveArt, the DLQ is being used. You need to set it up on SQS/RabbitMQ etc.; when messages fail, they automatically go to the DLQ and the corresponding jobs are moved to the failed state.
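For SQS specifically, that wiring is a redrive policy on the source queue. A minimal sketch with the aws-sdk-sqs crate (assuming a recent SDK version where QueueAttributeName lives in `types`); the queue URL, DLQ ARN, and receive count are placeholders, not this repo's configuration:

```rust
use aws_sdk_sqs::{types::QueueAttributeName, Client};

// Attach a dead-letter queue to an existing source queue. After
// maxReceiveCount failed receives, SQS moves the message to the DLQ.
async fn attach_dlq(client: &Client) -> Result<(), aws_sdk_sqs::Error> {
    let redrive_policy =
        r#"{"deadLetterTargetArn":"arn:aws:sqs:us-east-1:000000000000:job_dlq","maxReceiveCount":"5"}"#;
    client
        .set_queue_attributes()
        .queue_url("https://sqs.us-east-1.amazonaws.com/000000000000/job_queue")
        .attributes(QueueAttributeName::RedrivePolicy, redrive_policy)
        .send()
        .await?;
    Ok(())
}
```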

@EvolveArt (Contributor) left a comment:

LGTM

@heemankv added the enhancement (New feature or request) label on Aug 13, 2024.
@heemankv merged commit 73355ca into main on Aug 13, 2024 (8 checks passed).
@heemankv deleted the feat/data-letter-queue branch on August 13, 2024 at 05:07.
ocdbytes pushed a commit that referenced this pull request on Oct 16, 2024:
* dl-queue: added termination queue

* dl-queue: spawn consumer to a macro_rule

* dl-queue: test for handle_job_failure

* dl-queue: handle_job_failure failed test case

* dl-queue: fixed test cases

* dl-queue: tests fixed

* dl-queue: assert optimised

* dl-queue: DL job rewritten tests

* dl-queue: formatting changes

* dl-queue: update mod.rs

* dl-queue: lint fixes

* dl-queue: using strum for JobStatus Display

* dl-queue: added test cases for handle_job_failure_with_failed_job_status_works

* fix: testcase
Labels: enhancement (New feature or request)
Participants: 3