Feature: dead letter queue #73
Conversation
Original PR was #56. Created this because of merge conflicts as explained on Slack
We should probably add some tests for the edge cases where it would fail?
Also, am I correct that in this PR the DLQ is implemented but never used by the orchestrator?
```diff
@@ -127,6 +127,8 @@ impl TestConfigBuilder {
            self.storage.unwrap(),
        );

        drop_database().await.unwrap();
```
?
Would need more explanation here.
Ideally this should have been a separate PR, will keep in mind.
But if your question is any of these:
- Why is `drop_database` implemented in `TestConfigBuilder`?
- Why are we not using `?` and using `unwrap()` instead?

Then:
- This is to ensure that each test case has a fresh database to work with, so that no overlapping database state exists between tests.
- Our assumption is that there is no perk for a test case to return an `Err`; since it's a checking procedure, we are fine throwing the error there directly.
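The reasoning above can be sketched in plain Rust. All names here are hypothetical stand-ins for illustration, not the orchestrator's real API (the real `drop_database` is async and talks to the actual database):

```rust
// Hypothetical stand-in for the real async database drop; this sketch
// only models the shape of the call, not the real client.
fn drop_database() -> Result<(), String> {
    Ok(())
}

// Test setup prefers unwrap() over `?`: a test gains nothing by
// returning Err, and panicking surfaces the failure at the call site.
fn test_setup() {
    drop_database().unwrap(); // fresh database for every test case
}

fn main() {
    test_setup();
    println!("setup ok");
}
```

The design choice being defended: in test code, a panic with a backtrace at the failing call is more useful than an `Err` propagated up through the harness.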
Yeah, the question was more: why do we drop in the middle of the code?
Since this is an initialisation function for all the clients, which runs before each test, we can drop the database at any point in this code.
```rust
#[case(JobType::SnosRun, JobStatus::Failed)]
#[tokio::test]
async fn handle_job_failure_with_failed_job_status_works(#[case] job_type: JobType, #[case] job_status: JobStatus) {
    TestConfigBuilder::new().build().await;
```
move this to a fixture
As mentioned here, `TestConfigBuilder` allows for customisation over any external client of our choice; moving it to a global fixture is not feasible since all test cases have different customisation requirements.
We can make a fixture for tests under the same scope if they require the same customised external clients. For example, all tests under `da_job` that require the same config customisation can implement a fixture just for themselves, and similarly for other scopes.
We can create a separate issue for this and resolve it there.
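A minimal sketch of the scoped-fixture idea, using a plain module-local setup function in place of a real rstest `#[fixture]`. Everything here (the struct, field, and fixture name) is hypothetical, for illustration only:

```rust
#[derive(Debug, Clone, PartialEq)]
struct TestConfig {
    da_client: &'static str, // hypothetical field: which DA client this scope uses
}

// Tests under one scope (e.g. da_job) share one setup function, the
// moral equivalent of an rstest #[fixture] local to that module.
mod da_job_tests {
    use super::TestConfig;

    pub fn da_fixture() -> TestConfig {
        TestConfig { da_client: "ethereum" }
    }
}

fn main() {
    // Each test in the scope calls the same fixture, so they all get
    // the same customised configuration without a global fixture.
    let cfg = da_job_tests::da_fixture();
    assert_eq!(cfg.da_client, "ethereum");
    println!("scoped fixture: {:?}", cfg);
}
```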
sounds good to me
```rust
if job.status == JobStatus::Completed {
    log::error!("Invalid state exists on DL queue: {}", job.status.to_string());
    return Ok(());
```
why do we fail silently here?
The DL-queue is supposed to handle actual failed cases. If a `JobStatus::Completed` job is pushed to the DL-queue multiple times by the queuing agent, we prefer failing silently over stopping the orchestrator.
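A simplified, self-contained sketch of the guard discussed above (the enum is trimmed to two variants and the function shape is assumed, not the orchestrator's real signature):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    Completed,
    Failed,
}

// A Completed job arriving on the DL-queue is logged and ignored
// instead of erroring out, so a duplicate delivery from the queuing
// agent cannot stop the orchestrator's consumer loop.
fn handle_job_failure(status: JobStatus) -> Result<(), String> {
    if status == JobStatus::Completed {
        eprintln!("Invalid state exists on DL queue: {:?}", status);
        return Ok(()); // fail silently
    }
    // the real code would move the job to the failed state here
    Ok(())
}

fn main() {
    assert!(handle_job_failure(JobStatus::Completed).is_ok());
    assert!(handle_job_failure(JobStatus::Failed).is_ok());
    println!("duplicate Completed delivery ignored");
}
```

Returning `Ok(())` after logging is the whole trade-off: the anomaly is visible in the logs, but the consumer keeps draining the queue.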
```rust
#[rstest]
#[case::pending_verification(JobType::SnosRun, JobStatus::PendingVerification)]
#[case::verification_timeout(JobType::SnosRun, JobStatus::VerificationTimeout)]
```
why are the other statuses not covered?
We initially covered all the statuses, as shown here, but it felt redundant to test them all, hence they were removed.
What do you suggest @EvolveArt ?
Well, even if it feels redundant, I think it's important to have them; you want to avoid having jobs in an unexpected state in the DLQ.
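The exhaustive-coverage idea can be sketched as a table over all statuses. The variant list below is limited to the statuses mentioned in this thread, and `expected_after_dlq` is a hypothetical helper standing in for the real assertion logic:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JobStatus {
    PendingVerification,
    VerificationTimeout,
    Completed,
    Failed,
}

// One table covering every variant, so a job reaching the DLQ in an
// unexpected state always has a test case watching for it.
const ALL_STATUSES: [JobStatus; 4] = [
    JobStatus::PendingVerification,
    JobStatus::VerificationTimeout,
    JobStatus::Completed,
    JobStatus::Failed,
];

fn expected_after_dlq(status: JobStatus) -> JobStatus {
    // sketch: everything except an already-Completed job ends up Failed
    match status {
        JobStatus::Completed => JobStatus::Completed,
        _ => JobStatus::Failed,
    }
}

fn main() {
    for s in ALL_STATUSES {
        let out = expected_after_dlq(s);
        assert!(out == JobStatus::Failed || out == JobStatus::Completed);
    }
    println!("all {} statuses covered", ALL_STATUSES.len());
}
```

With rstest this would be one `#[case::...]` per variant, and the exhaustive `match` means adding a new enum variant forces the table to be revisited.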
Valid, implemented.
Please feel free to mention any edge cases to run the tests on.
@EvolveArt the DLQ is being used. You need to set it up on SQS/RabbitMQ etc., and when messages fail, they automatically go to the DLQ and are moved to the failed state.
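For SQS specifically, the broker-side wiring is a redrive policy on the source queue: after `maxReceiveCount` failed receives, SQS moves the message to the dead-letter queue automatically. A hedged sketch of the `CreateQueue` attributes (queue names, region, and account ID below are made up for illustration):

```json
{
  "QueueName": "orchestrator-jobs",
  "Attributes": {
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:orchestrator-jobs-dlq\",\"maxReceiveCount\":\"5\"}"
  }
}
```

Nothing in the orchestrator has to push to the DLQ itself; it only needs a consumer on the DLQ that moves the job to the failed state, which is what this PR's `handle_job_failure` path does.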
LGTM
* dl-queue: added termination queue
* dl-queue: spawn consumer to a macro_rule
* dl-queue: test for handle_job_failure
* dl-queue: handle_job_failure failed test case
* dl-queue: fixed test cases
* dl-queue: tests fixed
* dl-queue: assert optimised
* dl-queue: DL job rewritten tests
* dl-queue: formatting changes
* dl-queue: update mod.rs
* dl-queue: lint fixes
* dl-queue: using strum for JobStatus Display
* dl-queue: added test cases for handle_job_failure_with_failed_job_status_works
* fix: testcase
This PR resolves Issue #55.
* Adds `last_job_status` to job metadata.
* Implements `handle_job_failure`.