This repository has been archived by the owner on Feb 8, 2024. It is now read-only.

fix: Bump failure counter on job error #36

Merged: 3 commits, Feb 6, 2024

@tomasfarias tomasfarias commented Jan 10, 2024

Also bump webhook_jobs_failed on job error. Maybe we should have a separate counter for this? PgJobError usually indicates something on our end, so it could be worth it to have a distinction.

EDIT: I've done that, there is now a webhook_jobs_database_error counter.
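
A minimal sketch of the idea described above: bump a dedicated counter when marking a job complete fails on the queue side, then convert the failure into a WorkerError. The `Counters` struct and the `finish_job` helper are hypothetical stand-ins (the PR uses the `metrics` crate's `counter!` macro), and the `Result<(), String>` argument stands in for the result of `webhook_job.complete().await`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the `metrics` crate counters, so the sketch
/// has no external dependencies.
#[derive(Default)]
pub struct Counters {
    /// Bumped when the webhook itself fails (not exercised in this sketch).
    pub webhook_jobs_failed: AtomicU64,
    /// Bumped when updating the job in the queue fails.
    pub webhook_jobs_database_error: AtomicU64,
}

/// Simplified error type; the real WorkerError is assumed to have more variants.
#[derive(Debug, PartialEq)]
pub enum WorkerError {
    PgJobError(String),
}

/// Hypothetical helper: `complete_result` stands in for the value returned by
/// `webhook_job.complete().await`, with the error already rendered as text.
pub fn finish_job(
    complete_result: Result<(), String>,
    counters: &Counters,
) -> Result<(), WorkerError> {
    complete_result.map_err(|error| {
        // Count the queue-side failure separately from webhook failures.
        counters
            .webhook_jobs_database_error
            .fetch_add(1, Ordering::Relaxed);
        WorkerError::PgJobError(error)
    })
}
```

Keeping the two counters apart means a dashboard can tell failing customer endpoints from problems on the queue side, which is the distinction the description asks for.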

.map_err(|error| WorkerError::PgJobError(error.to_string()))?;
webhook_job.complete().await.map_err(|error| {
    metrics::counter!("webhook_jobs_database_error", &labels).increment(1);
    WorkerError::PgJobError(error.to_string())
Contributor

It's minor, but I think we should have a different enum entry like DatabaseError too. Just helps for grep in the future.

Contributor

@bretthoerner bretthoerner Jan 10, 2024

Doesn't have to be for this PR btw.

edit: It might help for distinguishing PgJobError or DatabaseError in tests, i.e. here: https://github.com/PostHog/rusty-hook/pull/36/files#diff-3c41c606a4d1c83ee33ba25992dd4d7a4288bd20270e68f210e086a1101462acR335

Contributor

Hmm, I had assumed that methods like complete() returned a sqlx::Error, but I guess those are wrapped internally:

pub enum PgJobError<T> {
    #[error("retry is an invalid state for this PgJob: {error}")]
    RetryInvalidError { job: T, error: String },
    #[error("{command} query failed with: {error}")]
    QueryError { command: String, error: sqlx::Error },
    #[error("transaction {command} failed with: {error}")]
    TransactionError { command: String, error: sqlx::Error },
}

It's a little weird that RetryInvalidError will get treated as a DB error here. Although I guess that should never happen, since we're just calling complete/fail, it feels like the types could help us better here somehow.
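
To illustrate the concern, here is a rough sketch of how matching on the variants (rather than calling `to_string()` on the whole enum) keeps the two cases apart. It is a simplified copy of the enum quoted above: `String` stands in for `sqlx::Error`, `Display` is written by hand instead of derived with `thiserror`, and `ErrorKind` and `classify` are hypothetical names that do not appear in the PR.

```rust
use std::fmt;

/// Simplified copy of the quoted enum; `String` stands in for `sqlx::Error`.
#[derive(Debug)]
pub enum PgJobError<T> {
    RetryInvalidError { job: T, error: String },
    QueryError { command: String, error: String },
    TransactionError { command: String, error: String },
}

impl<T> fmt::Display for PgJobError<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PgJobError::RetryInvalidError { error, .. } => {
                write!(f, "retry is an invalid state for this PgJob: {error}")
            }
            PgJobError::QueryError { command, error } => {
                write!(f, "{command} query failed with: {error}")
            }
            PgJobError::TransactionError { command, error } => {
                write!(f, "transaction {command} failed with: {error}")
            }
        }
    }
}

/// Hypothetical classification, used here only to show the distinction.
#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    Database,
    InvalidRetry,
}

/// Every variant renders to a string, so `error.to_string()` alone erases
/// which kind of failure occurred; matching on the variant preserves it.
pub fn classify<T>(error: &PgJobError<T>) -> ErrorKind {
    match error {
        PgJobError::RetryInvalidError { .. } => ErrorKind::InvalidRetry,
        PgJobError::QueryError { .. } | PgJobError::TransactionError { .. } => {
            ErrorKind::Database
        }
    }
}
```

A test could then assert on `ErrorKind` (or on the variant directly) instead of comparing error strings.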

Contributor Author

@tomasfarias tomasfarias Jan 11, 2024

Yeah, I agree we should differentiate between RetryInvalidError and database errors. RetryInvalidError is the only special error as it returns the job itself to be failed (otherwise we would be consuming it), so I'd split it off this enum and rename the enum to DatabaseError or something similar.
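
One possible shape for that split, as a sketch only: the type names `DatabaseError`, `RetryInvalidError`, and `RetryError`, the in-memory `Job` struct, and its method bodies are all assumptions for illustration and are not taken from the merged code. `String` again stands in for `sqlx::Error`.

```rust
/// Hypothetical: failures that come from the database only.
#[derive(Debug, PartialEq)]
pub enum DatabaseError {
    QueryError { command: String, error: String },
    TransactionError { command: String, error: String },
}

/// Hypothetical: hands the job back so the caller can still fail it.
#[derive(Debug, PartialEq)]
pub struct RetryInvalidError<T> {
    pub job: T,
    pub error: String,
}

/// Hypothetical: only `retry` needs the wider error type.
#[derive(Debug, PartialEq)]
pub enum RetryError<T> {
    DatabaseError(DatabaseError),
    RetryInvalidError(RetryInvalidError<T>),
}

/// In-memory stand-in for a queue job; no database is involved here.
#[derive(Debug, PartialEq)]
pub struct Job {
    pub attempt: u32,
    pub max_attempts: u32,
}

impl Job {
    /// Consumes the job. If no attempts are left, the job is returned inside
    /// the error instead of being dropped.
    pub fn retry(self) -> Result<Job, RetryError<Job>> {
        if self.attempt >= self.max_attempts {
            return Err(RetryError::RetryInvalidError(RetryInvalidError {
                job: self,
                error: "maximum attempts reached".to_string(),
            }));
        }
        Ok(Job { attempt: self.attempt + 1, ..self })
    }

    /// Methods other than `retry` can only fail with a database error, so
    /// their signatures do not need the job type parameter.
    pub fn fail(self) -> Result<(), DatabaseError> {
        // A real implementation would run an UPDATE here; this stub always succeeds.
        Ok(())
    }
}
```

With this layout, a caller that hits `RetryError::RetryInvalidError(invalid)` can call `invalid.job.fail()`, while any `DatabaseError` can be counted and reported as a database problem without inspecting strings.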

Contributor Author

Happy to update this PR, will need a bit of time as I have some support tickets to work through.

Contributor

> Happy to update this PR, will need a bit of time as I have some support tickets to work through.

No big deal, it was mostly just a thought. Feel free to merge this if you want too, either way.

@bretthoerner
Contributor

I totally forgot about this one, @tomasfarias did you want to merge this as-is?

@tomasfarias force-pushed the fix/bump-failure-counter-on-job-error branch 3 times, most recently from 3e61c91 to 620c81d (February 5, 2024 18:10)
@tomasfarias
Contributor Author

@bretthoerner

Not sure if tests will pass, but given our conversation I've refactored PgQueueError to separate RetryInvalidError into its own struct (at the same time making type signatures simpler for methods that aren't retry). Now it should be easier to differentiate between DatabaseErrors and RetryInvalidError.

@tomasfarias
Contributor Author

Looks like the test failures are (for now) due to me making new_from_pool infallible (as it could not fail). I can clean those up later...

@tomasfarias force-pushed the fix/bump-failure-counter-on-job-error branch from 620c81d to 2adfd31 (February 5, 2024 18:26)
@tomasfarias force-pushed the fix/bump-failure-counter-on-job-error branch from 2adfd31 to 46d3102 (February 6, 2024 00:05)
@tomasfarias
Contributor Author

Waiting for #60 to be merged, as there will be conflicts to fix and it makes more sense to tackle them here.

@tomasfarias force-pushed the fix/bump-failure-counter-on-job-error branch from 46d3102 to 0726e02 (February 6, 2024 17:04)
@tomasfarias
Contributor Author

Rebased on main after #60 was merged. This should be fine to merge too assuming tests are green!

@tomasfarias tomasfarias merged commit 5b3123e into main Feb 6, 2024
4 checks passed
@tomasfarias tomasfarias deleted the fix/bump-failure-counter-on-job-error branch February 6, 2024 17:13
2 participants