
Support multiple active GoodJob databases #725

Open
doxavore opened this issue Oct 18, 2022 · 3 comments

@doxavore

One of the great benefits that shakes out of GoodJob is that enqueuing a job can be part of a database transaction that also persists the records needed when the job runs.
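
A minimal sketch of that benefit, assuming placeholder MyRecord and MyJob classes (neither is part of GoodJob): because the job row lives in the same Postgres database as the application record, both commit or roll back together.

ActiveRecord::Base.transaction do
  record = MyRecord.create!
  MyJob.perform_later(record) # inserts the good_jobs row inside the same transaction
end
# If the transaction rolls back, no job row is committed, so the job
# can never run against a record that was never persisted.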

Today, GoodJob supports multiple databases as long as GoodJob itself uses a single database (since active_record_parent_class was introduced in #238). It would be nice if we could extend the same guarantees we get with transactions and a single database to multiple databases. One could copy the GoodJob migrations into multiple migrations_paths, but we only have access to a single set of underlying GoodJob::BaseRecord-inheriting ActiveRecord models.
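
For reference, a sketch of roughly how the single-database setup from #238 can be wired up; the QueueRecord class and the :queue database name are illustrative, not GoodJob conventions:

# app/models/queue_record.rb
class QueueRecord < ApplicationRecord
  self.abstract_class = true
  connects_to database: { writing: :queue, reading: :queue }
end

# config/initializers/good_job.rb
GoodJob.active_record_parent_class = "QueueRecord"

Every GoodJob model then reads and writes through that one connection, which is exactly the single-database limitation this issue is about.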

@bensheldon Do you have any thoughts on whether this is something we might support in GoodJob? I'd like to contribute in this area, but would want to make sure I'm doing so in a way that's consistent with your vision (perhaps for v3?).

@bensheldon
Owner

@doxavore that's interesting! And I think I'm pretty reluctant to implement that.

Overall, I'm not sure the use-case justifies the level of effort and the ongoing complexity cost.

If I'm understanding, it would add a third database option:

  • (default) store GoodJob records in your app's primary/singular database
  • (optional) store GoodJob records in one of your app's partitioned databases
  • (proposed) store GoodJob records in each of your app's partitioned databases

I'm currently tepid even on #238. I think the use-case is around flexibility ("this is how our application is architected"), whereas 99% of the real-world interest/questions I experience are about performance. I want to be really transparent in my recommendation that if I was in a position to choose with an application that hit a true job performance bottleneck (talking tens of millions of jobs), I would choose Sidekiq Pro over partitioning GoodJob's database records. Real talk! 😄 I think at the scale where one is burning up their relational database to run jobs, they should reach for a better tool (Redis, Kafka, etc.).

And I realize that the benefit you want is transactional consistency, so you can lightly gloss over the previous paragraph, but I wanted to share that regardless 😊

I guess I consider transactional consistency a nice-to-have side effect rather than a core feature of GoodJob. There are maybe some interesting workarounds possible in #712. Like:

GoodJob::Bulk.defer do
  MyJob.perform_later(my_record)
  my_record.update
end # <= job doesn't actually get enqueued until block exits here

Sorry to maybe overshare. I am curious though about your use-case. Are you developing an application right now where this feature would be useful to you?

@doxavore
Author

Thanks for the context! Before opening this issue I'd poked around previous issues and saw you had some misgivings about the current functionality, so I was worried this might be a long shot.

Are you developing an application right now where this feature would be useful to you?

Yes! This is a known gotcha in a codebase today. We use multiple databases to support the different usage patterns and levels of reliability required in a single Rails app.

I think deferring may help if we wanted to continue using a single database for GoodJob. Since we want consistency within each database, though, I'm not sure it'll help our use case.

@bensheldon
Owner

I have this idea churning away in the back of my mind. Still not planned, but thinking about it.

The way I'm thinking it could work would be for your application to create subclasses of GoodJob::Execution and extend them for each of your database connections. I have been strongly thinking about moving to an extend strategy rather than a base-class strategy for models (mentioned in #687 (comment))
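
A rough sketch of that subclass idea, assuming an :analytics database defined in database.yml (none of this is supported API today, just the shape of it):

class AnalyticsExecution < GoodJob::Execution
  # Give this subclass its own connection pool pointed at the second database
  establish_connection :analytics
end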

Then, it would be necessary for you to inject those classes into the Active Job adapter (e.g. instances of GoodJob::Adapter), and also to inject them into a Manager, one for each subclass (described in #705).

I'm not quite sure how simple/clean it is to tell ActiveJob to enqueue a job to a specific Adapter instance (I don't think many people run multiple adapters). e.g. using ActiveJob's API would end up looking something like:

job = MyJob.new(args)
job.scheduled_at = 10.minutes.from_now # enqueuing options can only be set on the job instance
adapter = GoodJob::Adapter.new(job_record: CustomExecution)
adapter.enqueue_at(job, job.scheduled_at)

I dunno, seems complicated :-)
