Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid querying through serialized_params #876

Open
ollym opened this issue Mar 3, 2023 · 4 comments
Open

Avoid querying through serialized_params #876

ollym opened this issue Mar 3, 2023 · 4 comments

Comments

@ollym
Copy link

ollym commented Mar 3, 2023

As we've scaled up with good_jobs we've started to feel significant pain particularly when the arguments of a job are large, the performance of queries that query through serialized_params can slow down dramatically.

I think you should add job_class and executions as columns and refactor any code that queries their values to not use serialized_params.

Is there any reason why you don't?

@ollym
Copy link
Author

ollym commented Mar 3, 2023

A quick monkey patch for anyone interested and having the same issues:

# db/migrate/....rb
add_column :good_jobs, :job_class, :string
add_column :good_jobs, :executions, :integer, default: 0
# config/initializers/good_job.rb
Rails.application.reloader.to_prepare do
  Rails.application.configure do
    # ... good job configuration here
  end

  module GoodJob::JobExtensions
    extend ActiveSupport::Concern

    prepended do
      scope :job_class, ->(job_class) { where(job_class: job_class) }

      # First execution will run in the future
      scope :scheduled, -> { where(finished_at: nil).where('COALESCE(scheduled_at, created_at) > ?', DateTime.current).where(arel_table[:executions].lt(2)) }

      # Execution errored, will run in the future
      scope :retried, -> { where(finished_at: nil).where('COALESCE(scheduled_at, created_at) > ?', DateTime.current).where(arel_table[:executions].gt(1)) }
    end

    def job_class
      read_attribute(:job_class)
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end
  GoodJob::Job.prepend(GoodJob::JobExtensions)


  module GoodJob::ExecutionExtensions
    extend ActiveSupport::Concern

    prepended do
      scope :job_class, ->(job_class) { where(job_class: job_class) }
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end
  GoodJob::Execution.prepend(GoodJob::ExecutionExtensions)


  module GoodJob::BaseFilterExtensions
    def job_classes
       filtered_query(params.slice(:queue_name)).unscope(:select)
                                                .group(:job_class).count
                                                .sort_by { |name, _count| name.to_s }
                                                .to_h
     end
  end
  GoodJob::BaseFilter.prepend(GoodJob::BaseFilterExtensions)
end

@bensheldon
Copy link
Owner

Ooh, that's interesting!

Is there any reason why you don't?

The reason upfront: the Dashboard is a work-in-progress and very YAGNI. So I appreciate you sharing: yes, you need it.

That's interesting that grabbing the values out of the JSONB column is that much slower than individual columns. I wouldn't have expected that. My intuition has been only to pull values out of serialized_params when they need to be indexed (although I could index them within jsonb, but that seems too extra), so that's good to know that I may need to adjust my intuitions.

This work is definitely doable and should be done. It'll be a little tricky to make sure that it can be done without requiring the database migration be run until the next major release, but that's a common problem I've been handling when these things come up. Here's an example:

scope :unfinished, (lambda do
if column_names.include?('finished_at')
where(finished_at: nil)
else
ActiveSupport::Deprecation.warn('GoodJob expects a good_jobs.finished_at column to exist. Please see the GoodJob README.md for migration instructions.')
nil
end
end)

It might take me a minute to get to it. Is this something you'd be able to submit a PR for?

@ollym
Copy link
Author

ollym commented Mar 3, 2023

Yea we migrated our webhook backend to good_job queue and so the arguments can be large (full request body) and that combined with 50k scheduled jobs meant things were grinding to a halt. We tried adding a btree index on serialised_params->>"job_class" but weirdly it has no effect.

It makes sense because I assume even to get a single value out of the json object it still has to parse the full body. We must have been hitting a memory limit or something.

I'll see what we can do about a PR

@ollym
Copy link
Author

ollym commented Apr 2, 2023

For others with the same issue, thanks to #894 (since good_job v3.15) the patch is significantly simpler, the following would work now:

# db/migrate/....rb
add_column :good_jobs, :job_class, :string
add_column :good_jobs, :executions, :integer, default: 0
# config/initializers/good_job.rb
Rails.application.reloader.to_prepare do
  module GoodJob::BaseExecutionExtensions
    extend ActiveSupport::Concern

    class_methods do
      def params_job_class
        arel_table[:job_class]
      end

      def params_execution_count
        arel_table[:executions]
      end
    end

    def job_class
      read_attribute(:job_class)
    end

    def serialized_params=(serialized_params)
      self.job_class = serialized_params['job_class']
      self.executions = serialized_params.fetch('executions', 0)
      super
    end
  end

  GoodJob::BaseExecution.prepend(GoodJob::BaseExecutionExtensions)
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Inbox
Development

No branches or pull requests

2 participants