-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature review for various features. #290
Feature review for various features. #290
Conversation
This is amazing work @mfendeksilverstripe, thanks to you and the whole team! These features look fantastic and I'd love to see every last one of them integrated. |
Wow, these are some fantastic suggested contributions back to the module - big shout out to you and the team! I'm keen to see thoughts from others on the specific features, but they all look like great additions to me. |
190f4c6
to
e0735d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work Mojmir! Left some minor review comments inline. Got some security concerns with "Feature 6 - Job management" (low impact). Otherwise I'd love to see all eleven features in the module.
I'm assuming you'd convert anything that's currently an Extension
into modifications on the class it was extending? Extensions slow everything down, so we should only use them when necessary.
Feature 11 - Recommended queue settings for medium stack size
Hmm my first instinct was to recommend docs on https://docs.silverstripe.cloud. But I think it's more of a general "fine tuning" section in the module readme. Maybe this could be accompanied with some rules of thumb? Some of the details might already be explained, e.g. in the doorman module. But queue fine tuning is so time intensive that it's worth a bit of investment into describing it better IMHO
* @property int $randomID | ||
* @package App\Queue\Dev | ||
*/ | ||
class Job extends Queue\Job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be a bit more explicit about this and call it TestJob
(rather than relying on the task being in the Dev
namespace). Over time, there might be other "dev jobs" with different purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code follows the feature based namespace approach which made the code nice and portable within the bespoke code. Of course, I'm happy to change the namespace / class naming convention to whatever makes more sense within the module. This change would be done as a part of standard pull request flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you had a second "job" that would rely on the "queue" (which all jobs do), then what would the FQCN be?
return; | ||
} | ||
|
||
$job = Injector::inst()->create($this->jobClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably check that $jobClass
actually contains a valid job class definition (for both dev experience and security reasons)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
echo '<br />'; | ||
|
||
echo 'Submitting will apply change immediately:<br><br>'; | ||
echo '<button type="submit">Update status</button>'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the TaskRunner
controller enforces authorisation on this task (effectively CLI or ADMIN), it's not automatically guarding against CSRF. While CSRF via POST is a bit harder to achieve, this would still make logged-in admins vulnerable. Check out URLSpecialsMiddleware
in core in how we've handled that for dev/build
, you might be able to hook into the same system? Alternatively, use the Silverstripe Form
APIs which should enable CSRF by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I was hoping to use forms and templates to improve this dev task :)
* | ||
* @package App\Queue | ||
*/ | ||
class Logger implements LoggerInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would benefit from a config example. How are you injecting the logger only during queued job executions? Are you ensuring that a plaintext formatter is attached to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see this example. The logger is attached to a helper which is executed within the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've argued in #309 that I think this isn't the right approach :)
*/ | ||
protected function getCommand($binary, $worker, $stdout, $stderr) // phpcs:ignore SlevomatCodingStandard.TypeHints | ||
{ | ||
return sprintf('nohup %s %s %%s %s %s & echo $!', $binary, $worker, $stdout, $stderr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this shouldn't be the default behaviour? Assuming nohup
is a pretty standard unix command, it shouldn't increase the requirements for the module.
How does this deal with runaway child processes that don't have an explicit max execution time set on them (via your ExecutionTime
trait)? In most cases. PHP CLI executions won't be time limited. Should you only use this manager if you've done your homework in every job type that's run in your project? Maybe we need to define a relatively generous max exec time by default. It seems better than a) terminating child processes while still running (status quo), or b) risking runaway processes (your change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comparison:
Current command:
- child processes terminate when parent process terminates
- lower chance of having zombie processes
- potentially less efficient job execution as child processes may terminate prematurely
New command
- child processes terminate independently from parent process
- higher chance of having zombie processes if not handled properly (max execution time needs to be set for jobs)
- potentially more efficient job execution as child processes may continue execution after parent process terminates
We tested the new approach extensively on a large project with high queue usage. Every non-trivial job was assigned a reasonable max execution time and it works really well.
Overall, I agree that setting a global default for time execution and using this new command by default looks like a great option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK can you a max execution default part of the same feature PR please? And check if there's enough docs around considerations for execution time when implementing jobs.
* @property array $remaining | ||
* @package App\Queue | ||
*/ | ||
abstract class Job extends AbstractQueuedJob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a better name :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to rename this during standard pull request flow.
public function setup(): void | ||
{ | ||
$this->remaining = $this->items; | ||
$this->totalSteps = count($this->items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$items
is pretty opaque here, maybe add a getItems()
to the interface and describe what it should look like? I guess that's implementation specific, could be a class name, an array, etc.
Do you have an example to illustrate what you could do with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, some generic examples:
- job which generates image variants (one item represents one image variant - array)
- job which migrates a list of objects (one item represents object ID - integer)
- job which updates static cache (one item represents one URL - string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm wondering how much value there is in defining an abstract class for this, with so much implicit variations. Maybe this could just be a "howto: implement multi step jobs" in the docs? Something that simply points out that all jobs have $remaining
and $totalSteps
.
Here's my take on what I'd consider the most valueable features for the module, in case we need to sequence the work (in descending order): Feature 8 - Job error logging |
Thanks for feedback @chillu I'll start breaking out the features into separate pull requests. |
Feature 9 - Job manager process covered in #305 |
e0735d7
to
1c53d19
Compare
Added Feature 12 - Automatic Jobs retries |
@mfendeksilverstripe yes agreed that the features you've listed here looks great. I think the priority order that Ingo listed above looks like a pretty sensible way to order things Feel free to create PR's that go straight into the queuedjob modules branched off and into the Cheers! |
Thank you @emteknetnz , will start splitting the features into separate PRs 👍 |
This has been sitting here with no action for a really long time - I'm going to close it on the assumption that it has served its purpose. |
Feature review for multiple features
Disclaimer
This pull request serves as a preview for features which can be imported into this module. Feature code is included as is from the bespoke project (it's serving as reference only). It's already well organised for module import as it contains no bespoke code dependencies unless stated in the documentation.
The goal here is to review each feature and determine if effort is worth putting into moving this feature to the module. Features identified as "worth the effort" will be separated into their own pull request which follows standard review flow.
Please
DO NOT
code review this PR. The code is not meant to be merged, only reviewed on do we want this feature in the module or not basis.Test setup
All features below are used on at least three live projects of various sizes (small, medium, large).
Dependencies
List of features / fixes included in this bundle
This is the full list of features / fixes with details explaining why this feature is needed or what problem is it trying to solve. Features will be numbered for quick reference but the list is in arbitrary order.
Feature 1 - Improved Jobs admin UI
Problem
Finding specific jobs in the Jobs admin is a chore, developers often find themselves scrolling through multiple pages. Finding the right jobs is sometimes impossible as the filter criteria input is quite limited. Using Text fields for input is often tedious.
Solution
Improved UI. More filters, better input fields, more useful information in the columns.
Code reference
App\Queue\Admin\Extension
Resolution
Feature 2 - Improved Job edit form
Problem
Job data is inaccessible in the jobs admin which makes debugging jobs quite difficult. It often requires the developer to download DB snapshot in order to do anything. In case job messages get corrupted (for example in case of getting truncated), they become inaccessible forcing the developer to rely on DB dump instead.
Solution
Expose job data and raw messages as read only fields in the job edit form.
Code reference
App\Queue\Admin\DataExtension
Resolution
Feature 3 - Jobs cleanup task
Problem
Default jobs cleanup task is unusable for large number of jobs as it deletes the jobs as models. Jobs cleanup is an ongoing task as completed jobs pile up and deleting becomes a necessity (some operations become slow with large jobs table).
Solution
Implement a dev task to cleanup jobs which is really fast and runs on schedule (every hour).
Code reference
App\Queue\Cleanup\Task
Feature 4 - Queue test task / job
Problem
When setting up a queue it's tricky to test if the setup is working as expected. Queue can be tested with any job but this may require extra effort and may not be reliable:
Solution
Implement a simple test job and task to queue the test jobs, which are reliably not doing anything else apart from testing the queue.
Code reference
App\Queue\Dev
Feature 5 - Factory for jobs
Problem
Some operations (like migrations) require creation of thousands of jobs. This can actually be a challenge mostly because of the following issues:
Solution
Instead of creating thousands of jobs via a dev task, we will use a factory job as an intermediary to ensure we can avoid redundant jobs creation. The jobs creation flow will be as follows:
Code reference
App\Queue\Factory
Feature 6 - Job management
Problem
Getting a good overview on how is queue processing jobs is quite difficult from the jobs admin. Also, batches of jobs may need to be manually restarted which is a huge chore if jobs admin UI is used as it has to be done one job at a time. Manually restarting hundreds of jobs is likely too tedious.
Solution
Queue report / Current state
Queue report / UI draft
Job status update / Current state
Job status update / UI draft
Code reference
App\Queue\Management
Feature 7 - Execution time
Problem
We may want to set limit to execution time on jobs or dev tasks. This should be done in a safe way via callback as the limit should only apply to the job / task execution.
Solution
Execution time trait which provides a small helper method to later execution time.
Code reference
App\Queue\ExecutionTime
Feature 8 - Job error logging
Problem
Default error logging for jobs is not great. Debug information is often missing important details or the information is not available at all. In some cases the information is logged but matching the Raygun error message with a specific job may be quite difficult.
Solution
Use a job error logger which logs all of the issues inside job messages. If job execution fails and triggers errors, all relevant information will be present in the job messages which makes the debugging process for developer that much easier.
Code reference
App\Queue\Extension
App\Queue\Logger
Feature 9 - Job manager process
Problem
Using asynchronous queue runner has a flaw - when the manager process terminates, it also terminates all of its child processes. This is a mistake because there is a valid use case where manager process doesn't have any more work to do (no more jobs to start) but the child processes are still executing jobs.
Solution
Fix the shell command and prevent the manager process from killing of the child processes.
Code reference
App\Queue\Manager
Resolution
Feature 10 - generic process items job
Problem
Majority of non-trivial jobs can be written as a set of sequence of items that need to be processed. This means that there is a lot of code repetition in the job implementation for these kind of jobs.
Solution
Create an abstract class to cover a job which processes sequence of items. The job will act as a parent for any job implementations that are of this kind.
Code reference
App\Queue\Job
Feature 11 - Recommended queue settings for medium stack size
Problem
There are no recommended queue settings based on stack size. There are quite a few constants to get right.
Solution
Start a documentation on recommended queue settings based on stack size.
Code reference
Feature 12 - Automatic Jobs retries
Problem
Some jobs fail and we can't change that. For example jobs that contain third party API requests or automated publishing of multiple localisations of the same page at the same time via Embargo / Expiry module. We have to accept that these jobs may fail but we should be able to minimise the failure occurrence.
Solution
Automated broken job retries. Number of retries are configurable and also the time delay between each retry. This is especially useful if some third party API has a small downtime or there is DB deadlock in place. Waiting with the job retry for a few minutes may make the difference.
Code reference
App\Extensions\QueuedJobs QueuedJobServiceExtension