This is the first (potentially) non-breaking change medium version bump, but in reflecting on the scope of the changes being made, it feels appropriate to make the leap to version 1.7.0. Here's a breakdown of the changes:
Overview
RollupFullBatchRecalculator
is no longer an actual batch class, per se, though it still does batch processing. Instead, it takes advantage of the Apex Cursor beta to iterate through children records when doing full recalculations where the number of records exceeds the value forRollupControl__mdt.MaxLookupRowsBeforeBatching__c
. Like its batch class predecessor, it will continue to default to calculating rollup values for 500 children at a time - still configurable usingRollupControl__mdt.BatchChunkSize__c
. UsingDatabase.Cursor
has several big advantages over using a batch class, and while the feature is still in beta, I expect that it will GA soon- Previously, Apex Rollup took advantage of
Database.Stateful
, which is a marker interface that's part of theDatabase.Batchable
framework to calculate what a parent record's prior value was when that parent record had children split between batches. This happens more often than you'd expect, even with very fine-tuned ordering of the children records. With large data volume Salesforce orgs, keeping track of which parents have been updated while iterating through all of the children records represents a serious problem, and at a certain point when relying onDatabase.Stateful
to calculate millions of records and dozens of rollups for a single child type, the rollup calculation will inevitably fail as memory runs out. - While managing everything in-memory is extremely convenient as the package author, I'd basically arrived at the end of the road when it came to optimizing the batch full recalc process to try to avoid running out of memory. There were no tricks left - just the cold, hard fact that 12 MB of space will eventually run out with a one-to-many relationship between each parent record and the rollups being applied to that record when there are millions of parent records involved. This was the problem reported in #615
⚠️ Salesforce orgs have a MAXIMUM number of active DataWeave scripts that can be used.⚠️
If you saw the phrase "(potentially) non-breaking change" above and went 🤔,v1.7.0
adds a DataWeave script to the package. While I view it as unlikely that there are orgs out there running up against that limit, it's worth calling out here. Contact your AE if you truly cannot accommodate additional active DataWeave scripts and want to update. See the section titled "Easily Parsing Data" below for more info on why this change is being made- Because records are now created during "batch" full recalculations, if you abort a full recalculation using the Apex Jobs page in setup, or via some other means, there is the potential that
RollupState__c
records will be orphaned and won't be cleaned up. You will have to delete those records manually. More info is available in the "Easily Deleting Data" section.
Challenges
There's one natural conclusion when an in-memory data management strategy is no longer sufficient, and that's a record-based memory management solution. As a package author, however, it would be irresponsible (not to mention untidy) to implement a feature that contributes significantly to the data storage on a subscriber org. This means that the "easy" approach - a data record (henceforth referred to as "state", or "Rollup State", or the API name for the object) for each parent - was immediately disqualified. The solution would have to compress data such that the previously calculated values for a parent record could be:
- easily retrieved,
- easily parsed
- easily deleted
Easily Retrieving Data
In order to facilitate the easy retrieval of data, v1.7.0
uses a new custom object, RollupState__c
, which has on it 11 different custom text fields ranging from RollupState__c.RelatedRecordKeys0__c
all the way through RollupState__c.RelatedRecordKeys10__c
. Because you can't filter on long text area fields within SOQL, and because the maximum length of a standard text field is 255 characters, creating many fields like this is a crucial piece of the data compression strategy that makes it possible to store state for many parents within a single record. Assuming that most rollups will be keyed to standard Salesforce ids, it follows then that each related record key field can hold (255/(18 + 1) = 13
parent ids per field (the + 1
accounts for the comma separating each key).
Despite using 11 custom fields for this (which might sound like a lot), that ends up allowing for a maximum of 13 x 11 = 143
parent records' data being compressed into a single RollupState__c
. Now that the kinks have been ironed out with the logic that keeps track of which related record key field is being used at any given time, it will be possible to further expand the number of custom fields being used if there's any concern with the data storage use of Rollup State records for these calculations (which would allow for the data to be even further compressed). That being said, since relevant records are deleted within each cursor "chunk" that's processed, I am hopeful that the data storage concerns being discussed here are purely theoretical.
Easily Parsing Data
This proved to be an unexpected hurdle - initial benchmarking on deserializing the long text area on each Rollup State record was poor. As in, for some chunks that had only a dozen state records, the deserialization of those records back into Apex-defined types was causing the 60 second Apex CPU time limit to be exceeded 🤯.
This necessitated the next big change to Apex Rollup - the introduction of a DataWeave script to perform the deserialization. Using DataWeave, the deserialization time was cut from a process that at times was taking 30+ seconds to an operation that has always finished within a single second. That's enough of a justification for me to proceed with this piece of tech, though I do still have plans to circle back on the poor performance of the native deserializer.
Easily Deleting Data
At the end of each recalculation job, any remaining RollupState__c
records are deleted. This is something to be mindful of. If you were to abort a currently running recalculation job, this would effectively "orphan" those records, and they'll need to be deleted separately (by you, presumably). The framework will not (at this time) do any periodic maintenance on removing older RollupState__c
records that could be orphaned in this manner, though I am open to the idea of giving users a way to conveniently do so via the Apex Rollup app should the need for that arise.
Special Considerations
One of the... downstream effects... of moving away from using Batch Apex was that the majority of the Apex Rollup framework now relies on Queueables to perform async processing. In order to accommodate this, v1.7.0
takes (greater) advantage of Queueable Finalizers to coordinate work between different threads, as finalizers are the only place where it's possible to fully chain different jobs together without exceeding limits.
As I had many failing tests when replacing the Batchable version of RollupFullBatchRecalculator
with a Queueable, and I've been performing integration testing to sanity check huge swathes of this work, I don't expect that this change should be visible to people in anything other than a possible increase in Debug Logs being generated (as Finalizers create their own, separate, log file). I've mentioned it here, though, on the off-chance that this change does somehow lead to deleterious effects.
A huge thanks to @jongpie for reviewing #636 for this, and for being an invaluable colleague and friend