Resolve map task issues for batchable list #1772
Draft
TL;DR
Before this PR, a batchable list (one handled by FlytePickleTransformer) is batched into a single item by default and stored in one pickle file, so the resulting literal list contains only one literal. However, the current map task implementation only succeeds when the literal list has the same length as the original list.
Because this part of the map task logic lives in flyteplugin, and users may not upgrade propeller often, it is more practical to implement the change in flytekit.
The proposed solution appends placeholder literals of the none type to the literal list until its length matches that of the original list.
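A minimal sketch of the padding step, assuming a simplified model of Flyte literals: `PickleLiteral`, `NoneLiteral`, and `to_literal_list` are hypothetical stand-ins, not flytekit's actual classes or API, and the payload is kept in memory instead of being uploaded to blob storage. The optional `batch_size` parameter is also illustrative; leaving it unset reproduces the default described above, where the whole list becomes one batch.

```python
import pickle
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class PickleLiteral:
    """Hypothetical stand-in for a literal that references one pickled batch."""
    payload: bytes


@dataclass
class NoneLiteral:
    """Hypothetical stand-in for a none-type literal used only as padding."""


Literal = Union[PickleLiteral, NoneLiteral]


def to_literal_list(values: List[Any], batch_size: Optional[int] = None) -> List[Literal]:
    """Pickle `values` in batches, then pad so the result has len(values) items."""
    if not values:
        return []
    # Hypothetical parameter: when unset, the whole list is pickled as one batch.
    size = batch_size or len(values)
    literals: List[Literal] = [
        PickleLiteral(pickle.dumps(values[i : i + size]))
        for i in range(0, len(values), size)
    ]
    # Append none placeholders until the literal list matches the original
    # length, which is what the map task implementation expects.
    literals.extend(NoneLiteral() for _ in range(len(values) - len(literals)))
    return literals
```

For example, `to_literal_list([1, 2, 3, 4, 5])` yields one `PickleLiteral` followed by four `NoneLiteral` placeholders, and `to_literal_list([1, 2, 3, 4, 5], batch_size=2)` yields three pickled batches followed by two placeholders; in both cases the length is 5.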
When converting back to a Python value, conversion can stop at the first literal of the none type. Because the list is batchable, every genuine literal is a pickle literal, so a none literal can only be a placeholder.
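The decoding side can be sketched the same way, again assuming hypothetical stand-ins (`PickleLiteral`, `NoneLiteral`, `to_python_list`) rather than flytekit's real types: batches are unpickled in order, and the loop stops at the first none literal because it can only be padding.

```python
import pickle
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class PickleLiteral:
    """Hypothetical stand-in for a literal that references one pickled batch."""
    payload: bytes


@dataclass
class NoneLiteral:
    """Hypothetical stand-in for a none-type placeholder literal."""


def to_python_list(literals: List[Union[PickleLiteral, NoneLiteral]]) -> List[Any]:
    """Unpickle batches in order and stop at the first placeholder."""
    values: List[Any] = []
    for lit in literals:
        if isinstance(lit, NoneLiteral):
            # In a batchable list every real literal is a pickle literal,
            # so the first none literal marks the start of the padding.
            break
        values.extend(pickle.loads(lit.payload))
    return values
```

Stopping at the first placeholder, rather than filtering every element, relies on the padding always being appended after the real batches.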
Keeping the batch mechanism is worthwhile because it makes transforming large, batchable lists very fast: instead of uploading many small pickle files to S3 (one per element), it uploads a single large file. The asynchronous approach also saves time, but it is considerably slower than batching.
Checks
The map task ran successfully.
Large, batchable lists transform extremely quickly.