
v3.3 Multipart Upload

Andrey Kurilov edited this page Mar 26, 2017 · 1 revision

Overview

Some cloud storage services can assemble a data object from parts that were uploaded independently. This feature is known as multipart upload (MPU).

Limitations

  1. A storage API that supports MPU must be used. Currently these are Amazon S3 and OpenStack Swift.
  2. In distributed mode, all parts of an object are processed by a single storage driver.
  3. The "Create" load type is used to split large objects into parts that are uploaded separately.
  4. There is no guarantee about when each upload completion request is issued or when it finishes. Therefore, the user should set a count limit for a multipart upload load job, so that the job is not stopped while some uploads are still being completed.

Approach

Mongoose uses an abstraction called the I/O task. I/O tasks are executed by storage-specific drivers. A storage driver may detect a "multipart" I/O task and execute the corresponding sequence of "sub-tasks":

  1. Initiate the object MPU.
  2. Upload the object parts.
  3. Commit the object MPU.
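The three sub-tasks above can be sketched as follows. This is a minimal illustration, not Mongoose's driver code: the `InMemoryMpuStorage` class, the `execute_multipart_task` function, and the `part_size` parameter are all hypothetical. A real driver would call the Amazon S3 or OpenStack Swift APIs, track part numbers explicitly, and may upload parts concurrently rather than one after another.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryMpuStorage:
    """Hypothetical storage stub standing in for a real MPU-capable API."""
    uploads: Dict[str, List[bytes]] = field(default_factory=dict)
    objects: Dict[str, bytes] = field(default_factory=dict)

    def initiate(self, name: str) -> str:
        # Start a new upload session and return its upload ID.
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = []
        return upload_id

    def upload_part(self, upload_id: str, part: bytes) -> None:
        # Parts are kept in arrival order; real APIs use explicit part numbers.
        self.uploads[upload_id].append(part)

    def commit(self, name: str, upload_id: str) -> None:
        # Concatenate the uploaded parts into the final object and close the session.
        self.objects[name] = b"".join(self.uploads.pop(upload_id))


def execute_multipart_task(storage: InMemoryMpuStorage, name: str,
                           data: bytes, part_size: int) -> str:
    """Run the initiate / upload parts / commit sequence for one object."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    upload_id = storage.initiate(name)                    # 1. initiate the MPU
    for offset in range(0, len(data), part_size):         # 2. upload the parts
        storage.upload_part(upload_id, data[offset:offset + part_size])
    storage.commit(name, upload_id)                       # 3. commit the MPU
    return upload_id
```

For example, `execute_multipart_task(InMemoryMpuStorage(), "obj1", b"abcdefghij", 4)` uploads three parts (`b"abcd"`, `b"efgh"`, `b"ij"`) and commits them as a single 10-byte object.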

Configuration

The "item-data-ranges-threshold" configuration parameter controls MPU behavior. Its value is a size in bytes. Any newly generated object whose size is greater than or equal to the configured threshold is treated as "large" and is uploaded in parts.
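The "large object" rule above reduces to a single comparison. The sketch below is illustrative only; the function name `is_large` is hypothetical and does not come from Mongoose.

```python
def is_large(object_size: int, ranges_threshold: int) -> bool:
    """Return True if a newly generated object should be uploaded via MPU.

    Both arguments are sizes in bytes; ranges_threshold plays the role of
    the item-data-ranges-threshold configuration value.
    """
    # The documented rule: size greater than or equal to the threshold.
    return object_size >= ranges_threshold


MIB = 1024 * 1024
print(is_large(64 * MIB, 64 * MIB))      # an object exactly at the threshold is "large"
print(is_large(64 * MIB - 1, 64 * MIB))  # one byte below the threshold is not
```

Because the comparison is inclusive, an object whose size equals the threshold exactly is already uploaded in parts.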

Reporting

Parts List Output

When a multipart upload task finishes, a record containing the object name and the upload ID is written to the parts.upload.csv file.
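A post-run check might read these records back. The sketch below assumes each row holds exactly two columns, object name first and upload ID second, with no header row. That layout is an assumption, not confirmed by this page, so verify it against an actual parts.upload.csv before relying on it. The helper names `write_parts_records` and `read_parts_records` are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def write_parts_records(out, records: Iterable[Tuple[str, str]]) -> None:
    """Write one (object name, upload ID) row per finished multipart upload."""
    writer = csv.writer(out)
    for name, upload_id in records:
        writer.writerow([name, upload_id])


def read_parts_records(src) -> List[Tuple[str, str]]:
    """Read (object name, upload ID) pairs back, skipping blank lines."""
    return [(row[0], row[1]) for row in csv.reader(src) if row]


if __name__ == "__main__":
    buf = io.StringIO()
    write_parts_records(buf, [("obj-0001", "upload-a"), ("obj-0002", "upload-b")])
    buf.seek(0)
    for name, upload_id in read_parts_records(buf):
        print(name, upload_id)
```

When reading a real file instead of an in-memory buffer, open it with `newline=""`, as the Python `csv` module documentation recommends.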

Future Enhancements

  • Support Read for segmented objects
  • Support Update for segmented objects
  • Support Copy for segmented objects