
v3.1 Chain Operations

Andrey Kurilov edited this page Feb 22, 2017 · 1 revision

Introduction

Sometimes it's necessary to perform several I/O operations on each item (file, object, etc.) asynchronously. A basic load job performs a single I/O operation on each item, and another basic load job may then perform another single I/O operation on the items processed by the previous one. With a so-called chain load job, each item is instead processed independently by a sequence of I/O operations. For example: create the items and, as soon as each item is created, read it, update it, read it again, then delete it.
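The difference is easiest to see in a toy model. The following Python sketch is not Mongoose code: the operation sequence comes from the example above, while `process_item`, `run_chain` and the "turns" latency model are hypothetical. Each item runs in its own coroutine, so a fast item can finish its whole sequence while a slow one is still waiting on its first request, yet every item still goes through the operations in order.

```python
import asyncio
from typing import Dict, List, Tuple

# The per-item operation sequence from the example above.
OPS = ["create", "read", "update", "read", "delete"]


async def process_item(item: str, turns: int, log: List[Tuple[str, str]]) -> None:
    """Run the whole operation sequence for a single item.

    `turns` is a toy stand-in for request latency: the coroutine yields to
    the event loop that many times before each operation completes.
    """
    for op in OPS:
        for _ in range(turns):
            await asyncio.sleep(0)  # let the other items make progress
        log.append((item, op))


async def run_chain(latency_turns: Dict[str, int]) -> List[Tuple[str, str]]:
    """Process all items concurrently; each one moves through OPS on its own."""
    log: List[Tuple[str, str]] = []
    await asyncio.gather(
        *(process_item(item, turns, log) for item, turns in latency_turns.items())
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_chain({"item-fast": 1, "item-slow": 10})):
        print(entry)
```

In the printed log, item-fast reaches "delete" before item-slow has even been created, something a series of basic load jobs (all creates first, then all reads, and so on) would never do.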

Requirements

  1. Support a configurable delay between the I/O operations (disabled by default).
  2. Be able to perform different I/O operations on different sets of storage nodes. For example, create the items on one set of storage nodes, then read each created item from another set of storage nodes.

Limitations

The following parameters are shared by all sub-jobs and may not be set for an individual sub-job:

  • item-output-file
  • load-job-name
  • load-limit-rate
  • load-limit-time
  • storage-driver-remote

The values of these parameters are taken from the first element of the "config" list (if any).
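The rule above can be sketched as follows. This is a hypothetical helper, not part of Mongoose, and it assumes the dashed parameter names map onto nested JSON keys the same way as the other parameters on this page (e.g. "load-limit-rate" as load.limit.rate). This page doesn't say whether a shared parameter set in a later element is ignored or rejected, so the sketch only reports such occurrences.

```python
from typing import Any, Dict, List, Tuple

# Shared parameters listed above. Assumption: "a-b-c" maps to cfg["a"]["b"]["c"].
SHARED_PARAMS = [
    "item-output-file",
    "load-job-name",
    "load-limit-rate",
    "load-limit-time",
    "storage-driver-remote",
]

_MISSING = object()  # sentinel: parameter not present in a sub-job config


def get_path(cfg: Dict[str, Any], dashed: str) -> Any:
    """Look up a dashed parameter name in a nested sub-job config dict."""
    node: Any = cfg
    for key in dashed.split("-"):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def resolve_shared(
    sub_configs: List[Dict[str, Any]],
) -> Tuple[Dict[str, Any], List[Tuple[int, str]]]:
    """Return (shared values taken from the first element,
    (index, name) pairs for shared parameters set in later elements)."""
    shared: Dict[str, Any] = {}
    if sub_configs:
        for name in SHARED_PARAMS:
            value = get_path(sub_configs[0], name)
            if value is not _MISSING:
                shared[name] = value
    overridden = [
        (index, name)
        for index, cfg in enumerate(sub_configs[1:], start=1)
        for name in SHARED_PARAMS
        if get_path(cfg, name) is not _MISSING
    ]
    return shared, overridden


config = [
    {"load": {"limit": {"rate": 100}}, "item": {"output": {"path": "/bucket1"}}},
    {"load": {"type": "read", "limit": {"rate": 5}}},
]
shared, overridden = resolve_shared(config)
```

Here `shared` holds only the rate of 100 from the first element, while the rate of 5 in the second element is reported in `overridden` rather than applied.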

Configuration

Basic

The chain load job is supported at the scenario engine level. To define the sub-jobs, add a "config" node containing the list of sub-job configurations:

"type" : "chain",
"config" : [
    {
        ...
    }, {
        ...
    }, {
        ...
    }
]

This allows each sub-job to be configured separately. For example, the first sub-job creates the items on one storage node and the second one tries to read them immediately from another storage node:

{
	"type" : "chain",
	"config" : [
		{
			"item" : {
				"output" : {
					"path" : "/bucket1"
				}
			},
			"storage" : {
				"node" : {
					"addrs" : "10.123.45.67"
				}
			}
		},
		{
			"load" : {
				"type" : "read"
			},
			"storage" : {
				"node" : {
					"addrs" : "10.123.45.68"
				}
			}
		}
	]
}
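Since the scenario is plain JSON, it can also be generated, e.g. when many sub-jobs or node addresses are involved. A minimal Python sketch that builds the scenario above; the helpers `chain_scenario` and `on_node` are hypothetical and not part of Mongoose.

```python
import json
from typing import Any, Dict


def chain_scenario(*sub_configs: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap sub-job configs into a chain load job node (hypothetical helper)."""
    if not sub_configs:
        raise ValueError("a chain needs at least one sub-job config")
    return {"type": "chain", "config": list(sub_configs)}


def on_node(addr: str, cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of cfg targeting the given storage node address.

    Note: this replaces any "storage" section already present in cfg.
    """
    return {**cfg, "storage": {"node": {"addrs": addr}}}


scenario = chain_scenario(
    # 1st sub-job: create the items in /bucket1 on the first node
    on_node("10.123.45.67", {"item": {"output": {"path": "/bucket1"}}}),
    # 2nd sub-job: read each item back from the second node
    on_node("10.123.45.68", {"load": {"type": "read"}}),
)

if __name__ == "__main__":
    print(json.dumps(scenario, indent="\t"))
```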

Delay

It's possible to configure a delay for which each item is suspended before being processed by the next sub-job in the chain. The following example writes the items, and each item is read back no earlier than 1 minute after it is written.

{
	"type" : "chain",
	"config" : [
		{
			"item" : {
				"output" : {
					"delay" : "1m",
					"path" : "/bucket1"
				}
			}
		},
		{
			"load" : {
				"type" : "read"
			}
		}
	]
}

Note

  • The minimum delay is 1 second; a value of 0 means no delay.
  • In distributed mode, the system clocks should be precisely synchronized.
  • To use the delay feature, the configuration parameter "load-metrics-trace-reqTimeStart" should be set to true.
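The notes above can be read as the following eligibility check. This is a sketch under assumptions, not the actual implementation: the unit suffixes other than "m", the helper names `parse_delay` and `ready_for_next`, and counting the delay from the start time of the item's previous request (our reading of why "load-metrics-trace-reqTimeStart" must be enabled) are all assumed.

```python
import re
import time
from typing import Optional

# Assumed unit suffixes; only "1m" appears on this page. A bare number = seconds.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delay(value: str) -> int:
    """Parse a delay such as "1m" into whole seconds (0 disables the delay).

    Only whole numbers are accepted, so any non-zero delay is at least
    the 1 second minimum.
    """
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported delay value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit or "s"]


def ready_for_next(prev_start: float, delay_s: int, now: Optional[float] = None) -> bool:
    """True when an item may be passed to the next sub-job in the chain.

    prev_start: start timestamp (seconds) of the item's previous request,
    which is why request start times have to be traced at all.
    """
    if delay_s == 0:
        return True  # delay disabled
    if now is None:
        now = time.time()  # local clock; must agree with the clock that set prev_start
    return now - prev_start >= delay_s
```

Because `ready_for_next` may compare a timestamp recorded on one host with the clock of another, any skew between the two clocks shifts the effective delay, which is why the clocks must be synchronized in distributed mode.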

Reporting

TODO
