
cbfs protocol

Plain HTTP

cbfs is a fairly straightforward read/write HTTP server. You PUT a document into a path and you can later GET it, or perhaps DELETE it when you're done.

Headers you specify in your PUT request will be returned on subsequent GET and HEAD requests.

A special X-CBFS-Hash header specifying the SHA1 of the content (or another hash, if so configured) is verified on upload, providing end-to-end integrity.
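
A minimal sketch of an upload with integrity checking, assuming SHA1 is in use and computed locally (the hash value shown is illustrative):

curl -XPUT -H 'X-CBFS-Hash: 91a0333bd9d92691f9a52ca206403a9f11fa9ce2' \
    --data-binary @somefile http://localhost:8484/somefile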

Also, conditional GET via Last-Modified or ETag (preferred) is supported, as well as range requests.

Each HEAD or GET request for an object will return an ETag with the blob identifier.

Many common features of HTTP work out of the box, such as:

  • Conditional GET
  • Conditional PUT
  • Range GETs
  • gzip transfer encoding
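
For example, a conditional GET using a previously returned ETag, and a range request (the header values shown are illustrative):

curl -H 'If-None-Match: "91a0333bd9d92691f9a52ca206403a9f11fa9ce2"' \
    http://localhost:8484/somefile

curl -H 'Range: bytes=0-1023' http://localhost:8484/somefile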

Special HTTP Options

GETting a Blob from an Origin Server

If your client has knowledge of the full cluster and you want to ensure that your request is not proxied, you can include the header X-CBFS-LocalOnly. Any node receiving this request that doesn't contain the blob locally will return an HTTP status 300 with a Location header suggesting one origin server and a body containing a list of all origin servers.
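
Such a request might look like this (the host and the header value are illustrative):

curl -H 'X-CBFS-LocalOnly: true' \
    http://192.168.1.107:8484/.cbfs/blob/[hash]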

Example (slightly mangled) response:

HTTP/1.1 300 Multiple Choices
Content-Type: application/json; charset=utf-8
Location: http://192.168.1.38:8484/.cbfs/blob/[hash]
X-Cbfs-Oldestrev: 0
X-Cbfs-Revno: 0
Content-Length: 241
Date: Thu, 11 Jul 2013 17:52:44 GMT

["http://192.168.1.38:8484/.cbfs/blob/[hash]",
 "http://192.168.1.135:8484/.cbfs/blob/[hash]",
 "http://192.168.1.107:8484/.cbfs/blob/[hash]"]

PUTting Unsafely

By default, any PUT to a cbfs cluster with more than one node will synchronously write to two nodes and verify the result on both before returning success. This can be inefficient, especially when storing large amounts of data.

To avoid this and only synchronously store to one node, you can send your PUT request with the X-CBFS-Unsafe header.
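
For example (the header value shown is illustrative):

curl -XPUT -H 'X-CBFS-Unsafe: true' \
    --data-binary @bigfile http://localhost:8484/bigfile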

Auto-Expiring Data

Any given PUT can include the X-CBFS-Expiration header which does memcached-style expiration. Specifically (from the memcached protocol documentation):

...the actual value sent may either be Unix time (number of seconds since January 1, 1970, as a 32-bit value), or a number of seconds starting from current time. In the latter case, this number of seconds may not exceed 60*60*24*30 (number of seconds in 30 days); if the number sent by a client is larger than that, the server will consider it to be real Unix time value rather than an offset from current time.
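
For example, to store a file that expires an hour (3600 seconds) from now:

curl -XPUT -H 'X-CBFS-Expiration: 3600' \
    --data-binary @tempfile http://localhost:8484/tempfile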

Special Paths/Operations

Any path that starts with /.cbfs/ is reserved.

Raw Blob Access

Raw blobs are all available under the /.cbfs/blob/ root.

List All Blobs on a Node

GET /.cbfs/blob/

Retrieve a Specific Blob

e.g. to retrieve blob 91a0333bd9d92691f9a52ca206403a9f11fa9ce2:

GET /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2

Retrieve Information About a Blob

If you want all the metadata associated with one or more blobs, you can POST to /.cbfs/blob/info/ with one or more blob parameters.
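
For example, sending a single blob form parameter (the hash is illustrative):

curl -XPOST -d 'blob=42da2aebe633eb8697b5abf43de57eea1c53d113' \
    http://localhost:8484/.cbfs/blob/info/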

The response is a JSON object containing, for each blob, the nodes known to contain that object. The timestamps should be ignored by most applications; they represent the most recent time each node verified the object.

{
    "42da2aebe633eb8697b5abf43de57eea1c53d113": {
        "nodes": {
            "node1": "2013-07-09T00:23:41.681671816Z",
            "node2": "2013-07-06T08:35:12.973780765Z",
            "node3": "2013-07-05T16:01:46.25480482Z"
        }
    }
}

Store a Specific Blob with Known Hash

Internally, we push blobs to proactively distribute them, using URLs of this form:

PUT /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2

This will register the blob only if the content received hashes to the value given in the URL. That value is returned in the X-CBFS-Hash response header.

Successful status code: 201
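
For example, pushing a blob whose hash was computed locally:

curl -XPUT --data-binary @blobfile \
    http://localhost:8484/.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2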

Store a Specific Blob without a Known Hash

POST /.cbfs/blob/

We use this form when streaming content from a client into multiple servers concurrently; in this case, we don't know the hash until we're done. The hash of the data written is returned in the X-Hash response header, and the blob is registered.

Successful status code: 201
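
For example, dumping the response headers so the X-Hash value is visible:

curl -D - -XPOST --data-binary @blobfile http://localhost:8484/.cbfs/blob/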

Delete a Specific Blob

DELETE /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2

Successful status code: 204

Request a Server to Fetch a Blob

If you want a server to find and retrieve a blob on its own, hit a URL in the following form.

GET /.cbfs/fetch/91a0333bd9d92691f9a52ca206403a9f11fa9ce2

Successful status code: 204


Document Meta Information

Every stored file can have arbitrary user-specified metadata associated with it. This is a JSON object attachment that is integrated into our couchbase file meta document.

Example usage: a user can store a PDF and attach an extract with some meta properties for search engine integration.

Storing File Meta

To store a JSON document for the file /original/file/path, PUT the JSON content to /.cbfs/meta/original/file/path

PUT /.cbfs/meta/original/file/path

Successful status code: 201
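
For example, attaching a small JSON document (the payload is illustrative):

curl -XPUT -d '{"title": "An Awesome File", "tags": ["example"]}' \
    http://localhost:8484/.cbfs/meta/original/file/path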

Retrieving File Meta

GET /.cbfs/meta/original/file/path

User Data in Views

If you're digging around in the couchbase instance holding file meta, you can create a view over all of the objects of type file and find your file meta under doc.userdata. This field will not be present if the document has no user data specified.


Listing Files

While cbfs doesn't model directories as first-class objects, we infer containers in the listing interface by splitting file paths on the / character. To get a list of the files and directories at a given location, use the /.cbfs/list/ path. E.g.:

Listing the root:

curl /.cbfs/list/

Example result:

{
    "dirs": {
        "dist": {
            "descendants": 12,
            "largest": 1547090.0,
            "size": 11020885.0,
            "smallest": 323
        },
        "monitor": {
            "descendants": 6,
            "largest": 115812,
            "size": 216193,
            "smallest": 15
        },
        "prague": {
            "descendants": 88891,
            "largest": 331999168.0,
            "size": 2132902038.0,
            "smallest": 0
        }
    },
    "files": {},
    "path": "/"
}

Listing a subdirectory:

curl /.cbfs/list/some/sub/directory

Options

You can use the includeMeta=true option to get details for the returned files. Directory meta information is always included.

Also, you can specify the depth parameter (default 1) to control how deep we should look for content. Be careful with this, as producing a very large result set can be taxing on the server. It's primarily useful if you have a small top level with a few small subdirectories and you want to save a few round trips.
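
For example, listing two levels deep with per-file details:

curl 'http://localhost:8484/.cbfs/list/some/sub/directory?includeMeta=true&depth=2'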

Retrieving a Collection of Files

You can get a zipfile containing all the contents under a specific path using the /.cbfs/zip/ handler or the /.cbfs/tar/ handler:

curl /.cbfs/zip/some/path > somepath.zip
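
Likewise, assuming the tar handler takes the same path form:

curl /.cbfs/tar/some/path > somepath.tar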

File Versioning

Multiple versions of a given file may be stored (and retrieved).

Every update to a file increments the file's revno, which is sent in the X-Cbfs-Revno response header on GET and HEAD requests. Also included is the X-Cbfs-Oldestrev header, which indicates how far back the version history goes for a given file.

Deleting a file destroys all of its history.

Storing a New Version

By default, PUT destroys all old versions. If you want to preserve older versions, you can specify the X-CBFS-KeepRevs header with the number of revisions you want to retain. A value of -1 means all revisions and a value of 0 means only the most recent.

Example (storing all revisions):

curl -D - -XPUT -H 'X-CBFS-KeepRevs: -1' \
    --data-binary @awesomefile http://localhost:8484/awesomefile

Retrieving a Previous Version

Doing a GET with a rev parameter (e.g. rev=3) retrieves that revision if it's available. If the revision is not available, an HTTP 410 status code is returned.

Note that retrieving an older revision returns all headers as they were for that revision, including the last modified date, ETag, etc.

Example:

curl http://localhost:8484/awesomefile\?rev=3

View Proxy

cbfs will proxy views back to a couchbase server if -viewProxy is specified on the commandline.

To access a view within the couchbase server behind cbfs, GET /.cbfs/viewproxy/[original view path] e.g.:

curl 'http://localhost:8484/.cbfs/viewproxy/cbfs/_design/cbfs/_view/repcounts?group_level=1'

CRUD Proxy

cbfs will proxy primitive CRUD operations to the couchbase bucket if -crudProxy is specified on the commandline.

All CRUD operations are available under the path /.cbfs/crudproxy/[key], and use the HTTP verbs GET, PUT and DELETE for their designated behaviors.

Caveats:

  • MIME types are not processed sensibly.
  • JSON is not validated.
  • GET returns a 404 on any failure.
  • DELETE returns a 500 on any failure (even for missing documents).
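
For example (the key name and payload are illustrative):

curl http://localhost:8484/.cbfs/crudproxy/somekey
curl -XPUT -d '{"some": "value"}' http://localhost:8484/.cbfs/crudproxy/somekey
curl -XDELETE http://localhost:8484/.cbfs/crudproxy/somekey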

Getting a List of Nodes

GET /.cbfs/nodes/

Example result:

{
    "anode": {
        "addr": "192.168.1.86:8484",
        "addr_raw": "192.168.1.86",
        "bindaddr": ":8484",
        "free": 718941134848,
        "hbage_ms": 2002,
        "hbage_str": "2.002752359s",
        "hbtime": "2012-10-11T06:41:40.908816Z",
        "size": 262998369666,
        "starttime": "2012-10-10T23:33:33.859991Z",
        "uptime_ms": 489051,
        "uptime_str": "8m9.05157742s",
        "used": 262998369666
    },
    "cathode": {
        "addr": "192.168.1.135:8484",
        "addr_raw": "192.168.1.135",
        "bindaddr": ":8484",
        "free": 199036534784,
        "hbage_ms": 4261,
        "hbage_str": "4.261298091s",
        "hbtime": "2012-10-11T06:41:38.650247Z",
        "size": 259967341043,
        "starttime": "2012-10-10T23:33:37.245379Z",
        "uptime_ms": 485666,
        "uptime_str": "8m5.666166167s",
        "used": 259967341043
    }
}

Getting the Server Configuration

GET /.cbfs/config/

Setting the Server Config

PUT /.cbfs/config/

Config should look similar to what you received from the GET above.
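
For example, fetching the config, editing it, and putting it back:

curl http://localhost:8484/.cbfs/config/ > config.json
# edit config.json as needed
curl -XPUT --data-binary @config.json http://localhost:8484/.cbfs/config/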

fsck

To validate all metadata linkage, you can use the fsck operation. It does the following:

  1. Walks the file list (or a portion of it you specify)
  2. Verifies it can look up all the files it finds
  3. Verifies it can find blob info for all the files it finds
  4. Reports details of its findings

Example of a full, detailed fsck across the entire filesystem:

GET /.cbfs/fsck/

Or just across your backups:

GET /.cbfs/fsck/mybackups

Or that, but only show me errors:

GET /.cbfs/fsck/mybackups?errorsonly=true

The result in each case will be a series of JSON objects, one per file (one per line). Each object will have a path field describing the path to the file and possibly an oid field describing the object ID of the version of that file when the scan occurred.

If an error is reported, there will be etype and error fields indicating what occurred.
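
A healthy entry might look like this (the path and oid values are illustrative):

{"path": "mybackups/somefile", "oid": "42da2aebe633eb8697b5abf43de57eea1c53d113"}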

Note that this does not include verification that the blobs themselves are intact. Each node verifies its blobs automatically every day; corrupt blobs become unregistered, leading to underreplication that is repaired automatically unless every copy of a blob is lost.

tasks

You can see the tasks being performed by various nodes on a cluster by accessing the following:

GET /.cbfs/tasks/

Some tasks are global (can only be run on one machine in the cluster at a time) and some are local (can run independently on multiple nodes concurrently). The documentation for what tasks run is a bit out of scope for this document, but the listing should be reasonably obvious.

The first level of the object is the name of the node. Inside this is another object keyed by each task that's running. Each of these contains the state of the task and the time the task entered that state.

Valid states are running and preparing. A task is preparing when it is blocked by some other task (or it just hasn't gone into running state yet). In the example below, you can see that ensureMinReplCount is in a preparing state on dogbowl because trimFullNodes is in running state on bsd2 and these two tasks have a mutual exclusion rule.

{
    "bruce": {
        "reconcile": {
            "state": "running",
            "ts": "2012-10-11T06:23:39.206318Z"
        }
    },
    "bsd2": {
        "trimFullNodes": {
            "state": "running",
            "ts": "2012-10-11T06:42:25.945552Z"
        }
    },
    "dogbowl": {
        "ensureMinReplCount": {
            "state": "preparing",
            "ts": "2012-10-11T06:44:21.181458Z"
        }
    }
}

Performance Data

The path /.cbfs/debug/ contains various performance data for each node.
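
For example:

curl http://localhost:8484/.cbfs/debug/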

The most interesting is probably the io section, which contains the w_B and r_b histograms. These record individual write and read events over the HTTP front-end, respectively. The sum is all of the bytes written or read, while the count is the number of times a write or read occurred. The histogram shows the sizes of these calls, which often translate to a write() or read() syscall, though sendfile() is also used when transferring origin data to a client. Large write sizes are the result of sendfile() calls; otherwise, the histogram shows how much data is moved per call.

The tasks section shows how long (in milliseconds) and how many times various internal tasks have run.