Protocol
cbfs is a fairly straightforward read/write HTTP server. You PUT a document into a path and you can later GET it, or perhaps DELETE it when you're done.
Headers you specify in your PUT request will be returned on subsequent GET and HEAD requests.
A special X-CBFS-Hash header specifying the SHA1 of the content (or other hash if configured) will be verified on upload, allowing end-to-end integrity.
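For example, a PUT supplying the expected hash might look like the following sketch (the file, path, and host are placeholders, and the hash would need to be the actual SHA1 of the uploaded content):
curl -XPUT -H 'X-CBFS-Hash: 91a0333bd9d92691f9a52ca206403a9f11fa9ce2' \
    --data-binary @somefile http://localhost:8484/some/path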
Also, conditional get via Last-Modified or ETag (preferred) is supported, as well as range requests.
Each HEAD or GET request for an object will return an ETag with the blob identifier.
Many common features of HTTP work out of the box, such as:
- Conditional GET
- Conditional PUT
- Range GETs
- gzip transfer encoding
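For instance, a conditional GET reusing a previously returned ETag and a range GET for the first kilobyte might look like this (a sketch; the host and path are placeholders):
curl -H 'If-None-Match: "91a0333bd9d92691f9a52ca206403a9f11fa9ce2"' http://localhost:8484/some/path
curl -r 0-1023 http://localhost:8484/some/path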
If your client has knowledge of the full cluster and you want to ensure that your request is not proxied, you can include the X-CBFS-LocalOnly header. Any node receiving this request that doesn't contain the blob locally will return an HTTP status 300 with a Location header suggesting one origin server and a body containing a list of all origin servers.
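Such a request might look like the following sketch (only the header name is documented here, so the value true is an assumption; the host and path are placeholders):
curl -i -H 'X-CBFS-LocalOnly: true' http://192.168.1.38:8484/some/path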
Example (slightly mangled) response:
HTTP/1.1 300 Multiple Choices
Content-Type: application/json; charset=utf-8
Location: http://192.168.1.38:8484/.cbfs/blob/[hash]
X-Cbfs-Oldestrev: 0
X-Cbfs-Revno: 0
Content-Length: 241
Date: Thu, 11 Jul 2013 17:52:44 GMT
["http://192.168.1.38:8484/.cbfs/blob/[hash]",
"http://192.168.1.135:8484/.cbfs/blob/[hash]",
"http://192.168.1.107:8484/.cbfs/blob/[hash]"]
By default, any PUT to a cbfs cluster with more than one node will synchronously write to two nodes and verify the result on both before returning success. This can be inefficient, especially when storing large amounts of data.
To avoid this and only synchronously store to one node, you can send your PUT request with the X-CBFS-Unsafe header.
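A sketch of such an upload (the header value true is an assumption, and the file name and host are placeholders):
curl -XPUT -H 'X-CBFS-Unsafe: true' \
    --data-binary @bigfile http://localhost:8484/bigfile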
Any given PUT can include the X-CBFS-Expiration header, which does memcached-style expiration. Specifically (from the memcached protocol documentation):
...the actual value sent may either be Unix time (number of seconds since January 1, 1970, as a 32-bit value), or a number of seconds starting from current time. In the latter case, this number of seconds may not exceed 60*60*24*30 (number of seconds in 30 days); if the number sent by a client is larger than that, the server will consider it to be real Unix time value rather than an offset from current time.
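As a sketch, storing a document that expires one hour from now using the relative-seconds form (the file name and host are placeholders):
curl -XPUT -H 'X-CBFS-Expiration: 3600' \
    --data-binary @somefile http://localhost:8484/somefile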
Any path that starts with /.cbfs/ is reserved.
Raw blobs are all available under the /.cbfs/blob/ root.
GET /.cbfs/blob/
e.g. to retrieve blob 91a0333bd9d92691f9a52ca206403a9f11fa9ce2:
GET /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2
If you want all the metadata associated with one or more blobs, you can POST to /.cbfs/blob/info/ with one or more blob parameters.
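A sketch of such a request, assuming the blob parameters are sent as ordinary form-encoded POST data (the hash is the one from the example response below, and the host is a placeholder):
curl -XPOST -d 'blob=42da2aebe633eb8697b5abf43de57eea1c53d113' http://localhost:8484/.cbfs/blob/info/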
The response is a json object containing all the nodes known to contain that object. The timestamps should be ignored by most applications, but represent the most recent verification time of the object at each node.
{
"42da2aebe633eb8697b5abf43de57eea1c53d113": {
"nodes": {
"node1": "2013-07-09T00:23:41.681671816Z",
"node2": "2013-07-06T08:35:12.973780765Z",
"node3": "2013-07-05T16:01:46.25480482Z"
}
}
}
Internally, we push blobs to distribute them proactively, using URLs of this form:
PUT /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2
This will register the blob only if the content it receives hashes to the same input value. This value will be returned in the X-CBFS-Hash response header.
Successful status code: 201
POST /.cbfs/blob/
We use this form when streaming content from a client into multiple servers concurrently. In this case, we don't know the hash until we're done. The hash of the data written will be returned in the X-Hash header, and it will be registered.
Successful status code: 201
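A sketch of such an upload (the file name and host are placeholders); the resulting hash comes back in the response headers:
curl -i -XPOST --data-binary @somefile http://localhost:8484/.cbfs/blob/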
DELETE /.cbfs/blob/91a0333bd9d92691f9a52ca206403a9f11fa9ce2
Successful status code: 204
If you want a server to find and retrieve a blob on its own, hit a URL in the following form.
GET /.cbfs/fetch/91a0333bd9d92691f9a52ca206403a9f11fa9ce2
Successful status code: 204
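For example (a sketch reusing the example hash from above; the host is a placeholder), asking a node to fetch a blob on its own -- a successful response is a bodiless 204, so -i is used to see the status:
curl -i http://localhost:8484/.cbfs/fetch/91a0333bd9d92691f9a52ca206403a9f11fa9ce2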
Every stored file can have arbitrary user-specified metadata associated with it. This is a JSON object attachment that is integrated into our couchbase file meta document.
Example usage: a user can store a PDF and attach an extract with some meta properties for search engine integration.
To store a JSON document for the file /original/file/path, PUT the JSON content to /.cbfs/meta/original/file/path:
PUT /.cbfs/meta/original/file/path
Successful status code: 201
GET /.cbfs/meta/original/file/path
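As a sketch (the meta file name and host are placeholders), storing and then retrieving the user metadata might look like:
curl -XPUT --data-binary @meta.json http://localhost:8484/.cbfs/meta/original/file/path
curl http://localhost:8484/.cbfs/meta/original/file/path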
If you're digging around in the couchbase instance holding file meta, you can create a view over all of the objects of type file and find your file meta under doc.userdata. This field will not be present if the document has no user data specified.
While cbfs doesn't model directories properly, we infer containers in the listing interface by splitting the files on the / character. To get a list of the files and directories at a given location, use the /.cbfs/list/ path. E.g.:
Listing the root:
curl /.cbfs/list/
Example result:
{
"dirs": {
"dist": {
"descendants": 12,
"largest": 1547090.0,
"size": 11020885.0,
"smallest": 323
},
"monitor": {
"descendants": 6,
"largest": 115812,
"size": 216193,
"smallest": 15
},
"prague": {
"descendants": 88891,
"largest": 331999168.0,
"size": 2132902038.0,
"smallest": 0
}
},
"files": {},
"path": "/"
}
Listing a subdirectory:
curl /.cbfs/list/some/sub/directory
You can use the includeMeta=true option to get details in the returned files. Directory meta information is always included.
Also, you can specify the depth parameter (default is 1) to ask how deep we should look for content. You will want to be careful with this, as it can be taxing on the server to produce a very large result set. It's primarily useful if you have a small top level with a few small subdirectories and you want to save a few round trips.
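A sketch of a listing request combining both options (the host and path are placeholders):
curl 'http://localhost:8484/.cbfs/list/some/sub/directory?includeMeta=true&depth=2'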
You can get a zipfile containing all the contents under a specific path using the /.cbfs/zip/ handler or the /.cbfs/tar/ handler:
curl /.cbfs/zip/some/path > somepath.zip
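The tar handler presumably works the same way:
curl /.cbfs/tar/some/path > somepath.tar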
Multiple versions of a given file may be stored (and retrieved). Every update to a file increments the file's revno, which is sent in the X-Cbfs-Revno response header on GET and HEAD requests. Also included is the X-Cbfs-Oldestrev header, which indicates how far back the version history goes for a given file.
Deleting a file destroys all of its history.
By default, PUT destroys all old versions. If you want to preserve older versions, you can specify the X-CBFS-KeepRevs header with the number of revisions you want to retain. A value of -1 means all revisions and a value of 0 means only the most recent.
Example (storing all revisions):
curl -D - -XPUT -H 'X-CBFS-KeepRevs: -1' \
--data-binary @awesomefile http://localhost:8484/awesomefile
Doing a GET with a rev=3 parameter gets revision 3 if it's available. If this revision is not available, an HTTP 410 status code is returned.
Note that retrieving an older revision returns all headers as they were during that revision, including the last modified date, ETag, etc.
Example:
curl http://localhost:8484/awesomefile\?rev=3
cbfs will proxy views back to a couchbase server if -viewProxy is specified on the commandline.
To access a view within the couchbase server behind cbfs, GET /.cbfs/viewproxy/[original view path], e.g.:
curl 'http://localhost:8484/.cbfs/viewproxy/cbfs/_design/cbfs/_view/repcounts?group_level=1'
cbfs will proxy primitive CRUD operations to the couchbase bucket if -crudProxy is specified on the commandline.
All CRUD operations are available under the path /.cbfs/crudproxy/[key], and use the HTTP verbs GET, PUT and DELETE for their designated behaviors.
Caveats:
- Mime types are not processed sensibly.
- JSON is not validated.
- GET returns a 404 on any failure.
- DELETE returns a 500 on any failure (even for missing documents).
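A sketch of basic usage (the key, document body, and host are placeholders):
curl -XPUT -d '{"name": "example"}' http://localhost:8484/.cbfs/crudproxy/somekey
curl http://localhost:8484/.cbfs/crudproxy/somekey
curl -XDELETE http://localhost:8484/.cbfs/crudproxy/somekey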
GET /.cbfs/nodes/
Example result:
{
"anode": {
"addr": "192.168.1.86:8484",
"addr_raw": "192.168.1.86",
"bindaddr": ":8484",
"free": 718941134848,
"hbage_ms": 2002,
"hbage_str": "2.002752359s",
"hbtime": "2012-10-11T06:41:40.908816Z",
"size": 262998369666,
"starttime": "2012-10-10T23:33:33.859991Z",
"uptime_ms": 489051,
"uptime_str": "8m9.05157742s",
"used": 262998369666
},
"cathode": {
"addr": "192.168.1.135:8484",
"addr_raw": "192.168.1.135",
"bindaddr": ":8484",
"free": 199036534784,
"hbage_ms": 4261,
"hbage_str": "4.261298091s",
"hbtime": "2012-10-11T06:41:38.650247Z",
"size": 259967341043,
"starttime": "2012-10-10T23:33:37.245379Z",
"uptime_ms": 485666,
"uptime_str": "8m5.666166167s",
"used": 259967341043
}
}
GET /.cbfs/config/
PUT /.cbfs/config/
Config should look similar to what you received from the GET above.
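A sketch of a typical round trip: fetch the current config, edit it locally, and PUT it back (the file name and host are placeholders):
curl http://localhost:8484/.cbfs/config/ > config.json
# edit config.json, then:
curl -XPUT --data-binary @config.json http://localhost:8484/.cbfs/config/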
To validate all metadata linkage, you can use the fsck operation. It does the following:
- Walks the file list (or a portion of it you specify)
- Verifies it can look up all the files it finds
- Verifies it can find blob info for all the files it finds
- Reports details of its findings
Example of a full, detailed fsck across the entire filesystem:
GET /.cbfs/fsck/
Or just across your backups:
GET /.cbfs/fsck/mybackups
Or that, but only show me errors:
GET /.cbfs/fsck/mybackups?errorsonly=true
The result in each case will be a series of JSON objects, one per file (one per line). Each object will have a path field describing the path to the file and possibly an oid field describing the object ID of the version of that file when the scan occurred.
If an error is reported, there'll be an etype and an error field indicating what occurred.
Note that this does not include verification that the objects themselves are OK. That verification happens automatically on each node daily, and blobs that fail it will become unregistered (leading to under-replication, which should be repaired automatically unless a blob is lost entirely).
You can see the tasks being performed by various nodes on a cluster by accessing the following:
GET /.cbfs/tasks/
Some tasks are global (can only be run on one machine in the cluster at a time) and some are local (can run independently on multiple nodes concurrently). The documentation for what tasks run is a bit out of scope for this document, but the listing should be reasonably obvious.
The first level of the object is the name of the node. Inside this object is another object that is keyed off of each task that's running. Each one of these contains the state of the task and the time the task went into that state.
Valid states are running and preparing. A task is preparing when it is blocked by some other task (or it just hasn't gone into the running state yet). In the example below, you can see that ensureMinReplCount is in a preparing state on dogbowl because trimFullNodes is in a running state on bsd2 and these two tasks have a mutual exclusion rule.
{
"bruce": {
"reconcile": {
"state": "running",
"ts": "2012-10-11T06:23:39.206318Z"
}
},
"bsd2": {
"trimFullNodes": {
"state": "running",
"ts": "2012-10-11T06:42:25.945552Z"
}
},
"dogbowl": {
"ensureMinReplCount": {
"state": "preparing",
"ts": "2012-10-11T06:44:21.181458Z"
}
}
}
The path /.cbfs/debug/ contains various performance data for each node.
The most interesting is probably the io section, which contains the w_B and r_b histograms. These are specific write and read events over the HTTP front-end, respectively. The sum is all of the bytes written or read, while the count is the number of times the write or read occurred. The histogram shows the sizes of these calls -- often translating to a write() or read() syscall, but also utilizing sendfile() when transferring origin data to a client. Large write sizes are the result of sendfile() calls. Otherwise, the histogram shows how much data is being moved in these calls.
The tasks section shows how long (in milliseconds) and how many times various internal tasks have run.