-
Notifications
You must be signed in to change notification settings - Fork 48
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
10 changed files
with
1,890 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
CXXFLAGS=-Wall -std=c++11 -g -O3 | ||
#CXXFLAGS=-Wall -std=c++11 -g -DDEBUG | ||
CC=g++ | ||
|
||
test: betree.hpp swap_space.o backing_store.o | ||
|
||
swap_space.o: swap_space.cpp swap_space.hpp backing_store.hpp | ||
|
||
backing_store.o: backing_store.hpp backing_store.cpp | ||
|
||
clean: | ||
$(RM) *.o test |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
Betree: a small, simple implementation of a B^e-tree, as described in | ||
the September 2015 ;login: article, | ||
"An Introduction to B^e-trees and Write-Optimization" | ||
by Michael A. Bender, Martin Farach-Colton, William Jannen, Rob | ||
Johnson, Bradley C. Kuszmaul, Donald E. Porter, Jun Yuan, and Yang | ||
Zhan | ||
|
||
Code by Rob Johnson <[email protected]> | ||
|
||
A B^-e-tree is an on-disk data structure with an interface similar to | ||
a B-tree. It stores a mapping from keys to values, supporting | ||
inserts, queries, deletes, updates, and efficient iteration. The key | ||
features of a B^e-tree are extremely I/O-efficient insertions, | ||
updates, and iteration, with query performance comparable to a B-tree. | ||
See the above-referenced article for more details. | ||
|
||
This distribution includes | ||
- the B^e-tree implementation | ||
- a test program that checks correctness and demonstrates | ||
how to use the B^e-tree implementation | ||
|
||
|
||
BUILDING AND RUNNING THE TEST PROGRAM | ||
------------------------------------- | ||
|
||
To build, run | ||
$ make | ||
$ mkdir tmpdir | ||
$ ./test -d tmpdir | ||
|
||
The test takes about a minute to run and should print "Test PASSED". | ||
The test performs a random sequence of operations on a betree and on | ||
an STL map, verifying that it always gets the same result from each | ||
data structure. If it ever finds a discrepancy, it will abort with an | ||
assertion failure, and will likely leave some files in tmpdir. A | ||
successful run should leave tmpdir empty. | ||
|
||
The code has been tested on a Debian 8.2 Linux installation with | ||
- g++ 4.9.2 | ||
- GNU make 4.0 | ||
- libstdc++ 6.0.20 | ||
- libc 2.19 | ||
If you have trouble compiling or running the test on other systems, | ||
please submit bug reports to [email protected]. Patches are | ||
definitely appreciated. | ||
|
||
GUIDE TO THE CODE | ||
----------------- | ||
|
||
test.cpp: The main test program. Demonstrates how to construct and | ||
use a betree. | ||
|
||
betree.hpp: The core of the betree implementation. This class handles | ||
flushing messages down the tree, splitting nodes, | ||
performing inserts, queries, etc, and provides an iterator | ||
for scanning key/value pairs in the tree. | ||
|
||
The betree is written almost completely as an in-memory | ||
data structure. All I/O is handled transparently by | ||
swap_space. | ||
|
||
swap_space.{cpp,hpp}: Swaps objects to/from disk. Maintains a cache | ||
of in-memory objects. When the cache becomes | ||
too large the least-recently-used object is | ||
written to disk and removed from the cache. | ||
Automatically loads the object back into memory | ||
when it is referenced. Garbage collects objects | ||
that are no longer referenced by any other | ||
object. Tracks when objects are modified in | ||
memory so that it knows to write them back to | ||
disk next time they get evicted. | ||
|
||
backing_store.{cpp,hpp}: This defines a generic interface used by | ||
swap_space to manage on-disk space. It | ||
supports allocating and deallocating on-disk | ||
space. The file also defines a simple | ||
implementation of the interface that stores | ||
one object per file on disk. | ||
|
||
|
||
INTERESTING PROJECTS AND TODOS | ||
------------------------------ | ||
|
||
- Implement logging, transactions, and MVCC. If this can be done in a | ||
way that does not touch the internals of betree, that would be extra | ||
cool. | ||
|
||
- Implement range upsert messages (and range deletes). One way to | ||
approach this might be to replace the currently-used std::map for | ||
betree::node::elements with a boost interval map. | ||
|
||
- Implement efficient garbage collection of nodes that contain only | ||
keys that are covered by a range delete message. | ||
|
||
- Implement "sub-nodes". Sub-nodes are written to disk contiguously | ||
as part of their parent node, but can be deserialized individually. | ||
This would enable the tree to write a node out to contiguous disk | ||
space, enabling fast range queries over the node. Point queries, | ||
however, would be able to deserialize only the sub-node needed to | ||
answer the query, saving disk bandwidth. | ||
|
||
- Modify system to track sizes in bytes instead of nodes, keys, | ||
values, etc. | ||
|
||
- Add multi-threading support. | ||
|
||
- Implement checkpointing and saving/loading of the betree. | ||
|
||
- Use boost serialization. The main challenge that I see is that the | ||
deserialization code needs a context for the deserialization, but | ||
boost serialization does not provide this. | ||
|
||
- Implement compression and partial eviction (i.e. "evict" a node by | ||
compressing it but keeping it in memory) | ||
|
||
- Implement a backing_store that manages space in a single file. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
#include "backing_store.hpp" | ||
#include <iostream> | ||
#include <ext/stdio_filebuf.h> | ||
#include <unistd.h> | ||
#include <cassert> | ||
|
||
///////////////////////////////////////////////////////////// | ||
// Implementation of the one_file_per_object_backing_store // | ||
///////////////////////////////////////////////////////////// | ||
one_file_per_object_backing_store::one_file_per_object_backing_store(std::string rt) | ||
: root(rt), | ||
nextid(1) | ||
{} | ||
|
||
uint64_t one_file_per_object_backing_store::allocate(size_t n) { | ||
uint64_t id = nextid++; | ||
std::string filename = root + "/" + std::to_string(id); | ||
std::fstream dummy(filename, std::fstream::out); | ||
dummy.flush(); | ||
assert(dummy.good()); | ||
return id; | ||
} | ||
|
||
void one_file_per_object_backing_store::deallocate(uint64_t id) { | ||
std::string filename = root + "/" + std::to_string(id); | ||
assert(unlink(filename.c_str()) == 0); | ||
} | ||
|
||
std::iostream * one_file_per_object_backing_store::get(uint64_t id) { | ||
__gnu_cxx::stdio_filebuf<char> *fb = new __gnu_cxx::stdio_filebuf<char>; | ||
std::string filename = root + "/" + std::to_string(id); | ||
fb->open(filename, std::fstream::in | std::fstream::out); | ||
std::fstream *ios = new std::fstream; | ||
ios->std::ios::rdbuf(fb); | ||
ios->exceptions(std::fstream::badbit | std::fstream::failbit | std::fstream::eofbit); | ||
assert(ios->good()); | ||
|
||
return ios; | ||
} | ||
|
||
void one_file_per_object_backing_store::put(std::iostream *ios) | ||
{ | ||
ios->flush(); | ||
__gnu_cxx::stdio_filebuf<char> *fb = (__gnu_cxx::stdio_filebuf<char> *)ios->rdbuf(); | ||
fsync(fb->fd()); | ||
delete ios; | ||
delete fb; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
// Generic interface to the disk. Used by swap_space to store | ||
// objects. | ||
|
||
#ifndef BACKING_STORE_HPP | ||
#define BACKING_STORE_HPP | ||
|
||
#include <cstdint> | ||
#include <cstddef> | ||
#include <iostream> | ||
|
||
class backing_store { | ||
public: | ||
virtual uint64_t allocate(size_t n) = 0; | ||
virtual void deallocate(uint64_t id) = 0; | ||
virtual std::iostream * get(uint64_t id) = 0; | ||
virtual void put(std::iostream *ios) = 0; | ||
}; | ||
|
||
class one_file_per_object_backing_store: public backing_store { | ||
public: | ||
one_file_per_object_backing_store(std::string rt); | ||
uint64_t allocate(size_t n); | ||
void deallocate(uint64_t id); | ||
std::iostream * get(uint64_t id); | ||
void put(std::iostream *ios); | ||
|
||
private: | ||
std::string root; | ||
uint64_t nextid; | ||
}; | ||
|
||
#endif // BACKING_STORE_HPP |
Oops, something went wrong.