added code
rtjohnso committed Oct 1, 2015
1 parent 6225ad8 commit 55f41e7
Showing 10 changed files with 1,890 additions and 1 deletion.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2015, OSCAR Lab
+Copyright (c) 2015, Rob Johnson
All rights reserved.

Redistribution and use in source and binary forms, with or without
12 changes: 12 additions & 0 deletions Makefile
@@ -0,0 +1,12 @@
CXXFLAGS=-Wall -std=c++11 -g -O3
#CXXFLAGS=-Wall -std=c++11 -g -DDEBUG
CC=g++

test: betree.hpp swap_space.o backing_store.o

swap_space.o: swap_space.cpp swap_space.hpp backing_store.hpp

backing_store.o: backing_store.hpp backing_store.cpp

clean:
	$(RM) *.o test
116 changes: 116 additions & 0 deletions README
@@ -0,0 +1,116 @@
Betree: a small, simple implementation of a B^e-tree, as described in
the September 2015 ;login: article,
"An Introduction to B^e-trees and Write-Optimization"
by Michael A. Bender, Martin Farach-Colton, William Jannen, Rob
Johnson, Bradley C. Kuszmaul, Donald E. Porter, Jun Yuan, and Yang
Zhan

Code by Rob Johnson <[email protected]>

A B^e-tree is an on-disk data structure with an interface similar to
a B-tree. It stores a mapping from keys to values, supporting
inserts, queries, deletes, updates, and efficient iteration. The key
features of a B^e-tree are extremely I/O-efficient insertions,
updates, and iteration, with query performance comparable to a B-tree.
See the above-referenced article for more details.

This distribution includes
- the B^e-tree implementation
- a test program that checks correctness and demonstrates
  how to use the B^e-tree implementation


BUILDING AND RUNNING THE TEST PROGRAM
-------------------------------------

To build and run the test:
  $ make
  $ mkdir tmpdir
  $ ./test -d tmpdir

The test takes about a minute to run and should print "Test PASSED".
The test performs a random sequence of operations on a betree and on
an STL map, verifying that it always gets the same result from each
data structure. If it ever finds a discrepancy, it will abort with an
assertion failure, and will likely leave some files in tmpdir. A
successful run should leave tmpdir empty.
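
The checking strategy described above (apply the same random operations
to the structure under test and to an STL map, then compare results) can
be sketched as follows. This is a minimal illustration, not the code in
test.cpp: toy_map is a hypothetical stand-in for the betree, and
run_differential_test, its parameters, and the key range are assumptions
made for this example. Unlike the real test, it returns false on a
discrepancy instead of aborting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for the structure under test: a sorted vector of
// key/value pairs. In the real test program this role is played by the betree.
class toy_map {
public:
  void insert(uint64_t k, uint64_t v) {
    auto it = lower(k);
    if (it != items.end() && it->first == k)
      it->second = v;                       // overwrite an existing key
    else
      items.insert(it, std::make_pair(k, v));
  }

  void erase(uint64_t k) {
    auto it = lower(k);
    if (it != items.end() && it->first == k)
      items.erase(it);
  }

  // Returns true and fills v if k is present.
  bool query(uint64_t k, uint64_t &v) {
    auto it = lower(k);
    if (it == items.end() || it->first != k)
      return false;
    v = it->second;
    return true;
  }

private:
  typedef std::vector<std::pair<uint64_t, uint64_t>> storage;

  // First position whose key is not less than k.
  storage::iterator lower(uint64_t k) {
    return std::lower_bound(
        items.begin(), items.end(), k,
        [](const std::pair<uint64_t, uint64_t> &p, uint64_t key) {
          return p.first < key;
        });
  }

  storage items;
};

// Applies nops random operations to both structures and reports whether
// every query returned the same answer from each.
bool run_differential_test(unsigned nops, unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<uint64_t> key_dist(0, 63);
  std::uniform_int_distribution<int> op_dist(0, 2);
  toy_map candidate;
  std::map<uint64_t, uint64_t> reference;

  for (unsigned i = 0; i < nops; i++) {
    uint64_t k = key_dist(rng);
    switch (op_dist(rng)) {
    case 0:                                 // insert or overwrite
      candidate.insert(k, i);
      reference[k] = i;
      break;
    case 1:                                 // delete
      candidate.erase(k);
      reference.erase(k);
      break;
    default: {                              // query and compare
      uint64_t got = 0;
      bool found = candidate.query(k, got);
      auto it = reference.find(k);
      bool expected = (it != reference.end());
      if (found != expected)
        return false;
      if (found && got != it->second)
        return false;
    }
    }
  }
  return true;
}
```

A small key range is used on purpose so that inserts, deletes, and
queries collide on the same keys often.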

The code has been tested on a Debian 8.2 Linux installation with
- g++ 4.9.2
- GNU make 4.0
- libstdc++ 6.0.20
- libc 2.19
If you have trouble compiling or running the test on other systems,
please submit bug reports to [email protected]. Patches are
definitely appreciated.

GUIDE TO THE CODE
-----------------

test.cpp: The main test program. Demonstrates how to construct and
          use a betree.

betree.hpp: The core of the betree implementation. This class handles
            flushing messages down the tree, splitting nodes,
            performing inserts, queries, etc., and provides an iterator
            for scanning key/value pairs in the tree.

            The betree is written almost completely as an in-memory
            data structure. All I/O is handled transparently by
            swap_space.

swap_space.{cpp,hpp}: Swaps objects to/from disk. Maintains a cache
                      of in-memory objects. When the cache becomes
                      too large the least-recently-used object is
                      written to disk and removed from the cache.
                      Automatically loads the object back into memory
                      when it is referenced. Garbage collects objects
                      that are no longer referenced by any other
                      object. Tracks when objects are modified in
                      memory so that it knows to write them back to
                      disk next time they get evicted.
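
The eviction policy described for swap_space (least-recently-used
eviction, write-back only for modified objects, reload on reference)
can be sketched as below. This is an illustration under stated
assumptions, not the swap_space implementation: lru_writeback_cache and
all of its members are hypothetical names, objects are plain strings, a
std::map stands in for the disk, and the reference-based garbage
collection that swap_space also performs is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical LRU cache with write-back of dirty entries.
class lru_writeback_cache {
public:
  // cap must be at least 1.
  explicit lru_writeback_cache(size_t cap) : writebacks(0), capacity(cap) {
    assert(cap >= 1);
  }

  // Read access: loads the object on a miss and marks it most recently used.
  const std::string &read(uint64_t id) { return touch(id).value; }

  // Write access: like read, but also marks the object dirty so that it
  // is written back when it is eventually evicted.
  void write(uint64_t id, const std::string &v) {
    entry &e = touch(id);
    e.value = v;
    e.dirty = true;
  }

  bool cached(uint64_t id) const { return index.count(id) != 0; }

  std::map<uint64_t, std::string> disk;  // stand-in for the backing store
  size_t writebacks;                     // number of dirty evictions so far

private:
  struct entry {
    uint64_t id;
    std::string value;
    bool dirty;
  };

  // Moves id to the front of the LRU list, loading it from "disk" first
  // if it is not cached (an object never written loads as empty).
  entry &touch(uint64_t id) {
    auto it = index.find(id);
    if (it != index.end()) {
      lru.splice(lru.begin(), lru, it->second);
      return lru.front();
    }
    if (lru.size() >= capacity)
      evict();
    auto d = disk.find(id);
    lru.push_front(entry{id, d == disk.end() ? std::string() : d->second, false});
    index[id] = lru.begin();
    return lru.front();
  }

  // Drops the least recently used entry, writing it back only if dirty.
  void evict() {
    entry &victim = lru.back();
    if (victim.dirty) {
      disk[victim.id] = victim.value;
      writebacks++;
    }
    index.erase(victim.id);
    lru.pop_back();
  }

  size_t capacity;
  std::list<entry> lru;  // front = most recently used
  std::unordered_map<uint64_t, std::list<entry>::iterator> index;
};
```

The list plus hash index gives constant-time lookup, promotion, and
eviction; the dirty flag is what lets a clean object be dropped without
any write.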

backing_store.{cpp,hpp}: This defines a generic interface used by
                         swap_space to manage on-disk space. It
                         supports allocating and deallocating on-disk
                         space. The file also defines a simple
                         implementation of the interface that stores
                         one object per file on disk.


INTERESTING PROJECTS AND TODOS
------------------------------

- Implement logging, transactions, and MVCC. If this can be done in a
way that does not touch the internals of betree, that would be extra
cool.

- Implement range upsert messages (and range deletes). One way to
approach this might be to replace the currently-used std::map for
betree::node::elements with a boost interval map.

- Implement efficient garbage collection of nodes that contain only
keys that are covered by a range delete message.

- Implement "sub-nodes". Sub-nodes are written to disk contiguously
as part of their parent node, but can be deserialized individually.
This would enable the tree to write a node out to contiguous disk
space, enabling fast range queries over the node. Point queries,
however, would be able to deserialize only the sub-node needed to
answer the query, saving disk bandwidth.

- Modify system to track sizes in bytes instead of nodes, keys,
values, etc.

- Add multi-threading support.

- Implement checkpointing and saving/loading of the betree.

- Use boost serialization. The main challenge that I see is that the
deserialization code needs a context for the deserialization, but
boost serialization does not provide this.

- Implement compression and partial eviction (i.e., "evict" a node by
compressing it but keeping it in memory).

- Implement a backing_store that manages space in a single file.
48 changes: 48 additions & 0 deletions backing_store.cpp
@@ -0,0 +1,48 @@
#include "backing_store.hpp"
#include <iostream>
#include <ext/stdio_filebuf.h>
#include <unistd.h>
#include <cassert>

/////////////////////////////////////////////////////////////
// Implementation of the one_file_per_object_backing_store //
/////////////////////////////////////////////////////////////
one_file_per_object_backing_store::one_file_per_object_backing_store(std::string rt)
  : root(rt),
    nextid(1)
{}

uint64_t one_file_per_object_backing_store::allocate(size_t n) {
  uint64_t id = nextid++;
  std::string filename = root + "/" + std::to_string(id);
  std::fstream dummy(filename, std::fstream::out);
  dummy.flush();
  assert(dummy.good());
  return id;
}

void one_file_per_object_backing_store::deallocate(uint64_t id) {
  std::string filename = root + "/" + std::to_string(id);
  assert(unlink(filename.c_str()) == 0);
}

std::iostream * one_file_per_object_backing_store::get(uint64_t id) {
  __gnu_cxx::stdio_filebuf<char> *fb = new __gnu_cxx::stdio_filebuf<char>;
  std::string filename = root + "/" + std::to_string(id);
  fb->open(filename, std::fstream::in | std::fstream::out);
  std::fstream *ios = new std::fstream;
  ios->std::ios::rdbuf(fb);
  ios->exceptions(std::fstream::badbit | std::fstream::failbit | std::fstream::eofbit);
  assert(ios->good());

  return ios;
}

void one_file_per_object_backing_store::put(std::iostream *ios)
{
  ios->flush();
  __gnu_cxx::stdio_filebuf<char> *fb = (__gnu_cxx::stdio_filebuf<char> *)ios->rdbuf();
  fsync(fb->fd());
  delete ios;
  delete fb;
}
32 changes: 32 additions & 0 deletions backing_store.hpp
@@ -0,0 +1,32 @@
// Generic interface to the disk. Used by swap_space to store
// objects.

#ifndef BACKING_STORE_HPP
#define BACKING_STORE_HPP

#include <cstdint>
#include <cstddef>
#include <iostream>

class backing_store {
public:
  virtual uint64_t allocate(size_t n) = 0;
  virtual void deallocate(uint64_t id) = 0;
  virtual std::iostream * get(uint64_t id) = 0;
  virtual void put(std::iostream *ios) = 0;
};

class one_file_per_object_backing_store: public backing_store {
public:
  one_file_per_object_backing_store(std::string rt);
  uint64_t allocate(size_t n);
  void deallocate(uint64_t id);
  std::iostream * get(uint64_t id);
  void put(std::iostream *ios);

private:
  std::string root;
  uint64_t nextid;
};

#endif // BACKING_STORE_HPP
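
Because backing_store is an abstract interface, other storage schemes
can be plugged in behind it. As an illustration only, the sketch below
shows one possible in-memory implementation, which may be handy for
experiments that should not touch the filesystem. The class
in_memory_backing_store and its members are hypothetical and are not
part of this commit. The interface is reproduced so the sketch compiles
on its own, with a virtual destructor added that the original header
does not declare.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Interface as declared in backing_store.hpp, repeated here so that the
// sketch is self-contained. The virtual destructor is an addition.
class backing_store {
public:
  virtual ~backing_store() {}
  virtual uint64_t allocate(size_t n) = 0;
  virtual void deallocate(uint64_t id) = 0;
  virtual std::iostream * get(uint64_t id) = 0;
  virtual void put(std::iostream *ios) = 0;
};

// Hypothetical implementation that keeps each object as a string in memory.
class in_memory_backing_store: public backing_store {
public:
  in_memory_backing_store() : nextid(1) {}

  // The size hint is ignored, as in the one-file-per-object store.
  uint64_t allocate(size_t) {
    uint64_t id = nextid++;
    objects[id] = std::string();
    return id;
  }

  void deallocate(uint64_t id) {
    size_t erased = objects.erase(id);
    assert(erased == 1);
    (void)erased;  // silence the unused-variable warning under NDEBUG
  }

  // Hands out a heap-allocated stream positioned at the start of the object.
  // The caller must return it through put().
  std::iostream * get(uint64_t id) {
    std::stringstream *ss = new std::stringstream(objects.at(id));
    open_streams[ss] = id;
    return ss;
  }

  // Saves the stream's contents back to the object and frees the stream.
  void put(std::iostream *ios) {
    std::map<std::iostream *, uint64_t>::iterator it = open_streams.find(ios);
    assert(it != open_streams.end());
    objects[it->second] = static_cast<std::stringstream *>(ios)->str();
    open_streams.erase(it);
    delete ios;
  }

  size_t live_objects() const { return objects.size(); }

private:
  uint64_t nextid;
  std::map<uint64_t, std::string> objects;           // id -> serialized bytes
  std::map<std::iostream *, uint64_t> open_streams;  // streams handed out by get()
};
```

Since put() receives only the stream, this sketch has to remember which
id each open stream belongs to; the file-based store avoids that
bookkeeping because the stream itself owns the file.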