Add linear hashing implementation. #30

Open
wants to merge 1 commit into
base: main
57 changes: 57 additions & 0 deletions README.md
@@ -7,3 +7,60 @@ Dependencies
```
apt install bison flex
```

# Linear Hashing

You can build and run the test suite with the following commands:
- `cd build`
- `cmake -DCMAKE_BUILD_TYPE=Debug ..`
- `make check`
- `ctest -R linear_hashing_test -V`

Files:
- `src/include/utils/linear_hashing.h`
- `src/utils/linear_hashing.cc`
- `test/unit/utils/linear_hashing_test.cc`

## Architecture
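I chose to implement deletion with tombstones: `erase` marks an entry as deleted but leaves it in the backing vector. A tombstoned slot is reused only when the same key is inserted again; inserts of other keys skip over it. Every erased key therefore keeps occupying one slot, and the usable capacity slowly shrinks. In a real implementation this would be acceptable only for a table whose capacity is well above the number of distinct keys it will ever see. Since the focus of this implementation is minimalism, I decided not to implement tombstone cleanup.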
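
To make the trade-off concrete, here is a minimal standalone sketch. It is not the `LinearHashTable` class from this PR: `Slot`, `MiniTable`, and `demo_tombstones_consume_capacity` are hypothetical names, and `insert`/`erase` return `bool` purely for illustration. Under those assumptions, it shows a different key being rejected after an erase even though the live count has dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature of the table, for illustration only.
struct Slot {
  int key;
  int val;
  bool tombstone;
  bool full;
};

struct MiniTable {
  std::vector<Slot> slots;  // value-initialized: every slot starts empty
  std::size_t live = 0;     // number of non-tombstoned entries

  explicit MiniTable(std::size_t capacity) : slots(capacity) {}

  // Returns true if the pair was stored, false if no usable slot was found.
  bool insert(int key, int val) {
    const std::size_t n = slots.size();
    if (n == 0) return false;
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; i++) {
      Slot &s = slots[(home + i) % n];
      if (!s.full || s.key == key) {
        if (!s.full || s.tombstone) live++;  // new key or revived tombstone
        s = Slot{key, val, false, true};
        return true;
      }
    }
    return false;  // every slot is full and none holds this key
  }

  // Marks the key's slot as a tombstone; returns false if the key is absent.
  bool erase(int key) {
    const std::size_t n = slots.size();
    if (n == 0) return false;
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; i++) {
      Slot &s = slots[(home + i) % n];
      if (!s.full) return false;
      if (s.key == key) {
        if (s.tombstone) return false;
        s.tombstone = true;
        live--;
        return true;
      }
    }
    return false;
  }
};

bool demo_tombstones_consume_capacity() {
  MiniTable t(2);
  const bool a = t.insert(0, 10);
  const bool b = t.insert(1, 11);
  const bool erased = t.erase(0);
  assert(t.live == 1);  // one live entry, but slot 0 is still marked full

  const bool rejected = !t.insert(2, 12);  // key 2 finds no free slot
  const bool revived = t.insert(0, 20);    // the same key reuses its tombstone
  return a && b && erased && rejected && revived && t.live == 2;
}
```

Calling `demo_tombstones_consume_capacity()` from a `main` is enough to try it out; a cleanup pass (for example, rebuilding the vector without tombstones) would be one way to reclaim the lost slots, but that is outside the scope of this PR.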

## HashTable
The hash table stores its data in a fixed-capacity backing vector of `Entry` objects and resolves collisions by linear probing. Each `Entry` contains a key, a value, a `tombstone` boolean, and a `full` boolean. `tombstone` is set when the key is erased; `full` is set when an insert fills the slot and is never cleared afterwards.
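
The two booleans give each slot three meaningful states. The sketch below spells them out; `Slot`, `SlotState`, `classify`, and `slot_state_example` are hypothetical names used only for this illustration and do not appear in the PR.

```cpp
#include <cassert>

// Hypothetical mirror of the Entry fields, for illustration only.
struct Slot {
  int key;
  int val;
  bool tombstone;
  bool full;
};

enum class SlotState { Empty, Live, Tombstone };

// Derives the logical state of a slot from its two flags.
SlotState classify(const Slot &s) {
  if (!s.full) return SlotState::Empty;          // never written by an insert
  if (s.tombstone) return SlotState::Tombstone;  // written, then erased
  return SlotState::Live;                        // holds a current key/value
}

void slot_state_example() {
  assert(classify(Slot{0, 0, false, false}) == SlotState::Empty);
  assert(classify(Slot{3, 30, false, true}) == SlotState::Live);
  assert(classify(Slot{3, 30, true, true}) == SlotState::Tombstone);
}
```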

## Lookup
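The lookup algorithm probes linearly from the key's home slot (`key % capacity`) and returns the value associated with the key. If the key is not found or has been erased, it returns -1, so a stored value of -1 cannot be distinguished from a missing key.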
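
As a rough sketch of that probe sequence, the free function below walks a plain vector of slots. `Slot`, `probe_lookup`, and `lookup_example` are hypothetical names for this example; the PR implements the same idea as a member function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the Entry fields, for illustration only.
struct Slot {
  int key;
  int val;
  bool tombstone;
  bool full;
};

// Linear-probe lookup over `slots`; returns the value, or -1 if absent.
int probe_lookup(const std::vector<Slot> &slots, int key) {
  const std::size_t n = slots.size();
  if (n == 0) return -1;
  const std::size_t home = static_cast<std::size_t>(key) % n;
  for (std::size_t i = 0; i < n; i++) {
    const Slot &s = slots[(home + i) % n];
    if (!s.full) return -1;  // a never-used slot ends the probe chain
    if (s.key == key) return s.tombstone ? -1 : s.val;
  }
  return -1;  // probed every slot without finding the key
}

void lookup_example() {
  // Capacity 4; keys 1 and 5 both hash to slot 1, so 5 sits in slot 2.
  std::vector<Slot> slots = {
      {0, 0, false, false},
      {1, 100, false, true},
      {5, 500, false, true},
      {0, 0, false, false},
  };
  assert(probe_lookup(slots, 1) == 100);
  assert(probe_lookup(slots, 5) == 500);  // found after one extra probe
  assert(probe_lookup(slots, 9) == -1);   // probes slots 1 and 2, stops at 3

  slots[1].tombstone = true;              // simulate erasing key 1
  assert(probe_lookup(slots, 1) == -1);   // erased keys read as missing
  assert(probe_lookup(slots, 5) == 500);  // the tombstone keeps the chain intact
}
```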

## Insert
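Insert probes linearly from the key's home slot and stores the key-value pair in the backing vector.
- If another entry with the same key is already in the table, its value is replaced with the new value.
- If a tombstone with the same key is in the table, the tombstone entry is revived with the new value.
- If every slot is occupied (tombstones count as occupied) and the key is not present, no entry is inserted and the call returns without reporting an error.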
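
The cases above can be sketched as a free function over a plain vector. This is an illustration under assumptions, not the PR's code: `Slot`, `probe_insert`, and `insert_example` are hypothetical names, and unlike the PR's `insert`, which returns `void`, this version returns a `bool` so the outcome is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the Entry fields, for illustration only.
struct Slot {
  int key;
  int val;
  bool tombstone;
  bool full;
};

// Inserts into `slots` and keeps the live-entry count `sz` up to date.
// Returns false when nothing was stored (all slots full, key absent).
bool probe_insert(std::vector<Slot> &slots, std::size_t &sz, int key, int val) {
  const std::size_t n = slots.size();
  if (n == 0) return false;
  const std::size_t home = static_cast<std::size_t>(key) % n;
  for (std::size_t i = 0; i < n; i++) {
    Slot &s = slots[(home + i) % n];
    if (!s.full || s.key == key) {
      if (!s.full || s.tombstone) sz++;  // new key or revived tombstone
      s = Slot{key, val, false, true};
      return true;
    }
  }
  return false;
}

void insert_example() {
  std::vector<Slot> slots(2);  // value-initialized: every slot starts empty
  std::size_t sz = 0;

  bool stored = probe_insert(slots, sz, 7, 70);  // 7 % 2 == 1: slot 1
  assert(stored && sz == 1 && slots[1].val == 70);

  stored = probe_insert(slots, sz, 7, 71);       // same key: value replaced
  assert(stored && sz == 1 && slots[1].val == 71);

  slots[1].tombstone = true;                     // simulate erasing key 7
  sz--;
  stored = probe_insert(slots, sz, 7, 72);       // tombstone revived
  assert(stored && sz == 1 && !slots[1].tombstone && slots[1].val == 72);

  stored = probe_insert(slots, sz, 4, 40);       // 4 % 2 == 0: slot 0
  assert(stored && sz == 2);
  stored = probe_insert(slots, sz, 9, 90);       // table full: nothing stored
  assert(!stored && sz == 2);
}
```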

## Erase
The entry associated with the key is flagged as a tombstone, the table size is decremented, and the value stored at that location is returned. If the key does not exist in the table or has already been erased, the function returns -1.
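
The sketch below illustrates why erase leaves a tombstone rather than marking the slot empty: with linear probing, an empty slot ends the search, so emptying a slot could hide keys stored after it. `Slot`, `find_live`, `tombstone_erase`, and `erase_example` are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the Entry fields, for illustration only.
struct Slot {
  int key;
  int val;
  bool tombstone;
  bool full;
};

// Returns the index of the live slot holding `key`, or -1 if there is none.
int find_live(const std::vector<Slot> &slots, int key) {
  const std::size_t n = slots.size();
  if (n == 0) return -1;
  const std::size_t home = static_cast<std::size_t>(key) % n;
  for (std::size_t i = 0; i < n; i++) {
    const std::size_t idx = (home + i) % n;
    if (!slots[idx].full) return -1;
    if (slots[idx].key == key)
      return slots[idx].tombstone ? -1 : static_cast<int>(idx);
  }
  return -1;
}

// Tombstone-based erase: returns the erased value, or -1 if the key is absent.
int tombstone_erase(std::vector<Slot> &slots, int key) {
  const int idx = find_live(slots, key);
  if (idx < 0) return -1;
  Slot &s = slots[static_cast<std::size_t>(idx)];
  s.tombstone = true;
  return s.val;
}

void erase_example() {
  // Keys 1 and 5 collide at slot 1 (capacity 4), so 5 lives in slot 2.
  std::vector<Slot> slots = {
      {0, 0, false, false},
      {1, 100, false, true},
      {5, 500, false, true},
      {0, 0, false, false},
  };
  const std::vector<Slot> before = slots;

  assert(tombstone_erase(slots, 1) == 100);
  assert(tombstone_erase(slots, 1) == -1);  // already erased
  assert(find_live(slots, 5) == 2);         // still reachable past the tombstone

  // Counter-example: clearing `full` instead would cut the probe chain.
  std::vector<Slot> cleared = before;
  cleared[1].full = false;
  assert(find_live(cleared, 5) == -1);      // key 5 has become unreachable
}
```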

# Tests
I included 15 tests covering table initialization, insertion, erasure, and lookup, including collisions, a full table, overflow, and missing keys.

General:
- InitTable

For insert:
- InsertOne
- InsertFive
- InsertFull
- InsertOverflow
- InsertReplace
- InsertFiveCollision
- InsertThreeOneCollision

For erase:
- EraseOne
- EraseAll
- EraseThreeOneCollision
- EraseKeyNotFound

For lookup:
- LookupOne
- LookupEraseThreeOneCollision
- LookupKeyNotFound
58 changes: 58 additions & 0 deletions src/include/utils/linear_hashing.h
@@ -0,0 +1,58 @@
#pragma once

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

namespace buzzdb{
namespace utils{

class LinearHashTable {
 public:
  // One slot of the backing vector.
  struct Entry {
    int key;
    int val;
    bool tombstone;  // set when the key has been erased
    bool full;       // set once an insert has written to this slot

    Entry(int key, int val) : key(key), val(val), tombstone(false), full(true) {}
    Entry(int key, int val, bool ts, bool full)
        : key(key), val(val), tombstone(ts), full(full) {}
    ~Entry() {}
    void set_tombstone(bool ts) { this->tombstone = ts; }
    void set_full(bool full) { this->full = full; }
    bool operator==(const Entry& e) const {
      return key == e.key && val == e.val && tombstone == e.tombstone && full == e.full;
    }
    friend std::ostream& operator<<(std::ostream& os, const Entry& e) {
      os << "Entry(" << e.key << "," << e.val << "," << e.tombstone << "," << e.full << ")";
      return os;
    }
  };

  // Creates a table with `capacity` empty slots.
  explicit LinearHashTable(size_t capacity)
      : capacity(capacity), sz(0), table(capacity, Entry(0, 0, false, false)) {}
  ~LinearHashTable() {}
  void insert(int, int);
  int erase(int);
  int lookup(int);
  size_t size();
  // Returns a copy of the backing vector.
  std::vector<Entry> get_backing_vector() { return table; }

 private:
  size_t capacity;
  size_t sz;
  std::vector<Entry> table;

  size_t hash(int);
  Entry *lookup_entry(int);
};

} // namespace utils
} // namespace buzzdb
71 changes: 71 additions & 0 deletions src/utils/linear_hashing.cc
@@ -0,0 +1,71 @@

#include <utils/linear_hashing.h>

namespace buzzdb {
namespace utils {

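// Maps a key to its home slot. The int key is converted to size_t before the
// modulo, so negative keys still yield an index in [0, capacity).
// Precondition: capacity > 0.
size_t LinearHashTable::hash(int key) {
  return static_cast<size_t>(key) % capacity;
}

// Number of live (non-tombstoned) entries.
size_t LinearHashTable::size() {
  return sz;
}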

// Returns the live entry for `key`, or nullptr if the key is absent or erased.
LinearHashTable::Entry *LinearHashTable::lookup_entry(int key) {
  if (capacity == 0) return nullptr;  // avoid a modulo by zero in hash()
  size_t index = hash(key);

  for (size_t i = 0; i < capacity; i++) {
    Entry *e = &table[(index + i) % capacity];

    if (!e->full) return nullptr;  // a never-used slot ends the probe chain
    if (e->key == key) return e->tombstone ? nullptr : e;
  }

  return nullptr;  // probed every slot without finding the key
}

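void LinearHashTable::insert(int key, int val) {
  if (capacity == 0) return;  // avoid a modulo by zero in hash()
  size_t index = hash(key);

  for (size_t i = 0; i < capacity; i++) {
    Entry *e = &table[(index + i) % capacity];

    // Stop at the first never-used slot or at the slot already holding `key`
    // (live or tombstoned).
    if (!e->full || e->key == key) {
      if (!e->full || e->tombstone) sz++;  // new key or revived tombstone
      *e = Entry(key, val);
      return;
    }
  }
  // Every slot is full and none holds `key`: nothing is inserted.
}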

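// Tombstones the entry for `key` and returns its value, or -1 if not found.
int LinearHashTable::erase(int key) {
  Entry *e = lookup_entry(key);

  if (e == nullptr) return -1;
  e->tombstone = true;
  sz--;

  return e->val;
}

// Returns the value stored for `key`, or -1 if not found.
int LinearHashTable::lookup(int key) {
  Entry *e = lookup_entry(key);
  return e == nullptr ? -1 : e->val;
}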

} // namespace utils
} // namespace buzzdb