This repository has been archived by the owner on Oct 14, 2023. It is now read-only.

Merge pull request #13 from yangyang233333/db_api
Commits related to the DB API
yangyang233333 authored Jan 29, 2023
2 parents 24b51d1 + eeecacc commit 2fc6749
Showing 22 changed files with 798 additions and 26 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -14,4 +14,7 @@ test_temp

# others
ref
build/*
build/*

storage
db_storage
64 changes: 47 additions & 17 deletions README.md
@@ -13,6 +13,7 @@ smallkv is a column-oriented storage engine based on the LSM architecture.
**The project is under rapid iteration!**

---

## Progress

- [x] Skip list
@@ -27,13 +28,18 @@ smallkv is a column-oriented storage engine based on the LSM architecture.
- [ ] Read path
- [ ] Write path
- [ ] Compaction module
- [ ] Replace the system memory allocator with FreeListAllocate (src/memory/allocate.h)

---

## BUILD

You must use the g++ compiler and Ubuntu 22.04 system.
You must use the g++ compiler (with C++17 support) on Ubuntu 22.04.

### build from docker (Highly recommended)

```shell
git clone git@github.com:yangyang233333/smallkv.git
docker pull qianyy2333/smallkv-test
docker run -it -v /{path to the smallkv source directory}:/test qianyy2333/smallkv-test /bin/bash
./build.sh ## build
@@ -42,6 +48,7 @@ docker run -it -v /{path to the smallkv source directory}:/test qianyy2333/smallkv-test
```

### build from source code:

```shell
# install dependencies
apt update && apt upgrade -y && apt install cmake make git g++ gcc -y && cd ~ \
@@ -50,40 +57,60 @@ apt update && apt upgrade -y && apt install cmake make git g++ gcc -y && cd ~ \
&& git clone https://github.com/nlohmann/json && cd json && mkdir build && cd build && cmake .. && make -j && sudo make install && cd ~ \
&& git clone https://github.com/abseil/abseil-cpp.git && cd abseil-cpp && mkdir build && cd build && cmake .. && make -j && make install && cd ~ \
&& rm -rf spdlog googletest json
git clone git@github.com:yangyang233333/smallkv.git
cd smallkv
./build.sh ## build
./main_run.sh ## main program
./unittest_run.sh ## unit tests
```

---

## Design

### 1. **Memory pool design**

![mem_pool](./img/mem_pool_design.png)

### 2. **Cache design**

![cache](./img/cache_design.png)
The Cache holds N (5 by default) pointers to CachePolicy objects, which act as 5 shards; this reduces hash collisions and narrows the scope of each lock. LRUCache and LFUCache are both subclasses of CachePolicy.
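
The sharding idea can be sketched as follows. This is only an illustration under stated assumptions: `Shard` and `ShardedCache` are hypothetical names, and each shard is a plain locked hash map rather than smallkv's actual LRU or LFU policy.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for one CachePolicy shard: a map guarded by its own lock.
template<typename K, typename V>
class Shard {
public:
    void insert(const K &key, const V &value) {
        std::lock_guard<std::mutex> guard(mutex_);
        map_[key] = value;
    }

    // Copies the cached value into *value and returns true if the key exists.
    bool get(const K &key, V *value) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto iter = map_.find(key);
        if (iter == map_.end()) return false;
        *value = iter->second;
        return true;
    }

    bool contains(const K &key) {
        std::lock_guard<std::mutex> guard(mutex_);
        return map_.find(key) != map_.end();
    }

private:
    std::mutex mutex_;
    std::unordered_map<K, V> map_;
};

// Routes each key to one of N shards, so threads that touch
// different shards never contend on the same lock.
template<typename K, typename V, std::size_t N = 5>
class ShardedCache {
public:
    void insert(const K &key, const V &value) { shard_for(key).insert(key, value); }
    bool get(const K &key, V *value) { return shard_for(key).get(key, value); }
    bool contains(const K &key) { return shard_for(key).contains(key); }

private:
    Shard<K, V> &shard_for(const K &key) { return shards_[hash_fn_(key) % N]; }

    std::hash<K> hash_fn_;
    std::array<Shard<K, V>, N> shards_;
};
```

A caller would declare `ShardedCache<std::string, int> cache;` and use `insert`, `get`, and `contains` as with a single cache; the shard selection stays hidden behind `shard_for`.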

### 3. **SSTable design**

Each .sst file stores one SSTable, structured as follows:
![sstable_schema](./img/sstable.png)
The contents of each block are described below:

- #### 3.1 DataBlock

![data_block_schema](./img/data_block_schema.png)
1) In the figure above, each Record stores the actual KV data and also records the length of the prefix shared by consecutive keys (used for delta compression);
2) Restart is mainly used for binary search: the offset recorded in a Restart makes it possible to parse out the smallest key of the corresponding Record Group, and comparing the keys of consecutive Restarts quickly locates a K-V pair. Each Restart records the number of Records in one Record Group together with the group's size and offset; each Restart is 12 bytes long;
3) Restart_NUM records the number of Restarts;
4) Restart_Offset records the size and offset of the Restarts;
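
The shared-length (delta) encoding of keys described in the list above can be sketched like this. It is a minimal in-memory illustration: the `Record` struct, `encode`, `decode_keys`, and the `restart_interval` parameter are assumptions made for the example, not smallkv's on-disk format.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory record; the real Record layout is the one in the figure.
struct Record {
    std::size_t shared_len;    // number of bytes shared with the previous key
    std::string unshared_key;  // the remaining key bytes
    std::string value;
};

std::size_t shared_prefix_len(const std::string &a, const std::string &b) {
    std::size_t n = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) ++i;
    return i;
}

// Encodes sorted (key, value) pairs. Every restart_interval-th record starts a
// new group and stores its full key, so a reader can begin decoding there.
// restart_interval must be greater than zero.
std::vector<Record> encode(const std::vector<std::pair<std::string, std::string>> &kvs,
                           std::size_t restart_interval) {
    std::vector<Record> out;
    std::string prev;
    for (std::size_t i = 0; i < kvs.size(); ++i) {
        const std::string &key = kvs[i].first;
        std::size_t shared = (i % restart_interval == 0) ? 0 : shared_prefix_len(prev, key);
        out.push_back({shared, key.substr(shared), kvs[i].second});
        prev = key;
    }
    return out;
}

// Rebuilds the full keys by reusing the shared prefix of the previous key.
std::vector<std::string> decode_keys(const std::vector<Record> &records) {
    std::vector<std::string> keys;
    std::string prev;
    for (const Record &r : records) {
        std::string key = prev.substr(0, r.shared_len) + r.unshared_key;
        keys.push_back(key);
        prev = key;
    }
    return keys;
}
```

Because the first record of every group carries its full key, a binary search over the restart points only needs to decode one key per probe before scanning inside the chosen group.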

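- #### 3.2 MetaBlock

The MetaBlock stores the Filter information (the bit array and the number of hash functions), i.e. the Bloom filter data. Why is this data needed? Because an sst file is a sequential, append-only structure, writes are fast (O(1)) but lookups are very slow (O(N)), so a Bloom filter is used to pre-filter requests (it can rule out KV pairs that definitely do not exist).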
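
A Bloom filter holding exactly those two pieces of state (a bit array and a hash-function count) might look like the sketch below. The class name, the use of `std::hash`, and the double-hashing scheme are assumptions chosen for brevity; the hash functions smallkv actually uses are not shown in this commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

class BloomFilter {
public:
    // bit_count must be greater than zero.
    BloomFilter(std::size_t bit_count, std::size_t hash_count)
        : bits_(bit_count, false), hash_count_(hash_count) {}

    void add(const std::string &key) {
        for (std::size_t i = 0; i < hash_count_; ++i) {
            bits_[position(key, i)] = true;
        }
    }

    // false: the key was definitely never added.
    // true:  the key may have been added (false positives are possible).
    bool may_contain(const std::string &key) const {
        for (std::size_t i = 0; i < hash_count_; ++i) {
            if (!bits_[position(key, i)]) return false;
        }
        return true;
    }

private:
    // Double hashing: the i-th probe position is derived from h1 + i * h2.
    std::size_t position(const std::string &key, std::size_t i) const {
        std::uint64_t h1 = std::hash<std::string>{}(key);
        std::uint64_t h2 = (h1 >> 17) | (h1 << 47);  // rotated copy of h1
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t hash_count_;
};
```

A read path would call `may_contain(key)` first and skip scanning the sst file whenever it returns false.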

- #### 3.3 IndexBlock

![index_block_schema](./img/index_block_schema.png)
The IndexBlock stores information about the largest key in the corresponding DataBlock (note: what is actually stored is shortest_key, where shortest_key = min{shortest_key > the largest key of the corresponding DataBlock}; this reduces the number of comparisons and relieves pressure under high concurrency); Offset_Info stores the size and offset of the corresponding DataBlock.
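
One way to pick a short index key is the separator rule used by LevelDB's bytewise comparator, sketched below. Whether smallkv computes shortest_key exactly this way is an assumption: the function name and the extra `next_first_key` bound are introduced only for illustration, and the result satisfies `last_key <= s < next_first_key` rather than the strict inequality written above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns a key s with last_key <= s < next_first_key that is as short as this
// simple rule allows. Assumes last_key < next_first_key in bytewise order.
std::string shortest_separator(const std::string &last_key,
                               const std::string &next_first_key) {
    std::size_t min_len = std::min(last_key.size(), next_first_key.size());
    std::size_t diff = 0;
    while (diff < min_len && last_key[diff] == next_first_key[diff]) ++diff;

    // One key is a prefix of the other: nothing can be shortened safely.
    if (diff >= min_len) return last_key;

    std::uint8_t byte = static_cast<std::uint8_t>(last_key[diff]);
    std::uint8_t limit = static_cast<std::uint8_t>(next_first_key[diff]);
    if (byte < 0xff && byte + 1 < limit) {
        // Bump the first differing byte and drop everything after it.
        std::string s = last_key.substr(0, diff + 1);
        s[diff] = static_cast<char>(byte + 1);
        return s;
    }
    return last_key;
}
```

For a block ending at `abcdefg` followed by a block starting at `abzzz`, this rule yields the three-byte key `abd`, so index lookups compare against fewer bytes.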

- #### 3.4 Footer

![footer_schema](./img/footer_schema.png)
MetaBlock_OffsetInfo records the size and offset of the MetaBlock; IndexBlock_OffsetInfo records the offset of the IndexBlock (the offset of the first IndexBlock) and its size (the total size of all IndexBlocks).
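
Reading a footer of this shape amounts to decoding two (size, offset) pairs from the end of the file. The sketch below assumes 32-bit little-endian fields and a 16-byte footer; those widths, the field order, and all names are assumptions for illustration, since the real layout is defined by the figure.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical fixed-width layout: 4-byte size followed by 4-byte offset.
struct OffsetInfo {
    std::uint32_t size;
    std::uint32_t offset;
};

// Appends v to dst as 4 little-endian bytes.
void put_fixed32(std::string *dst, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }
}

// Reads 4 little-endian bytes starting at src[pos].
std::uint32_t get_fixed32(const std::string &src, std::size_t pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(src[pos + i])) << (8 * i);
    }
    return v;
}

// Footer = MetaBlock_OffsetInfo followed by IndexBlock_OffsetInfo (16 bytes here).
std::string encode_footer(const OffsetInfo &meta, const OffsetInfo &index) {
    std::string footer;
    put_fixed32(&footer, meta.size);
    put_fixed32(&footer, meta.offset);
    put_fixed32(&footer, index.size);
    put_fixed32(&footer, index.offset);
    return footer;
}

// Returns false if the buffer does not have the expected footer length.
bool decode_footer(const std::string &footer, OffsetInfo *meta, OffsetInfo *index) {
    if (footer.size() != 16) return false;
    meta->size = get_fixed32(footer, 0);
    meta->offset = get_fixed32(footer, 4);
    index->size = get_fixed32(footer, 8);
    index->offset = get_fixed32(footer, 12);
    return true;
}
```

With the footer decoded, a reader can seek to the MetaBlock to load the filter and to the IndexBlock region to locate DataBlocks.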

---

## Third-party dependencies:

1. [spdlog](https://github.com/gabime/spdlog)
@@ -92,16 +119,19 @@ MetaBlock_OffsetInfo records the size and offset of the MetaBlock; IndexBlock_OffsetInfo
4. [abseil](https://github.com/abseil/abseil-cpp)

---
## References:

## Useful references:

1. [Alibaba Cloud NewSQL Database Competition](https://tianchi.aliyun.com/competition/entrance/531980/introduction)
2. [corekv](https://github.com/hardcore-os/coreKV-CPP)
3. [leveldb](https://github.com/google/leveldb)
4. [How LSM trees work](https://zhuanlan.zhihu.com/p/181498475)
5. [What is an LSM tree?](https://www.zhihu.com/question/446544471/answer/2348883977)
6. [WAL](https://zhuanlan.zhihu.com/p/258091002)
7. [Linux I/O: fsync, fflush, fwrite, mmap](https://juejin.cn/post/7001665675907301412)

---

Thanks to [JetBrains](https://jb.gg/OpenSourceSupport) for donating product licenses to help develop **smallkv** <a href="https://jb.gg/OpenSourceSupport"><img src="img/jb_beam.svg" width="94" align="center" /></a>
Binary file added img/linux_io.png
6 changes: 6 additions & 0 deletions src/cache/cache.h
@@ -50,6 +50,12 @@ namespace smallkv {
            return caches[sharding_index]->get(key);
        }

        // Returns true if the key exists
        bool contains(const K &key) {
            uint64_t sharding_index = hash_fn(key) % SHARDING_NUM;
            return caches[sharding_index]->contains(key);
        }

        // Release the node (decrement the reference count)
        void release(const K &key) {
            uint64_t sharding_index = hash_fn(key) % SHARDING_NUM;
3 changes: 3 additions & 0 deletions src/cache/cache_policy.h
@@ -25,6 +25,9 @@ namespace smallkv {
        //
        virtual Node<K, V> *get(const K &key) = 0;

        // Returns true if the key exists
        virtual bool contains(const K &key) = 0;

        // Release the node (decrement the reference count)
        virtual void release(const K &key) = 0;

6 changes: 6 additions & 0 deletions src/cache/lru.h
@@ -156,6 +156,12 @@ namespace smallkv {
            return *(iter->second);
        }

        // Returns true if the key exists
        bool contains(const K &key) override {
            ScopedLock<LockType> lock_guard(locker);
            return index.find(key) != index.end();
        }

        // Release the node (decrement the reference count)
        void release(const K &key) override {
            ScopedLock<LockType> lock_guard(locker);
40 changes: 40 additions & 0 deletions src/db/db.cpp
@@ -0,0 +1,40 @@
//
// Created by qianyy on 2023/1/28.
//
#include "db.h"
#include "db_impl.h"

namespace smallkv {
    DB::DB(const Options &options) {
        db_impl = std::make_unique<DBImpl>(options);
    }

    DBStatus DB::Put(const WriteOptions &options,
                     const std::string_view &key,
                     const std::string_view &value) {
        return db_impl->Put(options, key, value);
    }

    DBStatus DB::Delete(const WriteOptions &options,
                        const std::string_view &key) {
        return db_impl->Delete(options, key);
    }

    DBStatus DB::Get(const ReadOptions &options,
                     const std::string_view &key,
                     std::string *value) {
        return db_impl->Get(options, key, value);
    }

    DBStatus DB::BatchPut(const WriteOptions &options) {
        return db_impl->BatchPut(options);
    }

    DBStatus DB::BatchDelete(const ReadOptions &options) {
        return db_impl->BatchDelete(options);
    }

    DBStatus DB::Close() {
        return db_impl->Close();
    }
}
49 changes: 49 additions & 0 deletions src/db/db.h
@@ -0,0 +1,49 @@
//
// Created by qianyy on 2023/1/27.
//
#include <memory>
#include <string_view>
#include "status.h"
#include "options.h"

#ifndef SMALLKV_DB_H
#define SMALLKV_DB_H
namespace smallkv {
    class DBImpl;

    class DB {
    public:
        explicit DB(const Options& options);

        ~DB() = default;

        // DB should be a singleton: copy construction and copy assignment are disabled
        DB(const DB &) = delete;

        DB &operator=(const DB &) = delete;

        DBStatus Put(const WriteOptions &options,
                     const std::string_view &key,
                     const std::string_view &value);

        DBStatus Delete(const WriteOptions &options,
                        const std::string_view &key);

        // Writes the value associated with key into the string pointed to by value
        DBStatus Get(const ReadOptions &options,
                     const std::string_view &key,
                     std::string *value);

        // Batch write
        DBStatus BatchPut(const WriteOptions &options);

        DBStatus BatchDelete(const ReadOptions &options);

        // Close the database: calling this function guarantees that all written data is persisted to disk.
        DBStatus Close();

    private:
        std::unique_ptr<DBImpl> db_impl;
    };
}
#endif //SMALLKV_DB_H
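
The `DB` class forwards every call to a `DBImpl` held through `std::unique_ptr`, which is the pimpl idiom. Below is a minimal single-file sketch of that idiom; `Store` and `StoreImpl` are hypothetical names, not part of smallkv. It differs from the header above in one respect: the destructor is declared in the class and defined next to the implementation, which is the usual way to keep a `std::unique_ptr` to a forward-declared type compiling in translation units that never see the implementation class.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// ---- what would live in the header ----
class StoreImpl;  // hypothetical implementation class, only forward-declared here

class Store {
public:
    Store();
    ~Store();  // declared here, defined where StoreImpl is a complete type

    Store(const Store &) = delete;
    Store &operator=(const Store &) = delete;

    void Put(const std::string &key, const std::string &value);

    // Copies the stored value into *value and returns true if the key exists.
    bool Get(const std::string &key, std::string *value) const;

private:
    std::unique_ptr<StoreImpl> impl_;
};

// ---- what would live in the .cpp file ----
class StoreImpl {
public:
    std::unordered_map<std::string, std::string> table;
};

Store::Store() : impl_(std::make_unique<StoreImpl>()) {}

Store::~Store() = default;  // StoreImpl is complete at this point

void Store::Put(const std::string &key, const std::string &value) {
    impl_->table[key] = value;
}

bool Store::Get(const std::string &key, std::string *value) const {
    auto iter = impl_->table.find(key);
    if (iter == impl_->table.end()) return false;
    *value = iter->second;
    return true;
}
```

Callers include only the header part, so changes to `StoreImpl` do not force them to recompile.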