Skip to content

tanmaysharma2001/sqllite-clone-golang

Repository files navigation

SQLite clone in Golang

Thanks for visiting this repository. This 'SQLite' clone is a step-by-step implementation of the amazing guide 'db_tutorial' by cstack in GoLang.

If you are also a Go Enthusiast then this repo is for you!
Feel free to go through the repo to understand the code.

Project Structure:

Frontend

  1. Tokenizer
  2. Parser
  3. Code Generator
  • Input to the frontend is a SQL query. The output is sqlite virtual machine bytecode (essentially a compiler program that can operate on the database.)

Backend

  1. Virtual Machine
  2. B-tree
  3. Pager
  4. OS Interface
  • Virtual Machine: The virtual machine takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.
  • B-tree: Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.
  • Pager: The pager receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.
  • OS Interface:The os interface is the layer that differs depending on which operating system sqlite was compiled for. In this tutorial, I’m not going to support multiple platforms.

Steps:

1. Setting up the REPL:

The Basic Repl (read-execute-print loop) will be started when you start the database, prompting you to write the commands.

2. A Simple SQl Compiler:

  • Front-end of sqlite is a SQL compiler that parses a string and outputs an internal representation called bytecode.
  • The bytecode is passed to the virtual machine, which executes it.

SQL Architecture

  • Introducing META_COMMANDS: the commands which have '.' in starting of them. They are handled in a separate function. like '.exit'.
  • Till now, we have added insert and delete commands. We are handling them but not executing them.

3. An In-Memory, Append-Only, Single-Table Database

For now, it will:

  • Support two operations: inserting a row and printing all rows.

  • reside only in memory (no persistence to disk)

  • support a single, hard-coded table

  • Insert statements looks like:

insert 1 tanmay [email protected]
  • Now we need to copy the data we get from user like above, in some data structure representing the table. SQLite uses a B-tree for fast lookups, inserts and deletes. But for simplicity right now, we will use the array of structs. How?

    1. Store row in blocks of memory called pages.
    2. Each page stores as many rows as it can fit.
    3. Rows are serialized into a Golang struct with each page.
    4. Pages are only allocated as needed.
    5. Keep a fixed-size array of pointers to pages.
  • Rows are serialized in/deserialized from array of Row Structs.

  • The Page Size is kept 4 kilobytes because it's the same size as a page used in the virtual memory systems of most computer architectures. This means one page in our database corresponds to one page used by the operating system. The operating system will move pages in and out of memory as whole units instead of breaking them up.

  • Right now, the arbitrary limit of 100 pages is applied to the table but when we will switch to a tree structure, the database's maximum size will only be limited by the maximum size of a file. (Although we will still limit how many pages we keep in memory at once.)

4. Tests

skipped this one... will add later!

5. Persistence to Disk

  • Like SQLite we are going to persist records by saving the entire database to a file.

  • We are already serializing rows into page-sized structs. To add persistence, we can simply write those blocks of memory to a file, and read them back into memory the next time the program starts up.

  • We will introduce an abstraction Pager which will be responsible for retrieving memory blocks. First it looks in cache. On cache miss, it copies data from disk into memory (by reading the database file).

  • Getting a Page:

    1. Handles Cache Miss
    2. If the requested page lies outside the bounds of the file, we know it should be blank, so we just allocate some memory and return it.
    3. This empty requested page will be added to the file when we flush the cache to disk later.

About

SQL Lite clone in Golang

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages