General Design

SQL Table Notes

Design

Files

Files stored fully with FTS
- FTS on top of actual storage in a separate source table: https://kimsereylam.com/sqlite/2020/03/06/full-text-search-with-sqlite.html, https://www.sqlite.org/fts5.html#external_content_tables
  - Use triggers to automatically update the FTS from the base table: https://sqlite.org/lang_createtrigger.html
- Use YAML metadata for title, tags
- store metadata in the database separately from the body
- Store latest update date to compare with changes on disk
- store mod time as a string of timestamp ... to rely entirely on python datetimes
- store id as int of timestamp ... only second resolution, but YAGNI for greater resolution (for now) ... maybe increase later or if not unique, wait?
References
- Bibliographical citations can be added freely to any note.
- To be more structured ... each source can get it's own note, and possibly tagged as a "source" (bigger topic of what to use tags for). Citing a source is a link to a source note, which can also contain a general summary about the source content if desired.

Links

Stay as close to markdown syntax as possible:
- eg My first link

Here is a zettelkasten link: [My First Link](/20201016154437)

Title of the note is the link title
- This need not match the note's actual title, as it is not static.
- The link title can be just for convenience within the context of the note
- For making new notes from links, through some hotkey, whatever title is provided will have to be used for the actual title of the newly created note.
  - The newly created id will have to be inserted automatically
The URL is a forward slash (/) followed by the unique ID
This should have the following qualities:
- Fit's into ordinary markdown
- Easily detectably distinct from ordinary links
- Expressive

Regex

\[[\w ]*\]\(\/\d{12,14}\)

Or, alternatively, to allow for any characters in the title:

\[.*\]\(\/\d{12,14}\)

If/when creating a new note with such syntax, the title will have to be checked to be only ASCII
- pretty easy in python

Managing new links to potentially new notes

While writing a note, two hurdles might occur
- Inserting a link by the id is difficult as the ID is hard to remember
- Inserting a link to a note you haven't created yet but want to.
For the first problem, the only two alternatives are:
- Parse links with titles only and match the written title against the database. Sounds good and easy, as titles are easier to remember, but in practice it will go wrong frequently I imagine
- Use an automated process (through a text editor for instance) for searching for notes and inserting appropriate links
  - Would be good to ensure that this facility is not bound to any text editor and can be run through a CLI or web API
For the second problem, the two solutions are similar in nature to those above:
- Parse for links that lack an ID (but have a title!), and create those new notes when the note is parsed (perhaps optionally so)
  - I Actually like this. It can be a clean fallback which can be run at all times in the process of parsing notes. Parse for new_links, create new notes, go back and replace new links with actual links (with ids). Kinda a nice to have at this pint though.
- A quick automated process for creating a new note from some easily selectable text (again, inside a text editor but should not be bound to the text editor's API)

Front Matter

Written in YAML (ie, YAML front matter)
Pretty easy to manually parse
Fields:
- title
- tags
- extendable in the future? ... really depends on the parsing

---
title: Python
tags: tech
---

## Content Body

This is a link: [Link](/20201016154437)

Regex

(?s)^---.*---

(?s) Sets the dot matches newlines flag (ordinarily re.S).

Querying

FTS on title
FTS on text
shallow and deep Children of note
shallow parents of note
shallow and deep notes of particular tag
- deep means to include all tags that are children of target
- unsure exactly how to do this with current scheme:
  - http://howto.philippkeller.com/2005/04/24/Tags-Database-schemas/
  - https://stackoverflow.com/questions/20856/recommended-sql-database-design-for-tags-or-tagging
Combine tag queries with rest of above
- This ... could be non-trivial ... especially with the FTS ...
- How combine complex queries with FTS queries? Sub queries and joins?
- How use tags with other queries? may want to rethink the tags model ... depends on performance at scale really
  - instead, could leverage FTS and have a tags column with comma separated tags (including tag paths with slashes: parent_tag/child_tag) (using either FTS or LIKE matching: https://sqlite.org/lang_expr.html#the_like_glob_regexp_and_match_operators)
    - insertion of tag strings could by managed by python, checking the existing tag tables and parent joins, and inserting as appropriate. But then updating and deleting notes and/or tags could get very tricky.
  - Instead ... could flip the process:
    - Keep current model, but add ...
    - first, table of all unique combinations of tags
    - second, Assign to each note a single tag id in a single column in the notes table
    - Adding a tag to a note, adds the unique combination if it does not exist, then add this id to the note
      - But ... how have unique combinations of arbitrary length?

Config

Use python files (why not!)
- Actually, probably better to use INI files as supported by the standard library
Use python import machinery to load an arbitrary file (see code below)
Have up to three locations for a config file:
- Package default file loaded as the fallback
- Home directory
- Current working directory (??)
  - not sure worth it
  - Idea would be to have locally independent working spaces defined by local config files
Options
- zk_path: path of database/filebase (zekellbase)
- alt_paths: support multiple zekellbases (nice to have really!)
  - paths and names
  - currently selected zekellbase for all commands to run against

Example Code

import importlib
import importlib.machinery
import importlib.util

# provide absolute path of config
loader = importlib.machinery.SourceFileLoader(
    'test', '/Users/errollloyd/Developer/zekell/.zekell_config')
spec = importlib.util.spec_from_loader(loader.name, loader)
mod = importlib.util.module_from_spec(spec)
loader.exec_module(mod)
# mod.VARIABLE

Cleaning

To ensure consistency between the file-base and the database, a few batch functions will probably be necessary at some point
- Check all notes in the database correspond to all files in the file-base
- Add all files note in the database
- Remove all files not in the database

Updating Links

To update links:
- driven by notes, and therefore from files (for now).
- do a full refresh of all links from note:
  - delete all with current note as parent
  - add links in current note
- update db.ex() function to allow batch statements so that the initial removal is rolled back should anything go wrong with the insert

Updating Tags

To update note_tags:
- same as for links
To update tags:
- Tricky!
- this is to update the names or even the parent(s) of the tags which are already assigned to a note.
  - For tags not assigned to a note, essentially a simple update
- to change the name of a tag would require updating the text of every note, reliably.
  - update the tag column in tags
  - using regular expressions re.sub(), iterate through every note with specific tag in note_tags, find tag in frontmatter, and replace with new tag.
  - As the ids of the tag and notes stay the same, nothing else needs to be changed
- to change the parent tag of a tag (ie, to move to a new parent)
  - it means whole path will change
  - old parent will stay in tact (removal of that would be another process)
  - create new tag under desired parent
  - as above, go through all necessary notes to change the note path of the original tag text in the frontmatter
- To change any/all component of a tag's path
  - chain the processes above together essentially
  - create new tag as one would create a new tag path
  - iterate through all old notes and substitute old text with new
- The central challenge is relaibly substituting the old text in the frontmatter with the new tag path
  - Engineering the regular expression will be important but doable
  - enabling a note backup and rollback process is probably a good idea.

Deleting Tags

Simple remove, conditional on whether any rows in note_tags containing the tag
- Perhaps a fancy function at some point that will automatically remove such tags from the text of notes

Tasks

Move all creation SQL code to schema.sql
Adjust tests accordingly:
Create files with FTS
Create references table
create note-tags table
create tags hierarchy auto-make trigger
make trigger occur on insert, update and delete
write tests for full_path_tags triggers
Parse files for metadata
- Add to table
- Include update and delete functionality
Add to files/notes table
Sort out SQL commit and close workflow!!
- Just commit on all calls to ex()?
  - Yea ... this ... already done with context manager
- Or, something more sophisticated?
add to note_tags table
Use proper updating code for tags and links
- Add batch statements in db.ex()
- Update links instead of merely adding
- update note_tags instead of merely adding
Enable adding whole note from scratch
Enable configuration for locating database etc
Utility function for batch update of database from notes
- add file to database
- batch add multiple files to database (taking care of linking issues)
- find all files ahead of database (using mod time and os modtime?)
- batch add all files ahead of database
Clean up tag paths (especially in add_tag_path)
- probably best to use pathlib.PurePath as it's got clean interfaces to parents etc

Delete notes
- Remove links

Managing Duplicate notes (created by accident lets say)
- Merge Notes into one
  - Pick one as survivor
  - put contents of deprecated into survivor
  - add tags of deprecated
  - find all links to deprecated and change to survivor (will require overwriting contents of notes!)
  - Delete deprecated

Check for duplicate title when making a new note!
Enable quickly creating an empty note as a link in the current note for when you know a topic/idea deserves a note which will be filled up over time or in the future and so you're ensuring that a link is laid for the current note and that the new note is established for the future.
- Works together with checking for duplicate notes (see above)!

Change note title
- Because of the binding between note files and notes in the database, this may not be trivial
  - update_note() works by updating all fields for the record with the provided note id. So, for this function, relying on a constant note_id is viable, so long as a valid file path is provided with the current version of the note, even with a changed title.
  - The CLI update relies on the note_id to then retrieve the title of the note from which the path of the file is inferred. This is quick and convenient. But, to change the title of a note, a distinct function will be required.
    - takes note_id (fuzzy too)
    - obtains full note_id from database along with older title
    - Searches for note_file with matching note_id
      - If can't be found, error
    - performs update using path of actual note file found in search
  - Update cli_update to search for matching notes (by note_id) when the file_path title has changed
  - When using the package and not the CLI for the sublime plugin, clean the process up by updating directly with the file's current path and name using the update_note() function, checking if the id exists etc.
  - Updating a note, which is done by starting with the file, will hit an error if the title has been changed (even though the id is unchanged).

Delete Tags
- Only if not used in any notes!
add a description column to tags table as a helpful reminder/reinforcer of the meaning/purpose of tags

add to references table
create assets and asset links tables
Look into adding indices for columns likely to be used heavily in queries
- Eg, parent and child note ids in note_links.
- See sqlite docs on query planning and indices
- Whereever UNIQUE constraints are applied an index is effectively created (see docs on create table).
  - So ... only the note_tags table would need an index... or unique constraint on both columns

Add unique constraint or an index on note_tags
- Makes sense to ensure uniqueness here
- Should create an index that should make queries faster
- Perhaps unnecessary
  - Using explain query plan shows lines like SEARCH TABLE note_tags AS c USING AUTOMATIC COVERING INDEX (tag_id=? AND note_id=?), indicating that an index is created automatically already. This is probably through the foreign key constraint that references the id columns of the notes and tags tables, both of which are primary key columns (and so should be indexed)
  - Though see the sqlite docs on foreign keys where it is suggested an index on the child columns is still useful for when certain actions such as delete occur.
Create CLI
- General a CRUD interface on notes and tags + querying
- Add sql command for straight sql
- add custom query command for custom short hand queries
- Add update specific note/file command
- Add command that starts a python REPL and imports the zekell module ready for interactive usage of the module itself
  - Can maybe just run python -i on the module itself?
  - Or maybe some code that also sets up a connection to the database
  - Same for sqlite shell?

Wrap functionality around note staging (mostly CLI)
- Stage note when opened through CLI so that can be readily updated once edited by external app
- List all staged notes
- Batch update all staged notes
  - Check ones have actually been updated?? (too much effort really)
- mainly for editing notes purely through CLI

Can update full tag path triggers to use a single CTE?
write the CTE once, and refer back to it in each trigger !!?
Create query functions/API
- Create a super master query that automatically joins multiple optional queries together
  - Maybe use multiple CTEs for each component, then join them all together at the end on a commonly named note_ids column (??)
  - In combining the components of this query, there will be active and passive filtering
    - Eg: all notes that are children of note X, or all notes that are children of the notes from the previous component ... how to string all of this together? best approach is probably to put together various specific queries that are obviously useful, to get a feel for how it can all be put together
    - Well, passive filtering is done just by joining (kinda easy). Ordinarily, active filtering would involve a WHERE within a subquery or CTE.
- Allow for simple children of and parents of passive components
- Allow for more flexible and powerful text search in title and body queries:
  - Use like and glob on plain columns (not FTS) (see documentation)
    - Would have to decide on how conveyed in the query text and then combined with other queries
      - Probably just require the user to provide 'like' or 'glob' and understand what that means
      - like: % and _ allowed
      - glob: * (instead of %), and other unix symbols? (?, [])?
      - script can scan for these key words, and if present, use them instead of the FTS table.
    - Provide interface to use the bm25() function to rank search matches (see documentation)
      - Maybe have on by default? Use keyword at end of search?
      - Maybe snippet and highlight are useful too? ... to show the reason why a particular hit shows as a match ... provide some context etc.
      - Does this get into fuzzy search territory where a bad match is returned nonetheless?
    - Use (?) the experimental trigram tokeniser (see documentation, released 2020-12: release notes)
      - enables essentially close to fuzzy search that should be performant for any substring of three characters or more with glob/like functionality too!
- Relatedly, Modify the title and body components to always be one CTE
  - At the moment, they are separate and are joined.
  - for some reason, this is a sub-optimal join, performing worse than other queries by orders of magnitude.
  - But, if both where statements are put into a single CTE, performance is excellent again.
  - Have some code that looks for both title and body and constructs a single CTE
  - To combine them requires only a single where statement:
    - where title match "alice" and body match "queen"
    - more fully ... see below ...

with
title_notes(note_id) as (
    select rowid from notes_fts
    where title match "alice" and body match "queen"
)
select z.id,z.title
from title_notes a
inner join notes z on a.note_id = z.id

Allow for OR between components (using outer join rather than inner join)
- sqlite doesn't support full outer joins. They can be hacked together with a union of mirroring left joins ... but would it be worth it?
Allow for complex tag queries
- boolean operations on multiple tags
- Perhaps the toxi scheme is not appropriate for this? It's possible, but perhaps to combersome especially if trying to integrate it with other query components.
- Maybe just using FTS, with a single tags column, with space separated tags and slashes (/) for hierarchy, would work better for querying?
  - Harder to query what are all the tags in the database then ... so maybe both?
  - To remove the slash (/) from token separators see docs on unicode tokenizer options where characters can be added to the set of token and separator characters. With this, whitespace would be used to separate tags and slashes would be retained as part of a single token.
  - Boolean queries would become simpler, using sql booleans on like "%TAG%" statements (see examples in this blog post)
- Including tag children in query (ie, tag software includes software/theory)
  - Probably best as a separate action from direct tag search ... simple recursive?
    - first step is to find all tags that are child of specified tag
    - Then, a simple where tag_id in (...) should do it.

-- All children of tag with parent_id ? 
    with recursive all_tag_children(id, tag, parent_id) as (
        -- Optionally including the initial tag too
        select id, tag, parent_id from tags
        where id = ?
        union
        select id, tag, parent_id from tags
        where parent_id = ?
        union
        select a.id id, a.tag tag, a.parent_id parent_id
        from tags a
        inner join all_tag_children b on a.parent_id = b.id
    ),
    tag_children_notes(note_id) as (
        select note_id from note_tags
        where tag_id in (
            select id from all_tag_children
        )
    )
    -- End of CTE
    -- selection in the course of full query
    select note_id from tag_children_notes

Add follow-through children (and parent) query commands that select all immediate children of the previously selected notes
Same for parent (?)
Add a distinct or group by to eliminate duplicates in children or parents
- from inspections of query plans (with explain query plan), distinct or group by make no difference in the plan. Quick timing tests show no observable difference in the timing.

-- For example

-- Previous selection ... simple title FTS
with title_notes(note_id) as (
    select rowid from notes_fts where title match "alice"
)
,
-- Actual children notes CTE
-- note that both parent and child note ids are required
-- here child_note_id is aliased to note_id as it will continue along in the chain of joins
children_notes(parent_note_id, note_id) as (
    select parent_note_id, child_note_id
    from note_links
)
-- selecting and joining
select z.id
from title_notes a
-- This is different from the usual pattern, where the join MUST be on the parent_note_id column
-- whilst the child_note_id continues along as note_id
-- to implement programmatically will require some logic to alter the usual pattern for
-- these "passive" "follow-through" CTEs
inner join children_notes b
    on a.note_id = b.parent_note_id
inner join notes z
    on b.note_id = z.id

Querying all children (at all levels) of a note (or even set of notes)
- Need a recursive CTE
  - Can limit the number of levels? Could be useful to do so only to 5 levels or so so that the number of notes doesn't explode?
  - How handle cycles?
    - Not sure can limit number of cycles, but only total number of rows added through an ordinar LIMIT statement

-- Note the limit of 100 in final line
-- This is arbitrary, but a good safety limit makes sense (maybe 1000?)
-- Also note that this requires the seed parent_note_id
  -- Could use a like %XXXX fuzzy search for the note_id ?
with recursive all_children(parent_id, note_id) as (
    select parent_note_id, child_note_id from note_links
    where parent_note_id = ?
    union
    select parent_note_id, child_note_id 
        from note_links a
        inner join all_children b
        on b.note_id = a.parent_note_id
        limit 100
)
select note_id from all_children

ensure custom query interface is protected from sql insertion attacks
Run performance tests for various numbers of notes
- 1000, 5000, 10_000, 50_000, 100_000
- Use instance on AWS
- need code for:
  - making dummy notes (like from alice in wonderland)
  - running and timing queries
    - Probably just use zekell CLI with q search machinery.
  - Setting up and tearing down notes
Run performance tests on a subset of wikipedia after scraping (maybe scrape n notes from philosophy) ... maybe just take the introductions of each page only.
- Wikipedia's database is available for download here
- Simple English wikipedia available from here might be a good bet ... whole wiki database is approx ~200MB. This also appears to be XML format, so no need for mysql
Allow for fuzzy search over note ids??
- Idea is to be like git SHAs ... allow using the first 6 digits of a note to search for a ntoe
- 6 digits is pretty easy for short term memory ... essentially Hour:Minute:second, and will most likely be unique or close to
  - 12 hours (working time) * 60 minutes * 60 seconds ~ 40_000
- Probably create another FTS column for the row id?
- Actually, this works already with basic like %TOKEN% syntax !!!:
```
select id from notes where id like "%83006%"
-- returns (20211031083006)
```
Add run_log for all tables for version control and viable sync options
Having running last_mod_time for whole database
- update with triggers or use python? (probably triggers!)
- Could maybe just use the run_log? Probably not efficient!
  - Generally philosophy with this database is make reading fast and writing slow (indices, unique constraints, triggers, etc).
Create a web API interface
- could become primary means of interaction
  - run always in background
  - allow for WAL and multiple connections, which requires another pragma (see documentation)
  - requires either hacking the stdlib (too much) or using bottle (minimal)
Create a frontend (browser based) !!
- Need a markdown system
  - eg, some markdown editors:
    - https://github.com/benweet/stackedit
Change modified time column to float (?)
- Idea is to enable easier sorting in the database?
- but sqlite seems quite happy to sort text. In fact, it seems faster compared to sorting floats, which makes some sense as floating point might introduce some problems. ... Note done, left as text!!
sublime plugin?
- General Query interface:
  - Use plane text input (simple, but prone to errors) ... or ...
  - Use List Inputs in combination with next_inputs
    - Initial selection are available keys (tag, title etc)
    - On selection, another list input (for tag for instance, where there are limited options), or, a general text input.
      - For id, a follow-up list input might make sense, which returns the list of possible options if there are more than one (or even when there is only one, just to confirm)

Ideas

Track web-links separately
- Separate table: ID | title | URL | note_id (foreign key)
- Parse notes for markdown links, taking [this is a link] text as title.
- Maybe ... full text on the title and URL
  - where the URL would probably require a custom tokeniser (separate by slashes and/or dots)
- WHY?
  - Search for notes based on containing a specific URL (stackoverflow, PLOSone, eg)
  - Retrieve all links that are children of certain notes
    - more complex, as have to scan all leaves
    - EG, all links under python.
Create terminal browser/viewer with Rich/Textual
Web server, multi-user, one SQLite db file per user, backedup to S3 with Litestream?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

notes.md

notes.md

General Design

SQL Table Notes

Design

Tags

Full parent derivation

Note Writing

Files

Links

Regex

Managing new links to potentially new notes

Front Matter

Regex

Querying

Config

Example Code

Cleaning

Updating Links

Updating Tags

Deleting Tags

Tasks

Ideas

Files

notes.md

Latest commit

History

notes.md

File metadata and controls

General Design

SQL Table Notes

Design

Tags

Full parent derivation

Note Writing

Files

Links

Regex

Managing new links to potentially new notes

Front Matter

Regex

Querying

Config

Example Code

Cleaning

Updating Links

Updating Tags

Deleting Tags

Tasks

Ideas