-
Notifications
You must be signed in to change notification settings - Fork 0
A really simple graphical editor for X11
License
cdo256/cdoedit
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
cdoedit - a really simple graphical text editor Overview -------- cdoedit is a simple text file editor for X for my (cdo256's) personal use. It comes with default keybindings that will be most familiar to those who grew up with Windows. Ctrl+C is copy, Ctrl+V is paste, arrow keys to move the cursor, Ctrl+Arrows to move by word/paragraph, hold down shift when you move and it selects. All the rest of this is in the configuration file, config.h. If you edit this file, make sure to rebuild (sudo make install) to get the changes, since the config file is compiled in to the executable. I haven't tested this on any other platform so this could corrupt files you try to edit, pick up files from the wrong directory or otherwise misbehave so keep a keen eye on it and be sure you back up the original file before you save. Requirements ------------ In order to build cdoedit you need the Xlib header files. Installation ------------ Edit config.mk to match your local setup (cdoedit is installed into the /usr/local namespace by default). Afterwards enter the following command to build and install cdoedit (if necessary as root): make clean install Credits ------- cdoedit is based on Suckless' Simple Terminal (st). st in turn is based on Aurélien APTEL <aurelien dot aptel at gmail dot com> bt source code. Implementation -------------- Files ===== cdoedit.c handles just the grid of glyphys that form the display buffer. editor.c manages the gap buffer and has convenient edit operations. This file is brand new since the st fork. x.c does all the interaction with the xserver. This file is mostly unchanged since the st fork. The Gap Buffer ============== The core of the Document structure in editor.c is a gap buffer. This stores all the characters in the document in a way that's more efficient than a compact array (with no gap) while still maintaining a lot of simplicity. It works by splitting a single buffer into three regions: the left section, the gap, and the right section like so: __left___ ______gap______ _________right__________ / \/ \/ \ [Hello wor]/////////////////[ld! This is a text file.] ^ ^ ^ ^ bufstart gapleft gapright bufend The concrete document is simply gotten by concatenating the left and right buffers. The contents of the gap section is never read (unless one of the gap pointers is first moved). To grow the document around the gap, the text is inserted at gapleft and the gapleft pointer is moved to point to the end of the newly inserted string. This is a simple data structure for sequences of values expected to have most edits clustered close together. Text editors are a perfect example of having edits close together because very often when you're editing a text document, you're either navigating or typing sequentially or editing/deleting for long sequences in a small part of the document. So a gap buffer should be a fast (and relatively small) method of representing an editable text document. Even when doing a find-and- replace (a feature not supported in cdoedit) on something spread out throughout the document works (as long as it's sequential) it only has to move as many characters as the document is long. Gap placement ============= One consideration of this structure is where to put the gap. There are two main options: 1. Have the gap always be where the cursor is. 2. Have the gap be where the last edit was. There are more options for example using some huristic to work out where the next edit point will be but we want to keep things simple. #1 would slow down large navigations but #2 would put the delay on the first edit after a navigation. Because text editors are designed to be used by humans, a large navigation isn't usually immediately followed by an edit because it takes a little time for the user to see where they are in the the document and move the cursor more finely to where they want to type. Whereas when typing starts, it's usually lots of little edits in quick succession and a delay to the user to the first edit could hurt the user experience. Because of these reasons I've opted to have the gap always follow the cursor precisely. One obvious exception here is that jumping to the start or end of the document doesn't really require the user looking at the surrounding context, because they know they just want to quickly add something to the end of the document. So they can Ctrl+End, press enter and immediately start typing. Thankfully for these kinds of cases user input is buffered by the X server so no key strokes are lost. Realloc spike ============= Another big performance consideration is the situation when the buffer needs to grow. This can happen at any time when the user is in the middle of editing some text and may require copying over the entire document. When this happens, the user experiences a delay proportional to the size of the document. This is because, the gap has got too small and the whole buffer must be reallocated and the right section must be moved to the end of the new buffer. This can be mitigated by doing the reallocation during idle time or eleminated entirely by taking advantage of virtual memory. The simplest way to mitigate this is to preemptively grow the buffer during idle time before the buffer is actually needed. This could be done after a keypress in anticipation of the next, before calling select() in the main loop, or after a certain period of inactivity. Eliminating the reallocation spike ================================== The reallocation spike is fundamental to the data-structure and forces us to make a trade off between space and worst-case latency of a key-press. Either we start with a buffer much larger to accomodate most of the expansion and hog much more memory than we need or we keep the buffer tight but take the ux hit of more frequent reallocation (or somewhere between these two extremes). All is not lost however. A solution exists that only involves minimal code changes, but gives us both sides of the trade-off for free with basically no downside. The only problem is that this requires a reallocation method that grows _downward_, which cannot be written using just the POSIX C standard library. The solution works by taking advantage of virtual memory. We know that chunks can be mapped from anywhere in memory and will only be mapped when we ask for them. This gives us the option of making a very large gap but only allocating pages that actually contain data. [MMMMMM] = Mapped page [//////] = Unmapped page Pages: [MMMMMM][MMMMMM][//////][//////][//////][//////]........[//////][//////][//////][MMMMMM][MMMMMM] Data: [This side ]////////////////////////////////////........///////////////////////////[is far away] \___left___/\____________________________(very large) gap_________________________/\___right___/ This not only solves the reallocation spike problem but actually reduces the memory footprint of the document to be at most file_len + page_size * 2 instead of file_len * 2 which could make a difference on memory-constrained systems but won't matter at all in most cases. This leaves us with two related questions: 1. Can we get this to play nicely with malloc? 2. How do we decide size and location in memory? This unfortunately doesn't appear to be currently possible in the current version of Linux (as of December 2019). Since there's no way to reserve the gap pages so that (A) the pages cannot be reused by malloc and (B) these pages don't count toward the total system memory usage. These points could be remedied by simply telling malloc to not allocate on our turf but there doesn't seem to be an easy way to do this that's also robust enough to not break with some internal change to libc's malloc. I've looked for allocation libraries that can do this but they appear to either not exist anywhere or just not exist on GitHub. Point B is a problem because otherwise the kernel will just refuse to allocate the buffer spitting out ENOMEM. Alternatively, a reverse reallocator could be written that would allow for reallocation which would keep the pages at the end of the allocated block instead of at the beginning (as realloc() currently does). This currently also doesn't appear to exist anywhere so it'd have to be a project for a future data. The story on Windows is a lot more optimistic. It's a simple case of calling VirtualAlloc() with MEM_RESERVE but delay committing the memory until it's actually needed. This appears to do exactly what we want with no caveats! One of the few cases where Windows is actually better than Linux. Because of the difficulty in implementing this on Linux, and because the gain is so slight with the kind of files I usually edit, I've decided to leave this as an exercise for the reader. The Update Set ============== Because I found I was repeating a lot of the same code to update various utility pointers in the Document structure, I decided to write a structure of pointers to keep track of them and functions to update the pointers when the document four key pointers (bufstart, gapleft, gapright, and bufend) changes. This can currently happen in one of four situations: 1. The buffer is reallocated to grow the gap. 2. The gap is moved due to navigation (navigation means cursor position change). 3. Some text is inserted into the buffer. 4. Some text is deleted from the buffer. It's worth mentioning at this point using document indexes instead of pointers would allow situations 1 and 2 to not require any additional updates. Specifically these indexes would have to be indexes to left ++ right (ie. the the buffer with the gap removed). This means that when the gap changes size (situation 1) or moves (situation 2), the index will still refer to the same point. The reason why I chose pointers instead of indexes is for simplicity, looking up the character from a pointer address just requires a bounds check. Indexes would likely be similar in most other ways. The solution I employed is a (multi-)set of all the active pointers to the document. Whenever one of the above four situations occurs. All of the pointers to the document are updated. In order for this to work, the following data- structure invariants must be maintained: 1. All pointers in the update set must be valid and point to pointers that point to the document that owns the update set. 2. Whenever one of the four situations occurs, the corresponding dupdateon[a-z]*() functions should be called before any reads/writes to the pointers pointed to by pointers in the update set. The reason why multiset instead of set is to make add and remove symmetric. For example in a normal (not multi) set: ({1,2,3} u {1}) \ {1} = {1,2,3} \ {1} = {2,3} But in a multiset union 'u' and set difference '\': ({1,2,3} u {1}) \ {1} = {1,1,2,3} \ {1} = {1,2,3}
About
A really simple graphical editor for X11