GSoC 2024 ‐ Snehil Shah

Snehil Shah edited this page Nov 6, 2024 · 16 revisions

About me

Hey there! I am Snehil Shah, a computer science undergraduate (as of writing this) at the Indian Institute of Information Technology, Nagpur, India. Apart from my interest in computers and software, I have a dormant passion for audio DSP and synthesis.

Project overview

The read-eval-print loop (REPL) is a fixture of data analysis and numerical computing and provides a critical entry-point for individuals seeking to learn and better understand APIs and their associated behavior. For a library emphasizing numerical and scientific computing, a well-featured REPL becomes an essential tool allowing users to easily visualize and work with data in an interactive environment. The stdlib REPL is a command-line based interactive interpreter environment for Node.js equipped with namespaces and tools for statistical computing and data exploration enabling easy prototyping, testing, debugging, and programming.

This project aimed to implement a suite of enhancements to the stdlib REPL to achieve feature parity with similar environments for scientific computing such as IPython and Julia. These enhancements include:

  • Fuzzy auto-completion
  • Syntax highlighting
  • Visualization tools for tabular data
  • Multi-line editing
  • Paged outputs
  • Bracketed-paste
  • and more...

Project recap

My work on the REPL started before the official coding period began. Before that, I had contributed to some good first issues (implemented some easier packages, C implementations, and refactorings) to get a gist of project conventions and contribution flow. Since then, we have had an array of improvements to the REPL. Let's go through each of them from the beginning:

Completed work

  • Auto-closing brackets/quotations

    My first work on the REPL was implementing auto-closing brackets/quotations, a common feature in IDEs and code editors.

    • #1680 - feat: add support for auto-closing brackets/quotations in the REPL

    Implementation details:

    • The approach is to walk the abstract syntax tree (generated by acorn) for the current line and detect an auto-closing candidate. If one is found, we write the corresponding closing symbol to the output.
    • There are also cases where the user instinctively types the closing symbol themselves, as if this feature never existed. We respect that and let the user type through the auto-appended symbol.
    • What about auto-deleting the appended symbol? When an opening symbol is deleted, we check whether the corresponding closing symbol follows it and, if it does, delete it as well. We avoid this behavior when the user is editing a string literal, which we detect by again using acorn to check whether the nodes around the cursor are strings.
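
The three behaviors above (auto-close, type-through, auto-delete) can be sketched as pure functions over a line and a cursor offset. This is a simplified illustration, not the code from the PR: it uses a fixed pairing table instead of an acorn AST, it skips the string-literal check, and the names `PAIRS`, `onInsert`, and `onBackspace` are hypothetical.

```javascript
// Hypothetical pairing table (the REPL itself finds candidates by walking an acorn AST):
var PAIRS = {
	'(': ')',
	'[': ']',
	'{': '}',
	'\'': '\'',
	'"': '"',
	'`': '`'
};

// Returns `true` if `ch` closes one of the pairs above.
function isClosing( ch ) {
	return Object.keys( PAIRS ).some( function check( open ) {
		return PAIRS[ open ] === ch;
	});
}

// Handles a typed character and returns the updated line and cursor offset.
function onInsert( line, cursor, ch ) {
	var before = line.slice( 0, cursor );
	var after = line.slice( cursor );

	// Type-through: the symbol is already next to the cursor, so only move the cursor...
	if ( after[ 0 ] === ch && isClosing( ch ) ) {
		return { 'line': line, 'cursor': cursor + 1 };
	}
	// Auto-close: write the opening symbol followed by its closing counterpart...
	if ( Object.prototype.hasOwnProperty.call( PAIRS, ch ) ) {
		return { 'line': before + ch + PAIRS[ ch ] + after, 'cursor': cursor + 1 };
	}
	return { 'line': before + ch + after, 'cursor': cursor + 1 };
}

// Handles a backspace, removing a closing symbol that directly follows its deleted opener.
function onBackspace( line, cursor ) {
	var open = line[ cursor - 1 ];
	var n = 0;
	if ( cursor === 0 ) {
		return { 'line': line, 'cursor': 0 };
	}
	if ( Object.prototype.hasOwnProperty.call( PAIRS, open ) && line[ cursor ] === PAIRS[ open ] ) {
		n = 1;
	}
	return { 'line': line.slice( 0, cursor - 1 ) + line.slice( cursor + n ), 'cursor': cursor - 1 };
}
```

For example, typing `(` after `foo` yields `foo()` with the cursor between the brackets, typing `)` there only advances the cursor, and a backspace between the brackets removes both.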

    As my first PR on the REPL, it wasn't the safest landing, with @kgryte doing most of the heavy lifting. We did get it through the finish line after a month of coding and review cycles, and by this time I had a good grasp of the REPL codebase.

  • Pager

    Earlier, when an output was longer than the terminal height, it would scroll all the way to the end of the output. This meant the user had to scroll all the way back up to start reading the output. The pager aims to capture long outputs and display them in a scrollable way.

    • #2162 - feat: add pager to allow scrolling of long outputs in the REPL

    In general, pagers are implemented by halting printing once the terminal height is reached and waiting for user input before printing further. But I wanted to do it differently. With our UI, we page in-place, meaning the pager appears like a screen, and we can still see the parent command at the top. The only downside might be some jitter in the output, as we rely on re-rendering the page upon every scroll.

    Implementation details:

    1. Before implementing the pager, we made our own custom output stream and piped it to the actual output, so as to avoid messing with the user's readable stream.
    2. The first step is detecting a pageable output. We do this by checking if the number of rows in the output is greater than the height of the terminal stream (including space for the input command).
    3. Then we write the page UI and maintain the page indexing. During paging mode, the entire REPL is frozen and is only receptive to pager controls and SIGINT interrupts.
    4. As we receive the page up/down controls, we update the indices and re-render the page UI.
    5. We also listen to SIGWINCH events to make it receptive to terminal resizes.
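
Steps 2 to 5 boil down to a small amount of index bookkeeping, which can be sketched as follows. This is a minimal illustration under assumptions of my own: `isPageable` and `createPager` are hypothetical names, and `reserved` stands for the rows kept for the input command and pager UI. The real pager also writes to the output stream and freezes the REPL, which is omitted here.

```javascript
// Returns `true` if the output has more rows than the terminal can show.
function isPageable( output, terminalRows, reserved ) {
	return output.split( '\n' ).length > terminalRows - reserved;
}

// Creates a pager that tracks which slice of the output is currently visible.
function createPager( output, terminalRows, reserved ) {
	var lines = output.split( '\n' );
	var height;
	var maxIndex;
	var index = 0;

	// Recomputes the page height and clamps the current index (also used on resize):
	function resize( rows ) {
		height = Math.max( 1, rows - reserved );
		maxIndex = Math.max( 0, lines.length - height );
		index = Math.min( index, maxIndex );
		return page();
	}
	// Returns the text of the current page:
	function page() {
		return lines.slice( index, index + height ).join( '\n' );
	}
	// Moves the page up (negative delta) or down (positive delta) and re-renders:
	function scroll( delta ) {
		index = Math.min( maxIndex, Math.max( 0, index + delta ) );
		return page();
	}
	resize( terminalRows );
	return { 'page': page, 'scroll': scroll, 'resize': resize };
}
```

In a real terminal, `resize` would be called from a handler registered via `process.on( 'SIGWINCH', ... )` with the new `process.stdout.rows` value.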

    Maintenance work:

    • #2205 - fix: remove SIGWINCH listener upon closing the REPL
    • #2293 - fix: resolve incorrect constraints for scrollable height in the REPL's pager
  • Syntax highlighting

    One of the most requested and crucial additions to the REPL was syntax highlighting.

    • #2254 - feat: add syntax highlighting in the REPL

      This PR adds the core modules for syntax highlighting, namely the tokenizer, and highlighter.

      Implementation details:

      1. With every keypress, we capture the updated line.
      2. We then check whether the line has actually changed. This is a simple caching mechanism to avoid a performance drag during events like moving the cursor left/right in the REPL.
      3. Tokenization. To support various token types, a good tokenizer is crucial. We use acorn to parse the line into an abstract syntax tree. During parsing, we keep a record of basic tokens like comments, operators, punctuation, strings, numbers, and regexps. Declarations (functions, classes, variables, etc.) in the local scope, which are not yet added to the global context, are resolved by traversing the AST. Identifiers are resolved against scopes in the order local > command > global. To resolve member expressions, we recursively traverse (and compute where needed) the global context to tokenize each member.
      4. Highlight. Each of the token types is then colored accordingly using ANSI escape sequences, and the line is re-rendered with the highlighted line.
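
The overall pipeline (cache check, tokenize, color with ANSI escape sequences) can be sketched in a few lines. Note the big simplification: this sketch tokenizes with a regular expression instead of an acorn AST, so it knows nothing about scopes or member expressions. The names `tokenize`, `highlight`, `onLine`, and the color choices are hypothetical.

```javascript
// Hypothetical theme mapping token types to ANSI SGR sequences:
var COLORS = {
	'comment': '\u001b[90m',
	'string': '\u001b[32m',
	'keyword': '\u001b[35m',
	'number': '\u001b[33m'
};
var RESET = '\u001b[0m';

// Capture groups, in order: comment, string, keyword, number.
var RE = /(\/\/.*$)|('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")|\b(var|let|const|function|return|if|else|for|while|class|new)\b|\b(\d+(?:\.\d+)?)\b/g;
var TYPES = [ 'comment', 'string', 'keyword', 'number' ];

// Splits a line into typed tokens with start/end offsets.
function tokenize( line ) {
	var tokens = [];
	var m;
	var i;
	RE.lastIndex = 0;
	while ( ( m = RE.exec( line ) ) !== null ) {
		for ( i = 0; i < TYPES.length; i++ ) {
			if ( m[ i + 1 ] !== void 0 ) {
				tokens.push({ 'type': TYPES[ i ], 'start': m.index, 'end': m.index + m[ 0 ].length });
				break;
			}
		}
	}
	return tokens;
}

// Wraps each token in its color, leaving the text between tokens untouched.
function highlight( line ) {
	var out = '';
	var pos = 0;
	tokenize( line ).forEach( function color( t ) {
		out += line.slice( pos, t.start ) + COLORS[ t.type ] + line.slice( t.start, t.end ) + RESET;
		pos = t.end;
	});
	return out + line.slice( pos );
}

// Skips re-highlighting when the line is unchanged (e.g., on cursor movement).
var lastLine = null;
var lastOut = '';
function onLine( line ) {
	if ( line !== lastLine ) {
		lastLine = line;
		lastOut = highlight( line );
	}
	return lastOut;
}
```

Because tokens carry offsets rather than text, the highlighted line can be rebuilt without disturbing whitespace or characters that no token claims.
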
    • #2291 - feat: add APIs, commands and tests for syntax-highlighting in the REPL

      A follow-up PR adding REPL prototype methods, in-REPL commands, and test suites for the syntax highlighter. This adds various APIs for theming in the REPL, allowing the user to configure it with their own set of themes. Another small thing I took care of was disabling highlighting in non-TTY environments.

    • #2341 - feat: add combined styles and inbuilt syntax highlighting themes in the REPL

      This PR adds support for combining ANSI colors and styles to make hybrid colors. So something like italic red bgBrightGreen is supported. This will allow for more expressive theming. It also adds more in-built themes.
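
As a rough sketch of how a combined style such as `italic red bgBrightGreen` can map onto a single ANSI escape sequence: each name corresponds to a standard SGR code, and multiple codes are joined with semicolons. The `style` function and the (partial) `CODES` table are hypothetical names for illustration, not the REPL's API.

```javascript
// A partial table of standard ANSI SGR codes:
var CODES = {
	'bold': 1,
	'italic': 3,
	'underline': 4,
	'red': 31,
	'green': 32,
	'blue': 34,
	'brightGreen': 92,
	'bgRed': 41,
	'bgBrightGreen': 102
};

// Applies a space-separated list of styles to a piece of text.
function style( spec, text ) {
	var codes = spec.trim().split( /\s+/ ).map( function toCode( name ) {
		if ( !Object.prototype.hasOwnProperty.call( CODES, name ) ) {
			throw new Error( 'unknown style: ' + name );
		}
		return CODES[ name ];
	});
	// Multiple SGR parameters are combined in a single sequence, separated by semicolons:
	return '\u001b[' + codes.join( ';' ) + 'm' + text + '\u001b[0m';
}
```

Here `style( 'italic red bgBrightGreen', 'hi' )` produces the sequence `ESC[3;31;102m`, followed by the text and a reset.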

    P.S. Here's a blog I wrote announcing this.

    Maintenance work:

    • #2284 - fix: resolve bug with unrecognized keywords in the REPL's syntax-highlighter
    • #2290 - fix: resolve clashes between syntax-highlighter & auto-closer in the REPL
    • #2412 - fix: prevent property access if properties couldn't be resolved when syntax highlighting in the REPL
    • #2542 - fix: bug with duplicate tokens when syntax-highlighting in the REPL
    • #2758 - fix: syntax-highlighter not updating if the input has no tokens in the REPL
    • #2770 - fix: handle edge cases in the tokenizer for syntax highlighting in the REPL
  • Multi-line editing

    Prior to this, the REPL did support multi-line inputs using incomplete statements, but there was no way to edit them. Adding multi-line editing meant supporting manually added lines and the ability to move up and edit earlier lines, as in a normal editor.

    • #2347 - feat: add multiline editing in the REPL

      Implementing this is not as easy as it seems. Initially, I thought that updating the _rli instance and writing escape sequences with the updated lines, while tracking each up/down keypress event, would do the trick. But internally, readline refreshes the stream after operations like left/right/delete. This meant that if we were at line 2 and the stream was refreshed, everything below that line was gone. So, to actually implement this, we had to implement manual rendering with each keypress event.

      Implementation details:

      1. We track each keypress event like up/down/right/left, backspace (for continuous deletion), and CTRL+O (for manually adding a new line).
      2. We also maintain line & cursor indices, and highlighted line buffers to store rendering data.
      3. After every keypress event, we visually render the remaining lines below the current line.
      4. Finally, we maintain the final _cmd buffer for final execution.
    • #2531 - feat: allow cycling through multiline commands using up/down in the REPL

      Once multi-line editing was implemented, the ability to cycle through previous commands was broken, as the internal readline implementation handles it line by line. We listen to up/down arrow keys to override the default behavior and use our maintained _history buffer to support cycling through multi-line commands.

  • Unicode table plotter

    A plot API for visualizing tabular data can be leveraged for downstream tasks like TTY rendering in the REPL, or even in Jupyter environments, allowing users to easily work with tabular data when doing data analysis in the REPL (or elsewhere).

    • #2407 - feat: add plot/table/unicode

    The plot API supports data types like Array<Array>, MatrixLike (2D <ndarray>), Array<Object>, and Object. The API is highly configurable, giving users full control over how the render looks rather than restricting them to a pre-defined set of presets. This is what the default render looks like:

    ┌───────┬──────┬───────┐
    │  col1 │ col2 │  col3 │
    ├───────┼──────┼───────┤
    │    45 │   33 │ hello │
    │ 32.54 │ true │  null │
    └───────┴──────┴───────┘
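
To give a feel for what producing such a render involves, here is a minimal sketch that reproduces the default layout above for an array of rows. It is not the plot/table/unicode API: `renderTable` is a hypothetical helper that assumes every row has as many cells as there are headers, right-aligns every cell, and uses a padding of one space.

```javascript
// Renders headers and rows as a Unicode box-drawing table.
function renderTable( headers, rows ) {
	// Convert every cell to a string so that widths can be measured:
	var table = [ headers ].concat( rows ).map( function toStrings( row ) {
		return row.map( String );
	});
	// Each column is as wide as its longest cell:
	var widths = headers.map( function width( _, j ) {
		return Math.max.apply( null, table.map( function len( row ) {
			return row[ j ].length;
		}));
	});
	// Draws a horizontal border using the given corner/junction characters:
	function border( left, mid, right ) {
		return left + widths.map( function dashes( w ) {
			return '─'.repeat( w + 2 );
		}).join( mid ) + right;
	}
	// Draws a row of right-aligned cells:
	function line( row ) {
		return '│' + row.map( function cell( v, j ) {
			return ' ' + v.padStart( widths[ j ] ) + ' ';
		}).join( '│' ) + '│';
	}
	return [].concat(
		border( '┌', '┬', '┐' ),
		line( table[ 0 ] ),
		border( '├', '┼', '┤' ),
		table.slice( 1 ).map( line ),
		border( '└', '┴', '┘' )
	).join( '\n' );
}

console.log( renderTable( [ 'col1', 'col2', 'col3' ], [ [ 45, 33, 'hello' ], [ 32.54, true, null ] ] ) );
```

The real API has to do considerably more, since it accepts several input types and exposes the borders, alignment, and padding as configurable properties.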
    

    The plotter also supports one-of-a-kind wrapping tables, which break the table into segmented sub-tables when given appropriate maxOutputWidth prop values.

    Implementing this API has been tedious (as evident from the PR footprint) mainly because of the number of properties and signatures it needs, to parse various datatypes and support this level of configurability.

  • Fuzzy completions

    The initial scope was to just implement the fuzzy completions extension, but upon further discussion (and me having a lot of free time), we ended up writing an entirely new and improved completer engine from scratch.

    • #2463 - feat: add UX to cycle through completions in the REPL

      The new engine supports highlighted completions, a persistent drawer, and the ability to cycle through the completions using arrow keys.

      Implementation details:

      1. A renderer for the drawer view creates the grid-like view from available completions, with a maintained current completion index highlighted using ANSI escape sequences.
      2. Keypress events like up/down/left/right are tracked for navigation, and with each movement, the drawer is re-rendered and the selected completion is inserted into the current line.
      3. The engine is also receptive to SIGWINCH events meaning the terminal can be resized without any distortion.
    • #2493 - feat: add fuzzy completions extension in the REPL

      Fuzzy matching involves finding approximate completions that the user might be trying to type. This is different than fuzzy search as the length of the final completion shouldn't affect its relevancy.

      I wrote a fuzzy matching algorithm taking the following variables into account:

      1. Casing mismatch - Low penalty
      2. The first letter doesn't match - Medium penalty
      3. Gap between matching characters in the completion - Medium penalty
      4. Gap before the input was first encountered in the completion - Low penalty (as it already incurred the penalty from the 2nd clause)
      5. Missing character from the input in the completion - High penalty. This is generally not taken into account in most fuzzy completion algorithms, but it can help detect spelling mistakes. The only downside is that it increases the time complexity of the algorithm, as we still have to traverse the completion string even after a character is found to be missing.

      The fuzzy matching algorithm is based on a penalty-based scoring mechanism that negatively scores the completions for unfavorable characteristics in the completion string (mentioned above) that would make it a less ideal match for the input.
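
A penalty-based scorer along these lines can be sketched as below. This is my own minimal reconstruction from the five criteria listed above, not the algorithm from the PR: the weights in `PENALTY` are made-up values chosen only to respect the low/medium/high ordering, and `fuzzyScore` and `rank` are hypothetical names. A score of 0 is a perfect match, and trailing characters in the completion are never penalized, so its length does not affect relevancy.

```javascript
// Hypothetical penalty weights (low < medium < high):
var PENALTY = {
	'casing': 1,
	'leadingGap': 1,
	'firstLetter': 3,
	'gap': 3,
	'missing': 10
};

// Scores a completion against the input; 0 is best, more negative is worse.
function fuzzyScore( input, completion ) {
	var lower = completion.toLowerCase();
	var score = 0;
	var last = -1; // index of the previously matched character
	var pos = 0;   // where to resume searching in the completion
	var idx;
	var i;

	if ( input.length === 0 ) {
		return 0;
	}
	if ( input[ 0 ].toLowerCase() !== lower[ 0 ] ) {
		score -= PENALTY.firstLetter;
	}
	for ( i = 0; i < input.length; i++ ) {
		idx = lower.indexOf( input[ i ].toLowerCase(), pos );
		if ( idx === -1 ) {
			// Keep going so that later characters can still match:
			score -= PENALTY.missing;
			continue;
		}
		if ( completion[ idx ] !== input[ i ] ) {
			score -= PENALTY.casing;
		}
		if ( last === -1 ) {
			if ( idx > 0 ) {
				score -= PENALTY.leadingGap;
			}
		} else if ( idx > last + 1 ) {
			score -= PENALTY.gap;
		}
		last = idx;
		pos = idx + 1;
	}
	return score;
}

// Sorts completions from best to worst match.
function rank( input, completions ) {
	return completions.slice().sort( function compare( a, b ) {
		return fuzzyScore( input, b ) - fuzzyScore( input, a );
	});
}
```

With these weights, `abs` scores 0 against both `abs` and `absolute`, while a misspelling such as `abz` still matches `abs` but with a high penalty for the missing character.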

  • Bracketed-paste mode

    Prior to this, pasting multi-line code into the REPL would execute it line-by-line instead of pasting it in a single prompt. Bracketed-paste mode resolves this by allowing code to be pasted without being executed.

    • #2502 - feat: add bracketed-paste mode in the REPL

      Bracketed-paste mode is supported by most modern emulators (ref) and can be enabled/disabled via specific escape sequences. Once enabled, the emulator wraps pasted content with certain escape sequences that can then be parsed by downstream applications to detect and handle pasted content separately. We follow this very approach and disable execution when receiving paste sequences.
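
For reference, the escape sequences involved are standard: `ESC[?2004h` enables the mode, `ESC[?2004l` disables it, and pasted content arrives wrapped between `ESC[200~` and `ESC[201~`. The sketch below separates pasted from typed input in a chunk of data. `splitPaste` is a hypothetical helper, and it assumes that a whole paste arrives in a single chunk, which real input streams do not guarantee.

```javascript
var ENABLE = '\u001b[?2004h';   // written to the terminal to turn the mode on
var DISABLE = '\u001b[?2004l';  // written to the terminal to turn the mode off
var PASTE_START = '\u001b[200~';
var PASTE_END = '\u001b[201~';

// Separates pasted content from typed content in a chunk of input.
function splitPaste( chunk ) {
	var start = chunk.indexOf( PASTE_START );
	var end;
	if ( start === -1 ) {
		return { 'typed': chunk, 'pasted': null };
	}
	end = chunk.indexOf( PASTE_END, start );
	if ( end === -1 ) {
		// Assumption: treat an unterminated paste as running to the end of the chunk.
		end = chunk.length;
	}
	return {
		'typed': chunk.slice( 0, start ) + chunk.slice( end + PASTE_END.length ),
		'pasted': chunk.slice( start + PASTE_START.length, end )
	};
}
```

An application would write `ENABLE` to its output stream on startup and `DISABLE` on exit, and insert the `pasted` text into the current prompt without executing it on each newline.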

    Maintenance work:

    • #2516 - test: fix failing tests in the REPL
  • Configuration and persistence

    Prior to this, if the user changed the settings or theme to their preference, those changes were lost once they exited the REPL. This can be annoying as there is no direct way to have a personalized REPL environment except by writing a custom initialization script.

    • #2559 - feat: add support for persisting user preferences using a default configuration in the REPL

      This PR adds support for having a default configuration, and the ability to load custom configuration or profiles.

      Implementation details:

      1. If any changes to REPL preferences are detected when exiting the REPL, the user is prompted whether they would like to save their preferences.
      2. This configuration is saved to the file from which the REPL was initialized. This means the user can have their own configuration files, named .stdlib_repl.json, in any of the parent directories of the cwd.
      3. There is also a default configuration located at ~/.stdlib/repl.json.
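
The lookup order implied above can be sketched as a list of candidate paths: walk from the working directory up to the filesystem root, then fall back to the default configuration. `candidateConfigPaths` is a hypothetical helper, and whether the cwd itself is searched (as it is here) is an assumption on my part.

```javascript
var path = require( 'path' );

// Lists configuration file locations from most specific to least specific.
function candidateConfigPaths( cwd, home ) {
	var out = [];
	var dir = path.resolve( cwd );
	var parent;

	// Walk up the directory tree until `dirname` stops changing (the root):
	while ( true ) {
		out.push( path.join( dir, '.stdlib_repl.json' ) );
		parent = path.dirname( dir );
		if ( parent === dir ) {
			break;
		}
		dir = parent;
	}
	// Fall back to the default configuration in the home directory:
	out.push( path.join( home, '.stdlib', 'repl.json' ) );
	return out;
}
```

A loader would then use the first candidate that exists on disk (e.g., by checking each path with `fs.existsSync`) and remember that path so that preferences can be saved back to the same file.
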
    • #2566 - feat: add fs/resolve-parent-paths

      While implementing support for configuration files, the initial plan was to also allow loading from .stdlib_repl.js files along with .stdlib_repl.json. For this, I worked on a file system utility to resolve a path from a set of paths by traversing parent directories. But we later discarded the plan to support .js configurations as it made the idea of saving configurations more complex. So yeah, we didn't end up using this utility after all.

  • Custom key-bindings

    Although readline comes with various actions that can be triggered by certain key combinations, there is no way to configure key-bindings.

    • #2739 - feat: add support for custom keybindings and editor actions in the REPL

    This PR adds support for configuring in-built readline editor actions and also implements some new ones inspired by Julia.

    Implementation details:

    1. To make the readline actions configurable, we just override the selected keybindings to trigger a different sequence that triggers the corresponding action once the keypress is processed by readline. For example, we can set a keybinding like CTRL+M to trigger the backspace action by emitting the required sequence before the CTRL+M keypress is processed.

    2. I also implemented some newer actions which can again be triggered by configurable keybindings.

    3. A challenge I faced was with key combinations involving symbols (e.g., CTRL+/). readline doesn't parse such combinations into a human-readable format, and hence I had to write a custom parser (parse_key.js) to parse them.
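
As a rough illustration of the configurable part only, the sketch below maps a keypress object (shaped like those emitted by Node.js readline, with `name`, `ctrl`, `meta`, and `shift` fields) to an action name through a table that user overrides can modify. It does not show how sequences are fed back to readline, and `comboOf`, `createKeymap`, and the action names are all hypothetical.

```javascript
// Converts a keypress object to a combination string such as 'CTRL+K'.
function comboOf( key ) {
	var parts = [];
	if ( key.ctrl ) {
		parts.push( 'CTRL' );
	}
	if ( key.meta ) {
		parts.push( 'ALT' );
	}
	if ( key.shift ) {
		parts.push( 'SHIFT' );
	}
	parts.push( String( key.name ).toUpperCase() );
	return parts.join( '+' );
}

// Builds a lookup function from default bindings and user overrides.
function createKeymap( defaults, overrides ) {
	var merged = Object.assign( {}, defaults, overrides );
	var lookup = {};
	Object.keys( merged ).forEach( function add( action ) {
		lookup[ merged[ action ].toUpperCase() ] = action;
	});
	return function actionFor( key ) {
		var combo = comboOf( key );
		if ( Object.prototype.hasOwnProperty.call( lookup, combo ) ) {
			return lookup[ combo ];
		}
		return null;
	};
}
```

For instance, overriding a hypothetical `deleteLeft` action from `BACKSPACE` to `CTRL+K` makes the old key fall through to default handling while the new one resolves to the action.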

  • New REPL, New Art

    • #2178 - feat: add a stdlib ASCII art in REPL's default welcome message

    Time for a REPL makeover with some new ASCII art.

    • Before: legacy
    • After: ascii
  • General improvements and bug fixes

    • #2430 - fix: pass options when parsing to suppress warnings in the REPL
    • #2435 - feat: add REPL text lint rule to catch semicolon omission in examples
    • #2597 - build: run workflow step only if user has write access
    • #2635 - fix: resolve bug in string/truncate
    • #2706 - fix: example command hanging in REPL when executing multi-line code
    • #2707 - docs: fix REPL examples in stats/ttest
    • #2736 - docs: fix incorrect package description in string/left-trim-n

Current state

The REPL has seen many changes in these past few months and I can confidently say it's in a strong position now among other interactive environments for scientific computing. We have achieved feature parity on most grounds with popular environments like IPython, Python, and Julia.

What remains

The REPL is still under constant development, and one of the areas where it is currently lacking is tests! As the REPL grows, having tests would ensure nothing is breaking and all the moving parts are moving as they should. There are also many more features to be implemented. Here's the ever-growing list of issues tracking this.

Challenges and lessons learned

stdlib has really high coding standards with rigorous style guides and project conventions. Adapting to them and mastering the art of writing stdlib-standard code has definitely been one of the most valuable learning experiences of my life, and one that will serve my career to come. All in all, I'm never writing a function without a docstring ever again :).

Conclusion

Concluding this report, the stdlib experience is one of a kind and I feel lucky to have been a part of it. I learned a lot and look forward to learning more, with the aim of becoming a core contributor to stdlib one day.

Acknowledgements

I would like to thank @Planeshifter for the insightful 1:1 sessions and for being an amazing mentor, and @kgryte for the countless learning experiences.

Time for some shout-outs: