Investigate having api.read being "asynchronous" #117

Closed
miversen33 opened this issue Jan 13, 2023 · 22 comments

Labels: Core (Issues with the Core), enhancement/request (New feature or request), High Priority (Needs addressed ASAP)

@miversen33 (Owner)

This will likely need to be done via some sort of coroutine/await, and as not every provider may be able to support it, we will need to flesh out what that looks like. Still, this is a good idea as it should make the life of UI consumers easier.

miversen33 added the enhancement/request and Core labels on Jan 13, 2023
@miversen33 (Owner Author)

I can verify that using api.read on a large directory over the internet proves to be... painful without some sort of async process. This is because while we are reaching out to the provider (sometimes repeatedly), the UI will lock up, with no way to "stop" the process without killing the entire Neovim session. This is obviously not desirable and will only be exacerbated when things such as find and rg are exposed over the network connection as well. This should probably be bumped up in priority.

miversen33 self-assigned this on Jan 13, 2023
miversen33 added the High Priority label on Jan 13, 2023
@miversen33 (Owner Author)

Probably worth checking out lua-async. It appears to use libuv for async, and that would be ideal since that is what we are already using for shell processing.

@miversen33 (Owner Author)

Pending some feedback from the luv team on issue #629

Note, the above "async" library is really just using coroutines, which is not truly asynchronous and won't work for "backgrounding" long-running functions. We may consider looking into literal OS threads, but I am not fond of that; I imagine luv's async model will work for this. I just need a bit of clarification on how to use it properly.
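
To make the distinction concrete, here is a minimal, purely illustrative (non-netman) snippet showing why a plain coroutine does not background anything: the expensive body still runs on the main thread while it is resumed, so the editor is blocked for the full duration either way.

-- Purely illustrative: a coroutine wrapping CPU-bound work.
local co = coroutine.create(function()
    local sum = 0
    for i = 1, 100000000 do sum = sum + i end
    return sum
end)

-- coroutine.resume runs the body on the current (main) thread and does not
-- return until the body yields or finishes, so the UI is blocked anyway.
local ok, result = coroutine.resume(co)
print(ok, result)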

@miversen33 (Owner Author)

miversen33 commented Jan 17, 2023

It looks like, with some simple testing, the below code works "asynchronously" in Neovim:

local log = require("netman.tools.utils").log
-- vim.loop is Neovim's bundled luv; fall back to it if requiring luv directly fails.
local ok, luv = pcall(require, "luv")
if not ok then luv = vim.loop end

-- Route print through netman's logger in addition to stdout.
local _print = print
print = function(...)
    log.debug(...)
    _print(...)
end

-- Two CPU-bound functions with very different durations. Functions handed to
-- luv.new_work run on a thread-pool thread in a separate Lua state, so they
-- cannot rely on upvalues (and the print override above does not apply inside them).
local func1 = function(params)
    print(params)
    local max_loop = 1000000000
    local count = 0
    while count < max_loop do count = count + 1 end
    return 'dun'
end
local func2 = function(params)
    print(params)
    local max_loop = 1000
    local count = 0
    while count < max_loop do count = count + 1 end
    return 'dun'
end

-- new_work(work_fn, after_fn): work_fn executes in the thread pool, after_fn is
-- called back on the event loop with work_fn's return values.
local func1work = luv.new_work(func1, function(result) print(string.format("func1 complete! %s", result)) end)
local func2work = luv.new_work(func2, function(result) print(string.format("func2 complete! %s", result)) end)
print("Starting background operations")
func1work:queue("hello")
func2work:queue("world")

print("Running long operation in foreground while we wait")
local max_loop = 1000
local count = 0
-- Drive the loop so the queued work and its completion callbacks get processed
-- (only needed when running this as a standalone luv script).
luv.run()
while count < max_loop do
    count = count + 1
end
print("Completed foreground operation")

We are going to want to look into using libuv's built-in work queue (thread pool) for our async work. We will start with:

  • Abstracting this a bit so I can figure out how to "cancel" work
  • Wrapping find
  • Wrapping read
  • Wrapping write/delete
  • Wrapping everything else in api that talks to the provider for operations

Note, the goal here is to allow the provider to not have to worry about being asynchronous if they don't want to. We will handle that for them. Or rather, they should only worry about being asynchronous if they need to be for their own purposes, not in order to be supported by netman (if that makes sense).

As an example, pushing archives up with the ssh provider will require the code to run asynchronously, but this is due to how we are performing the archive push and not because of how netman itself works.

Conversely, find does not have to be asynchronous; we will manage that ourselves, even though it will likely end up being a long-running process.
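
As a rough sketch of what "we will handle that for them" could look like (hypothetical helper, not the final netman API), a blocking provider call can be pushed onto libuv's thread pool via new_work, with only the result coming back to the main loop. Note the work function runs in a separate Lua state, so it cannot capture upvalues and should only receive and return simple, serializable values.

local uv = vim.loop

-- Hypothetical wrapper: run `blocking_fn` on a thread-pool thread and hand its
-- return values to `on_done` back on the event loop.
local function background(blocking_fn, on_done)
    local work = uv.new_work(blocking_fn, on_done)
    return function(...)
        work:queue(...)
    end
end

-- Example: an expensive, fully self-contained "read"-like operation.
local slow_read = background(
    function(uri)
        -- Stand-in for a provider reaching out over the network.
        local total = 0
        for _ = 1, 100000000 do total = total + 1 end
        return uri, total
    end,
    function(uri, total)
        -- Runs on the event loop; schedule UI-facing work back onto the main thread.
        vim.schedule(function()
            print(string.format("finished %s (%d iterations)", uri, total))
        end)
    end
)

slow_read("sftp://example-host/some/dir")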

@miversen33 (Owner Author)

Scrap the above, it appears there is quite a bit of work going on in Neovim core to establish what this will look like, so instead of reinventing the wheel, let's just go steal some of the code they are currently using.

@miversen33 (Owner Author)

Scrap the scrapping lol. Looking into the Neovim stdlib movements on async, everything appears to be centered around coroutines, which is not an acceptable solution for this problem. We are going back to the previously mentioned solution.

@miversen33 (Owner Author)

Currently on hold pending "How do you 'kill' a luv_work_ctx_t handle" getting resolved and merged, and then however long that takes to end up in Neovim core :(

This won't be making it into v1.1 unfortunately; the wait for the above to happen is too long, and 1.1 needs to be finished at some point.

@miversen33 (Owner Author)

Welp, it sounds like 1.2 is going to directly address setting up async "work", as there is no way to accomplish this via libuv. Let's focus on getting 1.1 completed without any async, temporarily removing any search mechanics, and then look to reorganize 1.2.

@akinsho

akinsho commented Feb 19, 2023

@miversen33 been keeping my eye on this project as it'd be quite cool to have something like this. Regarding the thread above, I'm wondering why specifically a multithreading solution is necessary here, and why something like using jobstart or plenary's Job, or re-implementing a version of that with luv.spawn, isn't enough?

Seems like a lot of plugins which are talking to external processes are able to rely on those mechanisms already?

EDIT: to my mind the key issue is to not block the UI, right? It's not so important that things actually happen in separate threads, AFAICT.

@miversen33 (Owner Author)

Hey @akinsho! Just wanna say it is super cool that you are following this :) I love your plugins lol.

Anyway! I have actually wrapped spawn in netman, and it does provide asynchronous ability :) So it's not necessarily the fact that I can't perform actions asynchronously that is the problem.

The issue is actually a bit more abstract. I am currently trying to figure out how the api can guarantee asynchronous behavior from the providers it talks to. Given that there is a contract between the API and its providers, I need a way to ensure that all of them are able to function asynchronously.

Of course, I can fix this with ssh and docker (and any future core providers), but I have not yet figured out how to ensure this with 3rd party providers aside from simply requiring that the provider write async-compatible code.

Which is an option, just IMO not the preferred option. I would rather have the ability to wrap a provider's synchronous code and make it async (if the provider doesn't respect the async flag that I am planning on passing them).

I don't know though, I haven't really established how I want this to look. I tend to work through these complex issues in a pretty shitty fashion and then iterate several times over until I have something I like lol. I am certainly open to suggestions on how to address this async compliance though!

To your edit, yes! The main thing is I don't want UIs to be locked up while expensive operations are running. A great way to illustrate this: I had enabled remote find in a test, executed that find on the root directory of an ssh connection, and looked for anything matching *. It of course locked up all of Neovim lol. That is unacceptable behavior that needs to be avoided/circumvented, but since anyone can be a provider, I need a way to prevent that from happening.

@miversen33 (Owner Author)

For reference, my current bad take on this is to have a secondary version of every function that ends with _a (or some identifier like that).

So read/read_a, write/write_a, delete/delete_a, etc.

When I call out to perform an operation with the API, I will try to use the _a version by default, but if it doesn't exist I will prompt the user to accept that they will be running that provider synchronously.
Long term, of course, I would also add a configuration option users can set to allow synchronous behavior for a specific provider.

It's... not great, but I feel like if I can put the "onus" on the end user to accept responsibility for a synchronous function locking up their editor, I am ish ok with that. I would prefer not to, but this is a fallback option.
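
A minimal sketch of that fallback (all names here are hypothetical, not committed API): prefer the provider's _a variant when it exists, otherwise ask the user to accept a blocking call.

local function call_provider(provider, name, uri, callback)
    -- Prefer the async variant (e.g. read_a) if the provider implements it.
    local async_fn = provider[name .. "_a"]
    if async_fn then
        return async_fn(provider, uri, callback)
    end
    -- Otherwise make the user opt in to a synchronous (potentially blocking) call.
    local choice = vim.fn.confirm(
        string.format("Provider has no async '%s'. Run it synchronously? This may lock up the UI.", name),
        "&Yes\n&No"
    )
    if choice == 1 then
        callback(provider[name](provider, uri), true)
    end
end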

@akinsho

akinsho commented Feb 20, 2023

@miversen33 thanks for explaining in such detail. That makes sense; I think your current solution of putting the responsibility on the user is a fair trade-off since, as you say, there's no way (AFAIK) to asyncify everything if the underlying external implementation is blocking. Personally I think it's much more important that you can move forward with something than be stuck trying to do the perfect thing. Anyway, that more than answers my question. Can't wait to be able to use this plugin 🙏🏿

@miversen33 (Owner Author)

Merged some optimizations (#143) to how we are running remote find, which should vastly improve performance on slower connections as well as on large directories. I am currently working through how to get that setup to work asynchronously, but even without it, I imagine this change should help out quite a bit :)

@miversen33 (Owner Author)

Ok so below is the process I have settled on for this.

Any of the API's interaction functions (basically anything you can call from the API to interact with a URI) will now accept an opts param (an additional last param for any function that didn't have this already). That opts param can now contain a callback key whose value is a function.

That function will be called by the api throughout the async cycle, and its params should be data and complete.
This will make its way into the doc, but I wanted to float this by @akinsho and @stevearc as they are the ones who raised this concern.

Below is kinda how I expect this function to look

function(data, complete)
    -- Do whatever you want with the data you receive.
    -- API will have the contents of this data object documented for each function that supports async.
    -- Complete will be a boolean telling you if the async stream is finished or not
end

And how I would expect this to be used is as follows

require("netman.api").read("some valid uri", {callback = aforementioned_func})

Now the other kicker here. If you provide a callback, the api will assume you want the operation to happen asynchronously and its return output will change.

Currently, the return output for synchronous calls is

{
    success = boolean,
    error = table_or_nil,
    data = table_or_nil
}

If you provide a callback, the above is going to be streamed to your callback instead. Because of this, api will now return a Shell Handle. Details for this handle can be found here, but the gist is that it's a table that allows you to "control" the backgrounded async process. This control includes things like writing to its stdin, stopping it, etc.

The data that is streamed to the callback will be the same as the data that is returned synchronously, although explore output will be streamed piece by piece instead of as a whole table.
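
To make the consumer side of this concrete, here is roughly how I would expect an async read to be driven. The handle's method name below is illustrative only; the real Shell Handle interface is documented separately.

local api = require("netman.api")

local received = {}
local handle = api.read("some valid uri", {
    callback = function(data, complete)
        if data then table.insert(received, data) end
        if complete then
            -- Stream is finished; hand the accumulated results off to the UI.
            vim.schedule(function() print(vim.inspect(received)) end)
        end
    end,
})

-- Because a callback was provided, `handle` is the Shell Handle described above
-- rather than the usual {success, error, data} table, so the backgrounded
-- process can be controlled, e.g. cancelled via something like handle:stop().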

By default, the api will treat all functions as synchronous to maintain the "current" functionality. Async will only be possible if the following criteria are met

  • The provider has an "async" version of the function available. In practical terms, this means that the provider must implement a read_a function in order to have read calls be asynchronous (for example). I haven't fleshed out all the functions yet, focusing mostly on read (and stat) for the time being.
  • The call params have a callback attribute in the provided opts, and that callback attribute is a function
  • The provider did not previously return an invalid handle on an async call. As an example, if the provider does implement read_a but its implementation does not return a table that is compatible with Shell Handle, that provider's function will be blacklisted until a version update to the provider or to netman occurs, or the user clears it via an as-of-yet-undefined function.

It's a bit of a wall of text, but what do both of you think here? Anything either of you would prefer to see here or would like done differently? Questions? Etc.

Tagging: #116 as a related issue

@stevearc

I'm on board with most of the ideas mentioned here. Changing the return values based on whether it's async or not seems fine. Having a handle that can affect the processing of the async function is a neat idea. The only thing I would recommend changing is the callback being passed in through an opts table. The plenary async library relies on the last argument to a function being the callback. It's not finalized, but it's looking like the core implementation of an async lib will use the same or a similar design. Compatibility with these libraries is going to be important, so I'd recommend conforming to that standard. Another way to look at it is that any async library created is going to have to have support for the vim.loop libuv functions, so if you use the same conventions you're guaranteed to be compatible with whatever they come up with.
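
For reference, the convention being described looks like the following; the netman line is only a hypothetical shape, not a settled signature.

-- libuv / plenary.async style: the callback is the final positional argument.
vim.loop.fs_stat("/tmp", function(err, stat)
    -- err is nil on success; stat is the result table.
end)

-- Under that convention, netman's read might instead be called as:
-- require("netman.api").read("some valid uri", opts, function(data, complete)
--     -- handle streamed data here
-- end)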

@miversen33 (Owner Author)

@stevearc I initially toyed with having the callback be a separate param outside the opts that may be available. It just felt a bit weird as an end user to specify an options table and then a callback after. IMO the callback should be in the opts, and you will see this design philosophy in the deeper sections of Netman (the internals of the providers all do callbacks via the options param).

That said, I do want using async here to be as simple as it can be. I don't have strong feelings one way or another, so if the general consensus is that the callback is "the last param" of a function, I can work with that instead; however, I do not intend to completely change how the internals of this work. So the API will do async a bit differently than the underlying providers, which is honestly probably fine given that this is a framework. Providers may do all kinds of wacky things; not following the "standard" of how to do callbacks within netman is certainly not going to be the worst offense lol.

Compatibility with these libraries is going to be important, so I'd recommend conforming to that standard. Another way to look at it is that any async library created is going to have to have support for the vim.loop libuv functions, so if you use the same conventions you're guaranteed to be compatible with whatever they come up with.

I highly agree with this. I will wait a bit to see if the other tagged user has any opinions and I will change up the API a bit for some internal testing. The change is not huge so there shouldn't be too much needed to move to a callback param vs an option.

I appreciate your feedback :) Hopefully I can get async into a stable place and maybe we can work through getting oil setup with netman after that :)

@miversen33 (Owner Author)

miversen33 commented Apr 26, 2023

@stevearc

I have updated the API documentation to hopefully reflect the asynchronous changes I have made. Right now only ssh has async support, and all of this is only on the async-1 branch (if you wanted to check it out; it's very much a WIP, there will be bugs).

If I handed you the above documentation and said "here is how you can asynchronously integrate your UI tool with Netman", what questions would you have? I am certain it's not perfect; I am sure there are things that aren't super clear. Having spent the better part of the last 2 months buried in this code, I am not really able to see all those issues unfortunately. And since you raised the issue, I'm tagging you to peek at it :)

For the time being, I plan on leaving docker purely synchronous for the following reasons

  1. It gives consumer tools (me writing the Neo-tree code and you/anyone else who wants to poke this) a stable way to compare how async vs sync providers will behave
  2. My goal is for the docker provider to be the "poster child" of how to write a provider. Because of this, it needs to be "perfect", so until I have everything fleshed out in the ssh provider, I am going to leave docker alone. Plus, it running synchronously isn't nearly as painful as ssh.

A note: I do plan on retrofitting Docker to adhere to the API standards laid out in the above doc (it should mostly adhere already, but I haven't tested it at all, so there may be issues with it).

I would greatly appreciate your thoughts and feedback if you have some time :)

@stevearc

Left a few comments on the commit

@miversen33 (Owner Author)

I really appreciate you :) Instead of addressing your comments with comments, I will update the doc in those sections to hopefully better relay the answers to your questions. I'll tag you when I have it updated again :)

@miversen33 (Owner Author)

After reading the doc again, I just can't really find a good way to clarify async (and generally "how to use") in that doc. Thus I have broken out the "how to use" and in-depth explanations of Netman (from a consumer perspective, which is what oil.nvim, neo-tree.nvim, etc. would be considered) into its own document. Said documentation will eventually make its way into the wiki after it's been heavily revised, but it can currently be found in the /doc directory. Specifically (for this issue), here: https://github.com/miversen33/netman.nvim/blob/async-1/doc/consumerguide.md

I have a few more points I need to get on there, and then the entire doc needs a few passes for any missing clarification, readability, etc.

But hopefully that doc provides a substantially better experience in trying to integrate Netman into your plugin :)

@miversen33 (Owner Author)

Implementing async is mostly complete now (as of this commit). The main core functions (read, write, delete, and get_metadata) can now be used asynchronously or synchronously, and the API is "smart" enough to handle this cleanly.

There are still functions that need to be cleaned/converted (including whatever the fuck this monstrosity is). However, getting the async process abstracted out means that better support for things like copy/move, rename, search, etc is now possible :)

I think the last thing that really needs to be done here is completion of the consumer specific documentation (the above listed consumerguide) so that 3rd parties such as Neo-tree, oil, telescope, etc can properly utilize Netman (including async) without being overwhelmed with the complexity that comes with this.

We are nearing the end of this task :)

@miversen33 (Owner Author)

I have completed the basic documentation for a consumer. I am not going to tag any specific users for the doc review, mainly because that's a lot to ask of users who haven't contributed to the codebase.

I will close this out once I have merged this branch into the current dev branch.
