-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ipld amend #445
base: master
Are you sure you want to change the base?
feat: ipld amend #445
Conversation
Based on our discussion, @RangerMauve, @rvagg, and to your specific point, @RangerMauve, I can update I think a good subsequent PR might be one that integrates Alternatively, I could start with a clean slate and port these features over to My aim is to maximize reuse and take advantage of all the work that's already gone into If merging without some or all of these features is a no-go, I don't mind working on adding them to the current PR instead of a subsequent one. |
87b7b41
to
24469a6
Compare
Could you elaborate a bit on what integrating amend into walk and focus would entail?
Question, would it make sense to put all the |
Yes, To integrate I'm also debating whether to just add missing features to
Yes, that makes sense. The main reason I used different terminology and separate packages was to be able to refer to the two implementations individually while comparing them, but, at this point, merging them seems to be appropriate. If I were to merge them, my thinking was to replace the body of Does that seem reasonable? |
Actually, I think I'm going to experiment a bit with the amend/walk/focus integration this week and let you know how it goes. Now that I'm looking at it again, I think it might be less work than I was initially thinking and could potentially fit in the same PR. Integrating it there would allow amend's efficiencies to be available to two different APIs - patch and traversal. |
@smrz2001 can you give me your thoughts on the performance gains from this impl over FocusedTransform? They're essentially having to perform the same operations by the time you get to final node realisation aren't they? What am I not seeing here? Are the transform operations having to double (and triple etc.) up on parent node assembling as they deal with batches of operations? And yeah, now I look at this code, I can see the logic of going back and seeing if you can adapt the walk and focus transforms to use it. That would also help prove the concept. I think that making sure this works across an ADL would be a good step in the process too. The HAMT ADL is actively getting work, Masih has been pushing new changes to it as they're using it for the Filecoin Indexer service I believe, so it's getting closer to feature-complete. Alternatively, the most complete ADL we have is go-unixfsnode. I'm not sure we want those pulled into the repo to test here but if you can figure it out then wiring up a transformation over a working ADL would be really interesting and confirm that the layering is working as expected. Although, it's possible that the pieces around finalisation are a bit too rough to make it all work, so don't spend too much time spinning wheels on that. Also, is amending a non-list or map root node intended to work? Looking at the Any amender, I can't see how that would work. Maybe it doesn't make sense, but shouldn't I be able to amend using a |
(Edited for clarity) Basically, The big advantage in the case of The two most important pieces of logic for this are the While this doesn't matter a ton for a small set of updates, the advantage of
Ok, excellent. I already started looking at this and will make updates in the current PR. I did run into some blockers and might need guidance - will let you know.
Great idea! Coincidentally, being able to efficiently update a HAMT (plus simplifying some of the ugly logic I had to add in I'll start playing around with
Good question. I had the same question during my implementation and was of the opinion that really only recursive
I do see what you're saying though. I could update its As a side thought, I'm wondering if I should store a pointer to the base |
@rvagg, after rereading your question regarding perf, I reworded some of my explanation. Please let me know if that makes sense. To summarize, the main reason behind the new implementation's better performance is that it completely bypasses An |
@rvagg @RangerMauve, I've completed the integration with I'm still mulling over how to integrate with EDIT: I'm actually realizing that it might not be worth integrating with I've also implemented lazy link recomputation so that, if desired, any link(s) whose referred node was modified will be recomputed only when accessed via I'm adding more test cases for focus, patch, and some of the less exposed branches in my code. |
@smrz2001 : is there anything missing before a final review/merge? Can you please also provide updated benchmarks? We'd like to include those in the release notes. 2022-07-19 triage conversation:
|
@BigLep, here are the latest benchmarking results. The numbers are slightly higher than the initial results due to (I believe) the integration with
|
I don't think so, @BigLep, unless @rvagg or @RangerMauve would like to see something else - I'd be happy to make more changes. I haven't gotten a chance to add more tests for corner cases but don't think that's blocking. I'll add some with the next PR to this code. I'm planning to put up another one in a bit anyway for adding and exposing plain If any specific tests are needed for more confidence though, I can add them, no problem. To @rvagg's question about integrating with Once I have a
I tried really hard to keep the packages clean and separate 😭 but had to give up because of circular dependencies while integrating I actually ended up going backwards and pulling all I feel that cc @aschmahmann, since you expressed some interest in this PR. |
That would be cool as heck!
Honestly, I agree with this reasoning. Would it make sense to put Might be useful to talk about this with higher bandwidth at the next IPLD community call. 🤷 |
Didn't work under traversal due to circular dependencies. |
Would it make sense to have a call with @smrz2001 and @rvagg (maybe @aschmahmann) and just devote some time to finalizing what we need to merge this so that it doesn't sit in limbo for too long? |
The concept that this is a "writable ADL" is really curious though. 😅 |
This would be great, @RangerMauve! |
Moving the "writable ADL" discussion here:
Yeah, that thought's been coming to me for a while now, although it's still a bit nebulous. If an ADL is a transformation applied to a "raw" Node contents are read via What this also means is that In fact, I think |
I was just off a call with the WNFS team and we were talking about how ADLs map from raw data into a complex Node, but how it wasn't obvious how the opposite could happen. This made me think that mutability with ADLs is something we should talk about more. e.g. Should we have something in the raw datamodel for transforming a node? The stuff sketched up in #451 seems to allude to a need to be able to modify the keys in a Map or add/remove items in a List (and then get back a new node). This also seems like an alternative to using a Builder node for interfacing with the data in a more high level way. Is this something we want within the repo or should we maybe look at having a higher level interface for IPLD? |
Yes, I think that's fine. We have bindnode in here already and there's methods in the codegen that can mess with internals too (if they're local anyway: e.g. https://github.com/ipfs/go-ipld-git/blob/master/ipldsch_types.go#L61). I think we want to be rigid with making sure that all the various layers and implementations can present both:
As long as those things hold true then the stack should line up. But it doesn't preclude offering alternative interfaces to the same data for usability and/or performance purposes. Performance tends to constraint us a bit and has influenced some of the design decisions here. But I think experimentation with alternative interface mechanisms for the different levels is fine as long as we offer a coherent stack that can work transparently when a user/API doesn't know or care what's underneath. |
OK, I tried pulling this up into a So, if "where does this belong" from a purely API-sense perspective then I'd try and get it into |
So the next problem is: does this work with the layering that traversal needs to? I think the answer is no, unfortunately. If you look at the current implementation of So, I think if you were to run a traversal using this current implementation on anything but a datamodel layer (e.g. basicnode)
In both cases, (and perhaps a third one where you layer one on the other) if you pass one of these to the current transform functions and do a simple transformation, it makes the changes through the Amend is hijacking that and returning an I'm not sure exactly what to do about this though. Some kind of "commit" operation would be nice to add to the traversals but that would have to be implicit, leaving the user with no option to receive an But then there's the question of whether accumulating all of this in memory is even what the user wants? With the current form, I could pass in a huge HAMT with strings in it and use the transformers to append some character to each entry and it'd update the nodes along the way, without needing to buffer it all in memory before doing it at once (one of the nice features of ADLs). So perhaps accumulating should be opt-in too, which means perhaps we should preserve some or all of the exsting traversal infrastructure and allow the user to opt-in to an amending traversal? |
Thanks for experimenting with this, @rvagg. This is where I end up each time I try to move
Agreed, this is what I was trying to allude to in my earlier comment:
Any amendment logic must take into account the underlying I have a few ideas on how to move forward, let me know what you (and @RangerMauve) think:
|
With the approach I outlined above, updating nodes along the way could perhaps be implemented via
|
Yes! Mostly that all sounds good. Some thoughts and questions:
|
From what I can tell, |
It had been a while since I'd looked back at my own code, and now that I have, I've realized that the changes to These changes were primarily meant to discuss and validate the current high-level approach, compared with the previous one where Before that though, if we take a step back, does adding to a
What do you think? I was in the process of adding more |
437510b
to
ba38717
Compare
Hey @rvagg, I just pushed some reworked code that simplifies things considerably. All tests are passing so there shouldn't be any regressions in the code, at least not any major ones. There aren't any new test cases for I'm going to restart the changes to Both I feel like recursive types have always been the source of complexity and inefficiency when attempting to modify nodes. The new code makes it easier to modify the basic map/list types while also improving the DX a little bit, i.e. calling |
ba38717
to
98bab02
Compare
Just pushed a version of I'll work on adding some more tests and benchmarks soon, though would love some feedback regarding the approach if you get a chance to take a look, @rvagg. |
I went ahead and pulled in my map/list interface changes as well here, IMO further improving the DX for the amend API. I feel like now there's probably enough of an API around this to have a fruitful discussion on the path forward, @rvagg. Since the API will very likely evolve some more before it settles, I feel like I should incorporate feedback before adding more unit/benchmarking tests.
|
This is an interesting question. I gather we're talking specifically about this:
It might be reasonable to see a violation when you look at It might fit easier into the |
This is looking pretty clean, I like the way the traversal code feature detects and splits that out; much less scary than having it all tangled up. Could you work on adding some basic tests to the map and/or list amenders to demonstrate usage? I feel like that would be a good path to describing what the purpose of this all is. |
That makes sense, thank you for clarifying.
Yes, since the code looks reasonable, I'll work on adding tests 👍🏼 Thanks! |
Just pushed some tests demonstrating usage for Will also figure out why the other tests are failing. |
f5d1073
to
25825c1
Compare
Ok, just pushed some fixes and tests for It's a little bizarre that all the tests are passing for macOS but not for Windows or Ubuntu for Go 1.19.x/1.20.x, though they pass for Ubuntu on Go 1.18.x. I'll see if I can figure out what's going on - is this something you've run into before? The Go 1.17.x tests on Ubuntu are failing because I've included an experimental package for some list operations that I did not want to code by hand, though I can implement them to maintain compatibility with older Go versions. |
Ok, so the tests were failing because transformations are now changing underlying nodes by default. This is probably a good stage to discuss how to deal with im/mutability when transforming nodes. As I see it, we have a couple of options. A. By default, the code in its current state will now transform nodes being amended, even after they have been "built" and are presumably immutable. This is the most space/performance efficient because it needs no additional metadata regarding modifications. However, this could lead to broken invariants in other parts of the code assuming any constructed nodes to remain unchanged. Maybe that's ok because this only happens if a user explicitly requests an amending builder for a node and knows what to expect. If they need a base node to remain immutable, they'll need to use B. I can reuse some of the logic I previously wrote to maintain additional state for modifications while keeping the base node unmodified. This will definitely increase the complexity of the code and have an impact on space/performance because of the need to store metadata regarding modifications separately from the base node. This approach would have the advantage of ensuring that base nodes are never modified from under code expecting them to remain immutable. This approach would have clearer behavior but would come with significant complexity/space/performance costs. It will, however, fulfill the original promise of copy-on-write. EDIT 1: EDIT 2: |
0629156
to
b22bcc3
Compare
Hey @rvagg, have you had a chance to look at some of the latest updates? There are now examples of how to use the new interfaces. I think we're definitely closer to a clean API though there are still some usability details to iron out, some of which will have an impact on performance. It would be good to understand the most common ways in which this code might be used in order to choose the right approach, or add some configuration options with sane defaults. |
This PR implements a possible solution for incrementally modifying IPLD nodes.
It uses the same interface as @warpfork's patch implementation but accumulates updates and internal state to provide a more optimized
Node
"lens" for copy-on-write duringEncode
.The draft PR included some additional discussions and benchmarking results comparing this PR's implementation with the current
patch
implementation.cc @rvagg @RangerMauve @warpfork @BigLep