Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: how feasible is it to share Freezed objects between goja.Runtimes #232

Closed
mstoykov opened this issue Dec 11, 2020 · 11 comments
Closed
Labels

Comments

@mstoykov
Copy link
Contributor

The question is more or less in the title. The specific use case is an array with objects that mostly will contain strings, numbers, or other array/objects containing the same.

Obviously having a function and calling that function will be bad and having a regex can trigger #214, but are there such situations for the other types, and are there other things that need to be taken into account ... apart from me thinking that this is a bad idea and that w/e answer you give will change over time.

For more context, you can look at this discussion grafana/k6#1739 (comment)

@na--
Copy link

na-- commented Dec 11, 2020

To give you some of the context, without you having to read a whole PR, the situation is like this:

  • a function in a single JS runtime returns an array of things (without any restriction what these things are)
  • we use runtime.ExportTo() to get that array as an []interface{} in Go
  • we give a properly locked mechanism for other goja runtimes to requests pieces of that array or iterate over it, via a Proxy

The question is, to avoid data races, do we need to serialize and unserialize (via JSON or some other mechanism) the data pieces we give to each goja runtime, so they are completely separate copies? Or can we directly return pieces of the original []interface{} slice, if we recursively call Object.freeze() in the Proxy, before we return them to the users?

@dop251
Copy link
Owner

dop251 commented Dec 12, 2020

First of all, freezing a go value wrapped into goja.Object won't work, because it's not possible to store ECMAScript property attributes in the wrapper (as the wrapper is for the entire value, not its properties).

If you have a goroutine-safe go value (e.g. sync.Map), it's possible to share it between runtimes as long as you create an individual wrapper for each runtime, i.e.:

var m sync.Map
runtime1.Set("m", runtime1.ToValue(&m))
suntime2.Set("m", runtime2.ToValue(&m))

If you use a Proxy, then the Proxy itself and the handler also must be created per runtime. If the proxy handler is native it can be shared as long as it's goroutine-safe.

It's important to understand that the wrapped value must be fully recursively goroutine-safe, for example if the sync.Map contains unsafe values (such as regular maps) there will be data races. The recursive safety could in theory be enforced by the proxy handler: when returning property values it could wrap them into another Proxy which will enforce immutability or synchronisation.

Hope this helps.

@mstoykov
Copy link
Contributor Author

I think @na-- confused you somewhat. Or at least I think so ...

Rewording the question somewhat: If we have an object that is only:

  1. string, integers, floats, booleans - "simple" types (I am perfectly sure there is an ECMAScript term for it, that I don't know about)
  2. arrays or "maps"/objects which only have keys and properties of the above variants.

Would calling Object.freeze recursively on those and then giving them to other goja runtimes be ... racy.
I guess as a bonus - would that work if we wrap it in Proxy object for each other runtime, without the original frozen object actually be given to the other runtimes.

The use case is that we want users to be able to load a JSON file (or something of similar data variety) and be able to only have 1 copy in memory (or as close to 1 copy as possible) while being able to access elements of it between multiple Runtimes (read upwards of 100 - usually more).

The current solution (which I like a lot more than this idea, to be honest) is simply to get the original object from the first runtime and serialize it as JSON and then unserialize only the parts we need.

At this point, this is only done for arrays and technically we serialize each key separately and unserialize it into a new object that is then given to each Runtime when each element is requested. This happens through having a Proxy for each Runtime that "emulates" somewhat the way an array behaves (currently having length and support for-of by returning an iterator) and calling a "thread-safe" implementation that just unserializes one element at the time when required.

@dop251
Copy link
Owner

dop251 commented Dec 12, 2020

I think @na-- confused you somewhat. Or at least I think so ...

Rewording the question somewhat: If we have an object that is only:

  1. string, integers, floats, booleans - "simple" types (I am perfectly sure there is an ECMAScript term for it, that I don't know about)
  2. arrays or "maps"/objects which only have keys and properties of the above variants.

Would calling Object.freeze recursively on those and then giving them to other goja runtimes be ... racy.

Yes. Non-primitive goja.Value in general is not goroutine-safe, even for concurrent reads.

I guess as a bonus - would that work if we wrap it in Proxy object for each other runtime, without the original frozen object actually be given to the other runtimes.

No, because the proxy handler will still be accessing the original value.

@na--
Copy link

na-- commented Dec 12, 2020

Each runtime would have its own Value, i.e. we'd call the equivalent of runtime.ToValue(data) for every runtime, but with the same immutable data, and one of the purposes of the freezing would have been to keep data immutable from JS code as well.

But if that's still not a good idea, do you have any suggestions how to handle the underlying problem we're trying to solve? We have some huge static data array (e.g. from parsing a huge 10 MB JSON/CSV/etc. file), and we have hundreds of concurrent goja runtimes. We want to give random access to that data to every runtime, without having a full copy of the huge array in every runtime. Streaming JSON/CSV parsers are not an option for this, each runtime should be able to request any element of the array at any time efficiently. Any suggestions?

@dop251
Copy link
Owner

dop251 commented Dec 12, 2020

What do you mean by 'equivalent of runtime.ToValue(data)'?

@na--
Copy link

na-- commented Dec 12, 2020

I tried to shorten the explanation a bit, to the detriment of quality, sorry...

In reality, say that goData is the huge multi-megabyte []interface{} slice in Go we want to safely share across all runtimes. The idea is that each of the goja runtimes will have a Proxy as a reference to that huge slice, called e.g. jsProxyToTheSharedData. And whenever the JS script contains jsProxyToTheSharedData[i], the Go proxy code will carefully lock the mutex for goData, and return runtime.ToValue(<something>).

The current value for <something> is essentially json.Unmarshal(json.Marshal(goData[i])), i.e. a completely independent but inefficient copy of the original slice element.

And while we can probably optimize the copying (via gob, reflect, etc.), I was wondering if the <something> we return couldn't just be goData[i] (i.e. a direct reference to an element in the huge slice), provided that in the Proxy we recursively call Object.freeze() on it and all of its sub-elements before we return it to the actual script. That is, the result of jsProxyToTheSharedData[i] will be completely frozen before the user can do anything with it.

As a nice UX bonus, it will be very obvious that the result is read-only. Whereas with the current copying solution, the following script will be very strange:

var a = jsProxyToTheSharedData[someIndex]; // e.g. this is {foo: 1}
console.log(a.foo); // prints 1
a.foo++; 
console.log(a.foo); // prints 2
console.log(jsProxyToTheSharedData[someIndex].foo); // prints 1

@dop251
Copy link
Owner

dop251 commented Dec 12, 2020

Then my initial comment stands. You cannot call Object.freeze() on a native object, i.e. the following:

func TestFreezeNative(t *testing.T) {
	vm := New()
	vm.Set("o", map[string]interface{}{})
	_, err := vm.RunString(`
	o.prop = true;
	Object.freeze(o);
	`)
	if err != nil {
		t.Fatal(err)
	}
}

will fail with TypeError: Host object field prop cannot be made read-only.

What you can do is wrap the value into a Proxy that prevents modifications and also wraps any property into a similar Proxy before returning, thus making it recursive. This is currently the only way to make a native object immutable.

But then you have to consider non-ASCII strings. ToValue() will convert them into UTF-16 which means you won't save any memory this way. Even for ASCII strings it will scan them to make sure they are, which means you'll waste some CPU.

Strings currently are immutable so you can share them, but at the moment they can only be created in a Runtime. I could make a global function NewString(s) Value, similar to NewSymbol() if that helps.

@mstoykov
Copy link
Contributor Author

Sorry for the slow reply and the long post, as the saying goes "If I had more time ..."

I actually rewrote the feature to object.Freeze recursively the data before returning, which basically required that I JSON.parse the jsonified data which also lead to some ... other problems/complications.

To be perfectly honest I like the latest iteration more even though it is slightly less performant. The whole idea of this feature, again, is that users want to parameterize their scripts with data that they have dumped in some format (JSON, csv are common) and for that, up until now they needed to load the whole data for each VU or do some hacks as that would've meant that they have a copy of the data multiple times (a few hundred, but maybe thousands).
Example:

function generateArray() {
    return JSON.parse(open("./arr.json"));
}

var s = {};
if (__VU > 0 && __VU < 2) { // We will only load the json for VU 1 
    s = generateArray();
}

exports.default = function () { // this is just what gets called by k6 on each iteration as long as there are iterations to be done
    console.log(s.length, __VU);
}

where "arr.json" has 100k lines of {"user": "myuser", "password": "password12345"} and a total of 4.7M takes around 330-360M compared to if we don't load it taking 230-250M. This is with 1000 VUs(js.Runtimes + other stuff) and no babel+corejs, again only 1 of the VUs actually loads the data, the rest just skip the if ;).

./k6 run --compatibility-mode=base -u 1000 -i 1000  arr_noshare.js

This shows me that the parsed JSON expands from around 4.7M to around 90-100M once parsed and saved in memory. If I load it in 10VUs I get 1.2+GB of usage which seems to confirm the number.
If I rewrite the script to use the feature in the PR in question the memory usage is 310-330MB when all VUs have access to it.
So at this point:

  1. we already have made it possible for this data to be shared between VUs in a read-only way. I would argue nobody will have a problem with memory usage which was previously a big problem in these cases. In the above example a 100 VUs would take 10GB+ instead of the current 330M.( both will actually be more once they start executing and the fact that the GC will have less things to traverse is something that I haven't benchmarked).
  2. The memory usage seems to be higher when we keep it as an object in goja even without freezing it and Proxying it. Not by much and I don't think goja should be optimized for this, but probably by enough especially for more complex objects and/or bigger JSONs.
  3. The performance accessing suffers between 20-30% for just accessing a single element s[1], which IMO is fine given that this happens only once per iteration in most cases. And 100-200x (with the current code, the in golang iterator was faster, but not by much) if we for-of over the whole data. This was based on more realistic data, and I don't have time to rerun it with this one. The iterator performance IMO is not all that important as the use case again is to parameterize some (HTTP) request which will definitely take way more time if all of the data is used.
  4. In my experience the majority of the cases the elements of the data are small (maybe 10-20 keys tops) and as such will have a small memory footprint, that + 2. means that we will likely use less memory for data with thousands of elements. The downside is that we keep creating new objects, but given that again the use case (in general) is that a user will get 1 element per iteration, k6 has way too many other places where optimizing the generation of an object will be way more useful.

So for me, this is a yak shaving at this point for k6.
Having said that if you or someone else tomorrow implements "ReadOnlyWrapper" that you can give either a golang interface{} or a goja.Value and it will act exactly like the original object but be read only and save for concurrent reads between VMs - I am all for using it. This was more or less the original idea, but I quickly realized that with the current goja API I will need a lot of reflection and getting of string properties to implement even something close to that. Which is why I scaled it back ... a lot, but more or less to the thing that IMO is what k6 needs.

Given that I will add some things that would've helped (but unfortunately probably all of them will need to be true in order for this to work better):

  • ProxyTrapConfig question/issue #227
  • JSON.stringify/JSON.parse from golang code without the need to call JS code. The golang json.Encode/Decode has slightly different behavior, which may lead to problems.
  • some ... better way to iterate over an array. I moved to this one as the previous one created 1 more array which likely will be even worse for performance with big arrays, but here I need to iterate in JS code
  • deepFreeze needs JS code wrapper which means that even with ProxyTrapConfig question/issue #227 we will still need to call some JS code to wrap the value on each call to the Proxy.

In a lot of cases what I find developing with goja is that I either need to call some amount of methods to get some goja.Value and then make it into a goja.Object and then get a Property as then repeat ... or I can just call a JS code that will do most of that, probably faster because it will skip a lot of the required transformations. I don't know how to fix that without exploding the API which might not be a great idea and will definitely be a lot harder to maintain. Maybe I should go look at v8 or some other implementation for inspiration 🤷 .
The problem I mostly have with the JS code is that ... well it is JS code in the middle of the go code making it harder to write/read/debug. This might be better solvable on the developer's side by just having the code in a separate file or something, but I digress.

I don't think though that neither the ReadOnlyWrapper nor the above list are as interesting or needed as much as just more ES6 support, as that lets us drop stuff from our babel+corejs combo and get both better performance and less memory usage.

@dop251
Copy link
Owner

dop251 commented Dec 15, 2020

I can assure you that using API is way faster than running JS code:

func BenchmarkGetPropertyAPI(b *testing.B) {
	vm := New()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		vm.Get("Object").(*Object).Get("freeze")
	}
}

func BenchmarkGetPropertyJS(b *testing.B) {
	vm := New()
	prg := MustCompile("test.js", "Object.freeze", false)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		vm.RunProgram(prg)
	}
}
BenchmarkGetPropertyAPI-4   	13079428	        94.9 ns/op
BenchmarkGetPropertyJS-4    	 4625163	       257 ns/op

Other than that I take it that you have found a suitable solution so this can be closed?

@mstoykov
Copy link
Contributor Author

yeah @dop251, the solution we had previously was already ... suitable IMO :).

I might try to call things through the API more, but my benchmarking of the code doesn't show it is slow enough for this to matter ... so maybe on a second iteration in the future, or if there is more time. Also, #227 will still be needed to remove most of the code we currently need.

Thanks a lot for the discussion and I am going to close the issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants