Memory limits for computers #1580

SquidDev · 2023-09-03T19:43:28Z

One of ComputerCraft's longest standing issues is that computers can use an unbounded amount of memory. This means that malicious users can allocate large amounts of memory, slowing down or outright crashing servers, in a way that is hard to detect.

I was originally hoping to solve this as part of #769. However, several years later, I think it's looking unlikely that issue is going anywhere - I've done a lot of research and experimental work, and not satisfied with any of the options.

There is some discussion in that issue on how we might impose memory limits in Cobalt, and after some thought, I think it offers a way forward. My initial plan was to track a small number of allocations in the Cobalt VM, and use that to approximate the computer's memory usage (cc-tweaked/Cobalt#66). However, as that issue details, I don't think this approach is as robust as I originally hoped.

Instead, my plan is to go with something closer to what Graal does: periodically compute retained memory, and error if it's over the limit.

There's quite a lot of risk here, and I've definitely not got a concrete plan here¹, so instead I want to do this quite incrementally, in a way that each step adds some value.

Measure allocations of computers: Use ThreadMXBean.getThreadAllocatedBytes to track how much memory a computer allocates², and expose this as a per-computer metric.

While this probably isn't very useful for finding actively malicious computers, I think this is probably useful for passive monitoring of computers (i.e. cc-prometheus). Mostly, I think it's useful to gather some data for what the allocation pattern of "normal" computers looks like.
Measure retained memory of computers: Add a system to compute the reachable/retained memory of a LuaState, compute that after each task (or after every $N$ bytes allocated), and expose this as a metric.

My biggest concern here is performance, and whether we can do a scan of the whole VM in a reasonable time. I'm not aware of any sensible GC-like strategies we can apply here (i.e. incremental or three-colour GCs³), as most of those will work against Java's existing GC. Computers tend not to use much memory, so I'm hoping this should be pretty fast, but I honestly don't know!

I'm not 100% sure how this should interface with CC yet. We need to include the main-thread queue and event queue in this retained memory, but ideally we'd also track all Lua-related objects (especially HTTP handles).
Managed allocations in Cobalt: All (non-trivial) allocations in Cobalt should now be done through some Allocator class, which (for now) just counts how many bytes are allocated. This effectively gives us the same as 1., but will allow us to hook into it in the future.

We might want to expose this as a separate metric, so we can see how this compares to the
allocation rate given by JMX.

I think Lua -> Java calls also need to through this interface. The LuaString to String conversion allocates quite a bit, and we want a scheme which avoids Computers can cause Minecraft to use all available memory when converting objects from Lua to Java MightyPirates/OpenComputers#1774 without having to do string interning.
Hard limits on large allocations: If some code attempts to allocate more than $N$ bytes (probably where $N$ is half the computer's soft limit), throw an "out of memory" error immediately.
Interrupt the VM after lots of allocations: Using the monitoring in 1. and 3., interrupt the VM after allocating $N$ bytes. In this case, we suspend the VM, and compute retained memory.

There are then two cases here:
- If we're above the "hard" memory limit (either defined as the soft limit * 1.5, or soft limit + some constant), terminate the computer immediately.
- If above the soft limit, resume the computer with an error.
Like with timeouts, we probably want some mechanism to force-terminate the computer if it continues to run after a pause has been requested.

I'm not thrilled about resuming the computer with an error - it does mean the error will occur after the allocation, so source positions won't be entirely accurate.

Like I said, definitely not a concrete plan, but hopefully helps us moving towards something working!

Though if you've been paying attention to the news in the UK, you'll know concrete isn't all it's cracked up to be. ↩
Computed both after each task, and in the computer thread monitor. ↩
We can use a two-colour marking system for most objects (aside from strings, as those are shared across VMs), so at least we don't need to generate massive HashSets. ↩

The text was updated successfully, but these errors were encountered:

SquidDev · 2023-09-03T20:23:12Z

Marking as "feedback wanted", as thoughts/ideas/glaring-holes-in-the-plan are appreciated!

SquidDev · 2023-09-08T12:44:26Z

Measure retained memory of computers [...] My biggest concern here is performance, and whether we can do a scan of the whole VM in a reasonable time.

It's definitely not as quick as I'd like, which probably is a good hint to how little I understand low-level performance :D.

A newly booted computer uses about 700KB¹ of memory, which we can scan in 1-2ms.

However, if we create a lot of objects (testing with local a = {} for i = 1, 1e6 do a[i] = {} end, which takes up 36MB), the resulting object graph takes 60ms to scan. This, as one would expect, scales linearly - using 2e6 in the same loop results means the object graph takes 120ms to scan.

This means we definitely we don't want to compute retained memory after every computer resume, but only after every N bytes allocated.

Memory counting should be included in computer execution time, and should probably be made interruptable. There's a risk of malicious program creating a large/complex object graph, and then forcing it to be scanned multiple times, which we need to protect against.
While computers don't really have any scheduling guarantees, introducing 10-100ms delay between two events is not ideal. I don't really have a good proposal for this.

VisualVM is not especially useful here, as most things seem to get inlined into the MemoryCounter loop. We should probably try running some small benchmarks with JITWatch and perf to see what exactly is going on

According to Cobalt. The numbers used to compute used memory are somewhat made up. ↩

SquidDev · 2023-11-09T18:38:57Z

Have implemented the initial allocation tracking in 76968f2. This should at least give us some basic metrics on average memory growth.

lexi-the-cute · 2023-12-08T09:08:57Z

Have implemented the initial allocation tracking in 76968f2. This should at least give us some basic metrics on average memory growth.

thanks! i myself am concerned about malicious users when i as the server owner accidentally crashed my own server messing around with web sockets. used up all of Java Heap and dropped ticks to 1 tick over 60 seconds

SquidDev · 2024-01-04T17:35:04Z

Mostly, I think it's useful to gather some data for what the allocation pattern of "normal" computers looks like.

We've been running the above commit on SwitchCraft for a few weeks now. Here is a plot of the median, 90% quantile, 99% quantile and max allocation rate over the last 3 hours:

As we can see, the majority of computers don't allocate very much at all (the median hovers around 250KiB/s). 90% are sitting at less than 2MiB/s, but the remaining 10% can go all the way up to 100+MiB/s.

Here's also a histogram breakdown, which gives a better idea what the long tail looks like.

This is definitely a larger long-tail than I expected. I was assuming things would cap out at around 8MB/s or so, except in rare cases (reading files). Combined with the performance issues of memory tracing¹, makes me less sure that the method outlined in the OP is viable

Maybe we do need to go the sampling approach after all :/. It should be fairly easy to prototype at least.

See https://github.com/cc-tweaked/CC-Tweaked/issues/1580#issuecomment-1711612378. I've got it about 2x faster since then, but it really needs to be 100x faster, and I don't think that's possible. ↩

RadiantUwU · 2024-03-27T16:03:22Z

Used string.rep('\n',0xFFFFFFF) and filled an array with it till it threw a java OutOfMemoryError

SquidDev added enhancement An extension of a feature or a new feature. area-Core This affects CC's core (the Lua runtime, APIs, computer internals). labels Sep 3, 2023

SquidDev added the feedback wanted Tell me what you want, what you really really want. label Sep 3, 2023

SquidDev mentioned this issue Mar 27, 2024

High resources consumption(?) leading to computer malfunction. Again! #1767

Closed

SquidDev mentioned this issue Oct 20, 2024

[Suggestion] Increase default space limits for computers and floppy disks #1990

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory limits for computers #1580

Memory limits for computers #1580

SquidDev commented Sep 3, 2023 •

edited

Loading

SquidDev commented Sep 3, 2023

SquidDev commented Sep 8, 2023

SquidDev commented Nov 9, 2023

lexi-the-cute commented Dec 8, 2023

SquidDev commented Jan 4, 2024

RadiantUwU commented Mar 27, 2024

Memory limits for computers #1580

Memory limits for computers #1580

Comments

SquidDev commented Sep 3, 2023 • edited Loading

Footnotes

SquidDev commented Sep 3, 2023

SquidDev commented Sep 8, 2023

Footnotes

SquidDev commented Nov 9, 2023

lexi-the-cute commented Dec 8, 2023

SquidDev commented Jan 4, 2024

Footnotes

RadiantUwU commented Mar 27, 2024

SquidDev commented Sep 3, 2023 •

edited

Loading