Error Handling in Lua #3078
This one needs a bit of explanation. One of the ways (in standard Lua at least) to debug is use The issue here with NodeMCU is that SDK tasks always start at call stack frame 0 and grow / release the frames until a return from stack frame 0 completes the task, returning control to the SDK and allowing the next callback task to again execute from Lua call-frame zero. In order to make the stack diagnostics available, we would need the interpreter to run at stack frame 6 not 0. This is going to be really difficult to achieve without botching the Lua architecture. This needs quite a bit more thought before I could consider doing this. In the meantime we have a lot more valuable low hanging fruit to pick.
One of the points of fragility in the C module is that |
@nwf, I hold a dissenting view here. The underlying fragility is that people write apps that exhaust memory. If you've run out of memory to the point that the application can't even grow the registry (which should be a fairly small table if the application is correctly written), then it is truly f**ked; it isn't going to recover. Returning a status for this one API call when every other Lua API call throws the error isn't going to make anything any less fragile. The best thing to do is to throw an error so that error reporting can give a decent traceback. As I asked offline, give me the link to such an example and I can give you my feedback. When I develop my apps, I continuously check the registry for growth, e.g.
r=debug.getregistry()
for k,v in pairs(r) do if type(v) == "function" then print (k,v) end end
-- this will list off registered functions; if this list is growing then you've got a leak
-- With Lua 5.3 builds you can also get debug info like line numbers and upvalues on these
t=tmr.create()
local cnt = 0; local function counter(t) cnt=cnt+1 end; t:alarm(5000,1,counter)
-- if this is in r[5], say, then
for i=1,10 do local u,v=debug.getupvalue(r[5],i);if v then print(u,v) end end
cnt 16
for i=1,10 do local u,v=debug.getupvalue(r[5],i);if v then print(u,v) end end
cnt 19
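The registry check above can be wrapped in a small helper so that two snapshots can be compared. This is only a sketch: `registry_census` is a hypothetical name, not something the firmware provides, and it relies solely on `debug.getregistry()` as used in the comment above.

```lua
-- Count registry entries by Lua type so that two snapshots can be compared.
-- registry_census is a hypothetical helper, not part of NodeMCU.
local function registry_census()
  local counts = {}
  for _, v in pairs(debug.getregistry()) do
    local t = type(v)
    counts[t] = (counts[t] or 0) + 1
  end
  return counts
end

local before = registry_census()
-- ... let the application run for a while, then take a second snapshot ...
local after = registry_census()

local grown = (after["function"] or 0) - (before["function"] or 0)
if grown > 0 then
  print("function entries in the registry grew by " .. grown)
end
```

A count that keeps climbing between snapshots taken at the same point in the application's cycle would suggest that callbacks are being registered and never released.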
An example is the proposed #2854 PR: the constructor began with a call to As the code was written, a failure of |
@nwf, Thanks Nathaniel. I will do a review of the module and feed in comments. |
@nwf, I've been considering this issue. This "throw on memory error" is the same for all C API calls which can allocate collectable resources such as extra stack, strings, tables or additional table entries and userdata. In general resources created on the stack are cleaned up by the Lua GC after exiting the routine (and using a userdata this way is actually the simplest way of mallocing temporary data). The issue with creating resources in the registry is that these entries persist if the routine throws an error for whatever reason, leading to memory leakage of dead resources. This is something that I propose to clean up in #1028, but you and I should agree some best practice patterns for coding this. Another peeve of mine is that a lot of our code tests for error values and has error logic on return even though these calls throw an error and will never return an error value.
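One possible shape for such a pattern, shown here at the Lua level rather than in C, is to run the risky step in a protected call and release the registered resource before re-throwing. This is a sketch under assumptions: `with_cleanup`, `finish`, `ref` and `unref` are hypothetical names, and the `refs` table merely stands in for references held in the registry.

```lua
-- Release the resource first, then either re-throw or return the results.
local function finish(release, resource, ok, ...)
  release(resource)
  if not ok then
    error((...), 0)   -- re-throw the original error value unchanged
  end
  return ...
end

-- Run body(resource) and call release(resource) whether or not body throws.
local function with_cleanup(resource, release, body)
  return finish(release, resource, pcall(body, resource))
end

-- Stand-in for registry references (hypothetical helpers).
local refs = {}
local function ref(v) refs[#refs + 1] = v; return #refs end
local function unref(i) refs[i] = nil end

local id = ref(function() end)
local ok = pcall(with_cleanup, id, unref, function() error("boom") end)
-- ok is false here, and refs[id] has been released despite the error
```

The C equivalent would presumably pair the reference with a protected call in the same way, so that the entry is unreferenced on the error path as well as the success path.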
One of the complications here is that we need a unified error architecture across both the Lua 5.1 and 5.3 VMs so that the rules are the same for modules and independent of which Lua version is used. I've had this challenge in other functional areas as well, such as table and function handling. A consequence of this is that, far from Lua 5.1 being in "frozen" support, it is receiving a constant "bleed-in" of Lua 5.3 goodies.
@nwf, @HHHartmann, I've set up I can include all of these changes as a single batch PR or alternatively just change one module on this PR (e.g. |
Well this seems to be working fine. No more undiagnosed tracebacks in CBs and optionally being able to turn off panic reboot.
> file.putcontents('f.lua', 'function f() g() end')
> dofile'f.lua'
> node.setonerror(print)
> tmr.create():alarm(1000,0,f)
> f.lua:1: attempt to call global 'g' (a nil value)
stack traceback:
f.lua:1: in function <f.lua:1>
> node.setonerror()
> tmr.create():alarm(1000,0,f)
> f.lua:1: attempt to call global 'g' (a nil value)
stack traceback:
f.lua:1: in function <f.lua:1>
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x40100000, len 31864, room 16
...
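Building on the session above, an application could pass its own handler instead of `print`, for example to keep a short error log. The following is a minimal sketch: `error_log`, `MAX_ERRORS` and `log_error` are hypothetical names, and it assumes, as the session suggests, that the handler is called with the error message and traceback as a single string.

```lua
-- Keep the last few callback errors in RAM and echo them.
-- error_log, MAX_ERRORS and log_error are hypothetical names.
local error_log, MAX_ERRORS = {}, 5

local function log_error(msg)
  if #error_log >= MAX_ERRORS then
    table.remove(error_log, 1)          -- drop the oldest entry
  end
  error_log[#error_log + 1] = tostring(msg)
  print("callback error: " .. tostring(msg))
end

-- Register only when running on firmware that provides node.setonerror().
if node and node.setonerror then
  node.setonerror(log_error)
end
```

As with `node.setonerror(print)` in the session, a handler like this does not itself restart the ESP, so the application would have to decide whether and when to do so.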
@TerryE: You may as well land the whole bunch in a single PR and we can go from there. I think we are currently obligated to not merge anything to |
https://github.com/nodemcu/nodemcu-firmware#releases |
@marcelstoer we are on the same page. A target date would be nice. |
@nwf wrote:
I will formulate the PR on this basis. Also since I am making changes to lauxlib and the modules to add the |
Adding the |
Already done 😄 I will tidy up a couple of the modules using luaL_reref and luaL_unref2 as worked examples but let's leave the bulk until after you have had a review of these and are happy. |
@nwf, IIRC you are working on |
That's fine; I can take those. :) |
@nwf One note whilst I think on. We don't want to convert all |
@nwf I have come across an issue in my testing, and that is when the error is triggered by memory exhaustion, for example:
> a={}; function f() for i=1,3000 do a[i]=0 end end
> f()
E:M 32784
E:M 32784
Lua error: not enough memory
> print(#a) a=nil -- still got a in memory
2048
> collectgarbage(); print(node.heap())
40784
> tmr:create():alarm(500,0,f) -- now run out mem in a CB
> E:M 32784
E:M 32784
> -- this should have done the TB and printed it, then rebooted but it didn't
> a= nil; collectgarbage(); return node.heap()
40760
The issue is that the error handler is running at the deepest stack frame and I think that I need to run the GC before I try to create the closure for the post and do the reboot. From the manual:
It looks like we just need to call the onerror directly after the call. |
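The idea of reporting after the protected call has returned, rather than from a handler running at the deepest frame, can be illustrated in plain Lua. This is only an analogue of what the firmware would do in C: `run_callback` and `greedy` are hypothetical names, and the out-of-memory error is simulated with `error()`.

```lua
-- Run cb in protected mode; report only after the stack has unwound.
local function run_callback(cb, onerror, ...)
  local ok, err = pcall(cb, ...)
  if ok then return true end
  collectgarbage()   -- whatever the failed call allocated is now unreachable
  onerror(err)       -- the handler now has some heap to work with
  return false
end

-- Simulated memory hog; a real device would fail inside the loop itself.
local function greedy()
  local t = {}
  for i = 1, 1000 do t[i] = ("x"):rep(100) .. i end
  error("not enough memory", 0)
end

local reported
run_callback(greedy, function(e) reported = e end)
-- reported now holds the error message
```

The trade-off is that no traceback of the failing frames is available by this point, which is why this path suits the out-of-memory case rather than ordinary errors.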
OK, that works.
> a={} collectgarbage(); print(node.heap())
40752
> tmr:create():alarm(500,0,f)
> E:M 32784
E:M 32784
out of memory
ets Jan 8 2013,rst cause:2, boot mode:(3,6)
...
It's a feature of Lua that event handlers are not called on "out of memory" conditions, but at least
PS: the above diagnostics were with Lua 5.3, but it's the same with Lua 5.1.
BTW, the above comments had absolutely nothing to do with this issue so I have moved them onto a separate one: #3101 |
hi when will node.setonerror() be available in custom build online...? |
@chathurangawijetunge There's no better answer than "when it's ready". @TerryE is moving heaven and earth with his changes to the Lua core of NodeMCU and we're all eagerly awaiting his work landing on |
@chathurangawijetunge, I've got this all working on my working branch, but we want to do the next cut to master before adding this PR. |
PR has been merged. On to next functional tranche 😄 |
#3075 introduced the new luaL_callx() to facilitate improved Panic Handling. I want to use this issue to discuss error handling in more general terms and why the focus on panic handling.
... pcall() and xpcall(), thus creating a protected environment for executing a code hierarchy. In such cases the error is returned to the application and the application code itself determines how to process or log the error.
One of the main issues that catches new developers is that whilst the interactive thread establishes a default protected environment to catch and print errors interactively, any Lua callback runs as a separate execution thread and is therefore not protected; any errors here will panic and reboot the processor. The issue that I want to consider here is whether this is the correct behaviour for NodeMCU and, if not, how we can improve this in a way which doesn't break existing applications.
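To illustrate the difference, an application can give a callback its own protected environment by wrapping it with xpcall(). This is a sketch of an application-level workaround, not the firmware mechanism under discussion; `protect` and `safe_f` are hypothetical names, and the timer call is guarded because the `tmr` module only exists on NodeMCU.

```lua
local unpack = table.unpack or unpack   -- Lua 5.3 / Lua 5.1

-- Wrap cb so that an error inside it is caught and passed to report()
-- together with a traceback, instead of propagating out of the callback.
local function protect(cb, report)
  return function(...)
    local args = { n = select("#", ...), ... }
    local ok, err = xpcall(function()
      return cb(unpack(args, 1, args.n))
    end, debug.traceback)
    if not ok then report(err) end
    return ok
  end
end

-- g is undefined, so calling it raises an error inside the wrapper.
local safe_f = protect(function() g() end, print)
local ok = safe_f()   -- ok is false; the traceback was handed to print

-- On NodeMCU the wrapped function could be used as a timer callback.
if tmr then
  tmr.create():alarm(1000, 0, safe_f)
end
```

The cost of this approach is that every callback has to be wrapped by hand, which is part of the case for handling the unprotected case in the firmware itself.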
We have three broad mechanisms for reporting errors in the firmware:
print() function.
Prior to SDK 3.0 releases, we always did (1) but this meant that errors generated during node.output() were reported only on the UART. print() now sends output to the STDOUT which is in turn emptied to the UART or the node.output() reader using the Pipe module, and this enables error reporting in a terminal session to occur through the session.
In case (3) the default panic process should still print the error, but at least wait until the STDOUT pipe is empty or a maximum time has elapsed before restarting the ESP. We should also provide a node.setonerror() call to enable the application to override this default (say to add some form of error logging).
Scope of Work
So I suggest that a good next PR would be to tidy up such error handling:
- node.restart() functionality to wait until STDOUT or N sec elapsed before restarting ESP. Add extra delay boolean parameter to enable this mode and document this. Q: Should this wait until errors flushed be the default?
- node.setonerror() call and document.
- luaL_pcallx()
- Modify debug.debug() function to read from STDIN pipe, and output to print() (Maybe a separate PR.)