Client hangs after /system reboot command #18

Open
FezzFest opened this issue Oct 1, 2015 · 14 comments

@FezzFest

FezzFest commented Oct 1, 2015

After executing a /system reboot command, the client hangs and does not return.
Example:

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request('/system reboot');
    $client->sendSync($request);
    echo 'OK';

In the above example, the echo statement is never reached and 'OK' is never printed to the screen.
The same thing happens with the 'set-and-forget' method (using asynchronous calls).

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request('/system reboot');
    $request->setTag($id);
    $client->sendAsync($request);
    $client->loop();
    echo 'OK';

If I omit the loop() method, the echo statement is reached but the request is never sent. What am I doing wrong?

@boenrobot
Member

You're using a Linux web server I'm guessing?

I'm aware of this issue, but I have no idea how to solve it... Once upon a time (b4), this was also an issue with Windows. I did fix it for Windows, and thought that was the end of it, but alas, no.

If you find a way to solve it, I would very much welcome a pull request (or even just a hint of the solution...).

In the meantime, the workaround is to create a scheduler item that runs after 2 seconds, and on its run, removes itself, and then reboots, i.e.

    $client = new RouterOS\Client($ip, 'admin', 'password', null, false, 10);
    $request = new RouterOS\Request(
        '/system scheduler add name=REBOOT interval=2s
        on-event="/system scheduler remove REBOOT;/system reboot"'
    );
    $client->sendSync($request);
    echo 'OK';
    unset($client);

Note also that the $client object must be unset() before the time of the actual reboot, or otherwise go out of scope (e.g. if the code above was within a function - have the function end after the echo, and remove the unset() call).

What triggers the hang ultimately is the fact that upon disconnect/unset, a "/quit" command is sent. But because the connection is closed, the command is never successfully sent, and the client keeps retrying. There is a check as to whether the connection is even opened before every sending attempt (that was the solution with Windows), but while that check works for Windows, with Linux it doesn't for some reason. And a "/quit" is sent in the first place to prevent a memory leak for some older RouterOS versions and some RouterBOARDs.

@strongwazz

strongwazz commented Nov 11, 2016

I would say this has little to do with the /quit being sent on close()

Passing NULL to PEAR2\Net\RouterOS\Client::dispatchNextResponse (from completeRequest) eventually passes NULL as the tv_sec argument to stream_select() in PEAR2\Net\Transmitter\stream::isDataAwaiting.

If tv_sec is NULL stream_select() can block indefinitely, returning only when an event on one of the watched streams occurs.

No events are going to occur at this point, AFAIK. Maybe the Windows stack does it differently.
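
To illustrate the blocking behaviour described above, here is a minimal sketch that uses a local socket pair in place of a router connection (an assumption made so the snippet runs without any hardware). `hasDataWithin()` is a hypothetical helper, not part of Net_Transmitter; it only shows that passing an integer timeout to stream_select() bounds the wait, whereas NULL would wait until an event occurs.

```php
<?php
// Hypothetical helper (not a Net_Transmitter method): reports whether data
// becomes readable on $stream within the given time, without ever blocking
// indefinitely.
function hasDataWithin($stream, int $seconds, int $microseconds = 0): bool
{
    $read = [$stream];
    $write = null;
    $except = null;
    // An integer timeout bounds the wait; NULL here would block until an event.
    $ready = stream_select($read, $write, $except, $seconds, $microseconds);
    return $ready !== false && $ready > 0;
}

// Assumption: a Unix socket pair stands in for the connection to the router.
[$local, $remote] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

// Nothing has been written yet, so this gives up after 0.2 seconds.
var_dump(hasDataWithin($local, 0, 200000));

// Once the other end writes something, the stream becomes readable.
fwrite($remote, '!done');
var_dump(hasDataWithin($local, 0, 200000));
```

Whether a finite timeout is acceptable for ordinary requests is a separate design question, since long-running commands may legitimately take a while to reply.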

@boenrobot
Member

boenrobot commented Nov 11, 2016

That's an interesting point, thanks.

The problem I was detecting on Windows was a similar thing, where isAcceptingData() was called without an isAvailable() check, so that was the fix. The client in general is running on the assumption that if you have managed to even send a request, you should keep waiting for the response, but I hadn't considered you may end up successfully sending a "/quit", but not receive a reply because the restart would occur before you do.

EDIT: Hmm... but even if I add an isAvailable() check before isDataAwaiting(), there remains a theoretical possibility that in between the isAvailable() check, and the stream_select() call, the connection is closed, and the client is left hanging anyway. What Windows is doing differently (or perhaps, what PHP is doing differently on Windows) is precisely to acknowledge this possibility, so that if a closed connection is passed to stream_select(), it is immediately discarded from the list, and with 0 connections left to check, stream_select() returns 0 immediately, instead of waiting indefinitely.

Then again, this is a very unlikely possibility, so I'll add that anyway.
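
One way to sidestep the gap between the availability check and the stream_select() call is to always wait with a bounded timeout and test for end-of-stream after the read, rather than before it. The sketch below is an illustration only: `readWithTimeout()` is a hypothetical helper, a local socket pair stands in for the router, and it assumes the peer's close is actually signalled to the local socket, which may not hold for a router that drops connections silently.

```php
<?php
// Hypothetical helper (not part of Net_Transmitter).
// Returns the received bytes, '' if nothing arrived within the timeout,
// or null if the peer has closed the connection.
function readWithTimeout($stream, int $seconds): ?string
{
    $read = [$stream];
    $write = null;
    $except = null;
    $ready = stream_select($read, $write, $except, $seconds);
    if ($ready === false || $ready === 0) {
        return ''; // select failed or timed out; nothing to read
    }
    $chunk = fread($stream, 4096);
    if (($chunk === '' || $chunk === false) && feof($stream)) {
        return null; // readable but empty and at EOF: the peer is gone
    }
    return (string) $chunk;
}

// Assumption: a Unix socket pair stands in for the connection to the router.
[$local, $remote] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

fwrite($remote, '!done');
var_dump(readWithTimeout($local, 1)); // data sent before the close

fclose($remote);                      // simulate the router dropping the link
var_dump(readWithTimeout($local, 1)); // closure detected instead of hanging
```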

@boenrobot
Member

@khandieyea or @FezzFest

Could either of you please test with the "develop" branch of Net_Transmitter on Linux to see if this last commit fixes the issue?

(I know this isn't a PHAR, but you can install the develop branch with Composer...)

@strongwazz

Hey @boenrobot, I've tested develop, sadly no change with rebooting.

@boenrobot
Member

Well... I can't say I'm surprised, but it was worth a shot. Thank you for the tip and testing anyway.

What distro and version are you using anyway? (Maybe I could try making myself a VM with it some time...)

If the Windows analog is any indication, the problem is indeed that the sending attempt (fwrite() call) of "/quit" keeps failing, yet is being retried infinitely, and the feof() or stream_select() checks don't help on Linux for some reason... Or (more likely now, post the Windows fix), feof() in particular doesn't work on Linux's network streams (whereas on Windows, feof() returns true if the connection is closed), thus causing stream_select() to wait indefinitely for sending (I mean, remember, as soon as the reboot gives out its !done reply, the router just silently drops all connections and reboots, making it unable to even receive a "/quit", let alone reply to it).
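
If the failing fwrite() really is retried without limit, capping the number of consecutive failed attempts would turn the hang into an ordinary failure. The following is a rough sketch under that assumption; `writeWithRetryLimit()` and its parameters are hypothetical and do not reflect how Net_Transmitter is actually written.

```php
<?php
// Hypothetical helper: writes $data to $stream, giving up after
// $maxAttempts consecutive writes that make no progress.
function writeWithRetryLimit($stream, string $data, int $maxAttempts = 3): bool
{
    $remaining = $data;
    $failures = 0;
    while ($remaining !== '') {
        // fwrite() reports failure as false (or 0 on older PHP versions).
        $written = @fwrite($stream, $remaining);
        if ($written === false || $written === 0) {
            if (++$failures >= $maxAttempts) {
                return false; // give up instead of retrying forever
            }
            usleep(50000); // brief pause before the next attempt
            continue;
        }
        $failures = 0;
        $remaining = (string) substr($remaining, $written);
    }
    return true;
}

// Assumption: a Unix socket pair stands in for the connection to the router.
[$local, $remote] = stream_socket_pair(
    STREAM_PF_UNIX,
    STREAM_SOCK_STREAM,
    STREAM_IPPROTO_IP
);

var_dump(writeWithRetryLimit($local, '/quit')); // peer is open
var_dump(fread($remote, 5));                    // what the peer received
```

With a limit like this, a "/quit" that can no longer be delivered would make the disconnect routine return instead of looping.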

@strongwazz

Hi @boenrobot

We're running pretty standard Ubuntu 16.04.

I'm fairly certain this has nothing to do with the final /quit. Even with that code removed, I see the same behaviour. I'm also yet to see a !done coming back from the router in wireshark.

All I see is the /reboot being sent, and then "poof" it's gone.

As a side note - this issue exists in other PHP clients, and also persists in other Node and Python implementations. However, Java is apparently OK (never tested).

@boenrobot
Member

boenrobot commented Nov 13, 2016

> All I see is the /reboot being sent, and then "poof" it's gone.

That's just it. Because the connection gets closed, the fwrite() call that sends "/quit" fails, and thus a packet never actually goes over the wire to be seen by Wireshark. It's exactly like that on Windows, except that thanks to the isAcceptingData() check, the client successfully gives up.

> As a side note - this issue exists in other PHP clients, and also persists in other Node and Python implementations.

But this... this is new... Denis Basta's API client doesn't send a "/quit", so I would've thought it wouldn't be affected. But then again, I haven't tried it personally either. And I'm not aware of the others' intricacies, but I wouldn't be surprised if the Node and Python clients have the same problem I had on Windows and haven't fixed it, while the Java one either has it fixed, or the Java runtime does some magic to make the checks work the same way on Linux as well.

> Even with that code removed, I see the same behaviour.

Just to be clear... even with this whole block removed/commented? That's new too... Back when I first got a report about this, removing this worked, but I've merely been too stubborn to remove it completely for the reasons mentioned previously in this issue.

@boenrobot
Member

Huh... funny... to add another weird twist to all of this... I just set up an Ubuntu Server 16.10, and I can't replicate this on it.

The install has only the built-in packages plus the built-in OpenSSH (to make it easier on me to test...), all updated with sudo apt-get update; sudo apt-get upgrade. I installed PHP with sudo apt-get install php-common php-cli, and the version I got is PHP 7.0.8-3ubuntu3 (cli) ( NTS )... It doesn't happen there. This is with the "develop" branches of both Net_RouterOS and Net_Transmitter, but considering I've done nothing specifically targeted at this issue, I'm very surprised it's not happening.

My full test code:

    <?php

    use PEAR2\Autoload;
    use PEAR2\Net\RouterOS;

    error_reporting(E_ALL | E_STRICT);
    ini_set('display_errors', 1);

    require_once 'Autoload.git/src/PEAR2/Autoload.php';
    Autoload::initialize(__DIR__ . '/Net_RouterOS.git/src');
    Autoload::initialize(__DIR__ . '/Net_Transmitter.git/src');
    var_dump(Autoload::getPaths());

    $client = new RouterOS\Client('192.168.88.1', 'admin', '');
    $client->sendSync(new RouterOS\Request('/system reboot'));
    sleep(2);
    $char = $client->getCharset(RouterOS\Communicator::CHARSET_REMOTE);
    var_dump($char);

    echo 'OK?';
    echo "\n";

The added sleep(2); is there to make sure the "/quit" attempt only happens after the reboot commences. With or without it, there's no error of any kind at any point.

I wonder if it's a kernel issue that's already fixed, perhaps as recently as in between 16.04 and 16.10.

@strongwazz

strongwazz commented Nov 14, 2016

Is it actually rebooting?

@boenrobot
Member

boenrobot commented Nov 14, 2016

It is rebooting, yes, but more importantly, doing so without any errors or hangs on the PHP side. Previously, it would reboot as well, but as @FezzFest mentioned, it would just hang OR (as in other reports I've had) it wouldn't hang, but would finish up with an error.

(The RouterOS I'm using is a real 951Ui-2HnD with 6.37.1; The Ubuntu server is in a Hyper-V VM...)

@strongwazz

strongwazz commented Nov 14, 2016

Had to ask, I've had issues with "/system reboot" not actually rebooting, but "/system/reboot" working.

Is your routeros target on the same LAN? I'll test with 16.10

@boenrobot
Member

boenrobot commented Nov 14, 2016

Heh. This API client translates "/system reboot" to "/system/reboot" under the hood, but most others don't, so no surprise there ;-) . LeGrange's Java client is among the few others who do this.
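
For clients that don't do this translation, the basic idea can be approximated in a few lines. This is only a simplified sketch: `toApiPath()` is a hypothetical helper, and the actual translation inside Net_RouterOS may handle more cases than splitting on whitespace and slashes.

```php
<?php
// Hypothetical helper: turns a CLI-style command such as "/system reboot"
// into the slash-separated form the API protocol expects.
function toApiPath(string $command): string
{
    // Split on any run of whitespace and/or slashes, dropping empty pieces.
    $parts = preg_split('~[\s/]+~', trim($command), -1, PREG_SPLIT_NO_EMPTY);
    return '/' . implode('/', $parts);
}

echo toApiPath('/system reboot'), PHP_EOL; // /system/reboot
echo toApiPath('/system/reboot'), PHP_EOL; // /system/reboot (unchanged)
```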

Yes, the RouterOS and VM are in the same LAN, thanks to Hyper-V's switching. Both are in the 192.168.88.0/24 subnet. I don't think I can set up a more complicated configuration than that though, as trying to do NAT with Hyper-V in place can be kind of tricky, and equally tricky is setting up Ubuntu Server (or any x64 OS) on VirtualBox...

@strongwazz

strongwazz commented Nov 14, 2016

Yea it is strange. Ok that's all great. I'm building 2 vanilla 16.04 and 16.10 boxes, will implement your test structure, and see what happens.

Thanks a million!

@boenrobot boenrobot pinned this issue Apr 5, 2019