Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Connection loss to Zino server is silently ignored on NTIE refresh #116

Closed
podliashanyk opened this issue Jun 7, 2024 · 10 comments
Closed
Labels
beta Part of beta release bug Something isn't working discussion help wanted Extra attention is needed zinolib Needs/has companion issue in zinolib

Comments

@podliashanyk
Copy link
Contributor

podliashanyk commented Jun 7, 2024

Seems like connection loss to Zino server is not "picked up" by Howitz when requesting for NTIE updates. Further investigation is needed.

To reproduce:

  1. Run howitz, browse
  2. Turn off the internet where you run howitz
  3. Wait at least longer than refresh_interval
  4. There is nothing that complains about the connection being gone

Curitz will eventually quit with the error message "Connection lost with the zino server...". Howitz does nothing.

@podliashanyk podliashanyk added bug Something isn't working help wanted Extra attention is needed zinolib Needs/has companion issue in zinolib discussion beta Part of beta release labels Jun 7, 2024
@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

The code in zinolib.ritz.notifier.poll either:

  • returns a NotifierResponse or None
  • raises one of NotConnectedError (for known errors) or ProtocolError (for unknown errors. Bare except!)

None is returned if an OSError with errno.EAGAIN or errno.EWOULDBLOCK was raised while there was no newline in notifier._buff. EAGAIN/EWOULDBLOCK on linux means "temporary failure, please feel free to try again".

This code never raises RetryError.

@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

The code in zinolib.controllers.zino1.UpdateHandler.get_event_update calls zinolib.ritz.notifier.poll.

It does not attempt to catch any exceptions.

If .poll returns None (after OSError errno.EAGAIN or errno.EWOULDBLOCK) it returns False, otherwise it handles the update.

The actual handler either returns the id of an updated event or None if it's a new, unfinished event or False if it was an unknown/unhandled update type.

@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

Howitz's howitz.endpoints.update_events runs zinolib.controllers.zino1.UpdateHandler.get_event_update in a while-loop.

It catches RetryError, which will never be raised.

If the event updater returns None, it breaks the while loop. This is what should trigger a new attempt and is the reason I suspect for the sudden loss of updates.

We should try:

if updated:
    updated_ids.add(updated)

instead of

if not updated:                                                                                                                                                                    
    break                                                               
updated_ids.add(updated) 

and stop catching RetryError.

@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

Some polish might be to have a counter for how many times we have seen updated == None and raise a more loud error/check the connection if it is above the magic number. On every successful update the counter is set to zero.

@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

Also, might be smart to have time.sleep(1) if updated == None.

@hmpf
Copy link
Contributor

hmpf commented Jun 10, 2024

I'll make a patch.

@hmpf
Copy link
Contributor

hmpf commented Jun 11, 2024

Curitz aborts because it attempts to receive from the API socket and the API socket times out. It does not fail because the NTIE socket dies.

@hmpf
Copy link
Contributor

hmpf commented Jun 11, 2024

This is fixed in zinolib 1.2.0, that is: there is a new method zinolib.controllers.zino1.Zino1EventManager.test_connection() that will raise a TimeoutError if the server is not reachable. Howitz will have to run the method and do something with the timeout as per #115.

Curitz checks the connection every 60 seconds, this is hard coded.

@hmpf
Copy link
Contributor

hmpf commented Jun 11, 2024

As per previous comment, closing this. When to run the test-method belongs in a separate issue.

@hmpf hmpf closed this as completed Jun 11, 2024
@hmpf
Copy link
Contributor

hmpf commented Jun 12, 2024

See also Uninett/zinolib#63

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beta Part of beta release bug Something isn't working discussion help wanted Extra attention is needed zinolib Needs/has companion issue in zinolib
Projects
None yet
Development

No branches or pull requests

2 participants