
MyBinder.org Events Archive not recording launches #3167

Open
rgaiacs opened this issue Jan 16, 2025 · 7 comments


@rgaiacs
Collaborator

rgaiacs commented Jan 16, 2025

https://archive.analytics.mybinder.org/ shows


Date        Filename                 Number of Events
2025-01-16  events-2025-01-16.jsonl                 0
2025-01-15  events-2025-01-15.jsonl                28
2025-01-14  events-2025-01-14.jsonl                 0
2025-01-13  events-2025-01-13.jsonl              1310
2025-01-12  events-2025-01-12.jsonl               799
2025-01-11  events-2025-01-11.jsonl               865
2025-01-10  events-2025-01-10.jsonl              3489
2025-01-09  events-2025-01-09.jsonl              5216
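A quick way to flag the gap in this table is to compare each day's count with the busiest day. The sketch below assumes the daily files are JSON Lines (one event per line, as the `.jsonl` extension suggests); the helper names and the 10% threshold are hypothetical choices, not part of the archive tooling.

```python
import json
from typing import Dict, Iterable, List


def count_events(lines: Iterable[str]) -> int:
    """Count events in a JSON Lines file: one JSON object per non-blank line."""
    count = 0
    for line in lines:
        if line.strip():
            json.loads(line)  # raises ValueError if a line is not valid JSON
            count += 1
    return count


def suspicious_days(counts: Dict[str, int], fraction: float = 0.1) -> List[str]:
    """Return dates whose count is below `fraction` of the busiest day (hypothetical heuristic)."""
    peak = max(counts.values(), default=0)
    return sorted(day for day, n in counts.items() if n < fraction * peak)


# Daily counts copied from the archive table.
counts = {
    "2025-01-09": 5216, "2025-01-10": 3489, "2025-01-11": 865, "2025-01-12": 799,
    "2025-01-13": 1310, "2025-01-14": 0, "2025-01-15": 28, "2025-01-16": 0,
}
print(suspicious_days(counts))  # ['2025-01-14', '2025-01-15', '2025-01-16']
```

A relative threshold avoids hard-coding a "normal" launch volume, which already varies between weekdays and weekends in the table.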

On 2025-01-10, the OVH server went down as reported in #3160.

On 2025-01-13, #3165 changed the routing of traffic from mybinder.org.

On 2025-01-14, archive.analytics.mybinder.org recorded zero events, but Grafana shows activity on GESIS that day.

[Screenshot: Grafana dashboard "Overview - notebooks.gesis.org", taken 2025-01-16 at 11:06:37]

On 2025-01-15, something happened that allowed a few events (28) to be recorded.

cc @arnim

@minrk
Member

minrk commented Jan 16, 2025

I don't see an event published from GESIS since 2025-01-15T22:04:14.729182Z. Can you check the logs from the GESIS binderhub pod? This should be a direct communication between the BinderHub instance and Google's logging platform, initiated here.

@rgaiacs
Collaborator Author

rgaiacs commented Jan 16, 2025

The log of the binder pod includes:

[E 250116 11:09:13 background_thread:118] Failed to submit 1 logs.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
        return callable_(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
        return _end_unary_response_blocking(state, call, False, None)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
        raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    	status = StatusCode.UNAVAILABLE
    	details = "failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B2001:4860:4802:34::174%5D:443: connect: Network is unreachable (101)"
    	debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B2001:4860:4802:34::174%5D:443: connect: Network is unreachable (101)", grpc_status:14, created_time:"2025-01-16T11:09:13.217412297+00:00"}"
    >
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
        result = target()
                 ^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
        raise exceptions.from_grpc_error(exc) from exc
    google.api_core.exceptions.ServiceUnavailable: 503 failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B2001:4860:4802:34::174%5D:443: connect: Network is unreachable (101)

I will follow up with @arnim.
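The failing target in the traceback is percent-encoded (`%5B` and `%5D` are `[` and `]`), which makes it easy to miss that the client is dialing an IPv6 address. A minimal sketch, assuming the `details` format shown in the log above, that extracts the address, port, and errno; the function name is hypothetical. On Linux, errno 101 is `ENETUNREACH` ("Network is unreachable"), meaning the host has no route to the destination network.

```python
import ipaddress
import re
from urllib.parse import unquote

# Assumed format: "<ipv4|ipv6>:<percent-encoded host>:<port>: connect: <reason> (<errno>)"
_TARGET = re.compile(r"(?:ipv4|ipv6):(\S+?):(\d+): connect: .*\((\d+)\)")


def parse_failed_target(details: str):
    """Return (address, port, errno) from a gRPC connect failure, or None if absent."""
    match = _TARGET.search(details)
    if match is None:
        return None
    host, port, err = match.groups()
    # "%5B2001:...%5D" -> "[2001:...]" -> "2001:..."
    address = ipaddress.ip_address(unquote(host).strip("[]"))
    return address, int(port), int(err)


details = (
    "failed to connect to all addresses; last error: UNKNOWN: "
    "ipv6:%5B2001:4860:4802:34::174%5D:443: connect: Network is unreachable (101)"
)
address, port, err = parse_failed_target(details)
print(address.version, address, port, err)  # 6 2001:4860:4802:34::174 443 101
```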

@minrk
Member

minrk commented Jan 16, 2025

OK, so it seems like a networking problem.

@rgaiacs
Collaborator Author

rgaiacs commented Jan 17, 2025

Thanks @minrk. @arnim and @rgaiacs are in contact with GESIS's internet provider.

@rgaiacs
Collaborator Author

rgaiacs commented Jan 20, 2025

@arnim and @rgaiacs were informed that GESIS does not use IPv6. This raises the question of why the server is trying to use IPv6. From some reading, the server may be getting the IPv6 address from the DNS server. If so, @arnim and @rgaiacs need to ask GESIS IT to refresh the DNS resolution in their network. However, @rgaiacs cannot reproduce the error in a minimal working example:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib3
>>> http = urllib3.PoolManager()
>>> url = 'https://googleapis.com/'
>>> response = http.request('GET', url)
>>> response.status
404

@minrk @manics, do you know which Google Cloud domain BinderHub uses? https://github.com/googleapis/python-api-core/blob/a5604a55070c6d92618d078191bf99f4c168d5f6/google/api_core/universe.py#L47C10-L47C38 mentions the environment variable GOOGLE_CLOUD_UNIVERSE_DOMAIN.
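One way to see which address families the resolver hands out is to call `socket.getaddrinfo` from inside the BinderHub pod rather than from a workstation. A minimal sketch; the hostname `logging.googleapis.com` is an assumption (the Cloud Logging endpoint under the default `googleapis.com` universe domain), and the `resolver` parameter exists only so the function can be tested without DNS.

```python
import socket
from typing import Callable, Dict, List

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def addresses_by_family(
    host: str,
    port: int = 443,
    resolver: Callable = socket.getaddrinfo,
) -> Dict[str, List[str]]:
    """Group the addresses returned for host:port by address family."""
    grouped: Dict[str, List[str]] = {}
    for family, _type, _proto, _canonname, sockaddr in resolver(
        host, port, type=socket.SOCK_STREAM
    ):
        name = FAMILY_NAMES.get(family, str(family))
        addresses = grouped.setdefault(name, [])
        if sockaddr[0] not in addresses:  # getaddrinfo can repeat an address
            addresses.append(sockaddr[0])
    return grouped
```

Usage, from a Python shell in the pod: `addresses_by_family("logging.googleapis.com")`. If the result contains IPv6 addresses but the pod has no IPv6 route, connecting to them fails with errno 101, as in the log.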

@manics
Copy link
Member

manics commented Jan 20, 2025

Can you check whether your K8s cluster is deployed in dual-stack mode?
https://kubernetes.io/docs/concepts/services-networking/dual-stack/
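In a dual-stack cluster that uses node-based CIDR allocation, each node typically gets one IPv4 and one IPv6 pod CIDR, listed in the node's `spec.podCIDRs` field (for example via `kubectl get nodes -o jsonpath='{.items[*].spec.podCIDRs}'`). A small sketch that classifies such a list; the function name and the sample CIDRs are hypothetical.

```python
import ipaddress
from typing import Iterable


def stack_mode(pod_cidrs: Iterable[str]) -> str:
    """Classify a node's podCIDRs as IPv4-only, IPv6-only or dual-stack."""
    versions = {ipaddress.ip_network(cidr, strict=False).version for cidr in pod_cidrs}
    if versions == {4, 6}:
        return "dual-stack"
    if versions == {4}:
        return "IPv4 single-stack"
    if versions == {6}:
        return "IPv6 single-stack"
    return "unknown"  # empty list: CIDRs may be managed by the CNI instead


print(stack_mode(["10.244.0.0/24", "fd00:10:244::/64"]))  # dual-stack
```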

@rgaiacs
Collaborator Author

rgaiacs commented Jan 20, 2025

Thanks @manics! I upgraded Kubernetes from 1.28 to 1.32 last week, and this is probably the cause. I will disable IPv6 tomorrow.
