fix(trino): add delay time to avoid Trino issue #735

Open
wants to merge 2 commits into
base: main

@grieve54706 grieve54706 commented Nov 14, 2024

I found that queries against the Trino container often fail with a "nodes is empty" error.

I tried executing SELECT * FROM system.runtime.nodes and SHOW CATALOGS LIKE 'tpch' before querying, to ensure that the node was active and the 'tpch' catalog was ready, but the problem still occurred. This may be related to a Trino node discovery problem; a similar issue was reported previously (see trinodb/trino#13388).

The workaround I ended up with is to sleep for a few seconds.
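
For reference, the two pre-checks described above could be written roughly as follows. This is only a sketch: `trino_looks_ready` is a hypothetical helper name, and `conn` is assumed to be any DB-API style connection (for example, one returned by `trino.dbapi.connect`). As noted above, passing both checks did not prevent the "nodes is empty" failure.

```python
def trino_looks_ready(conn) -> bool:
    """Return True if a node is listed and the tpch catalog is visible.

    Hypothetical helper; `conn` is assumed to be a DB-API style connection.
    """
    cur = conn.cursor()
    # Check 1: the coordinator reports at least one node.
    cur.execute("SELECT * FROM system.runtime.nodes")
    if not cur.fetchall():
        return False
    # Check 2: the tpch catalog is registered.
    cur.execute("SHOW CATALOGS LIKE 'tpch'")
    return bool(cur.fetchall())
```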

codecov bot commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@82a2e7b). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #735   +/-   ##
=======================================
  Coverage        ?   85.58%           
=======================================
  Files           ?       12           
  Lines           ?      666           
  Branches        ?      104           
=======================================
  Hits            ?      570           
  Misses          ?       74           
  Partials        ?       22           


@alexanderankin
Member

What exception does it throw? Can't that exception be added to wait_container_is_ready, or to one of the other decorators we have, so that it gets caught and the waiter method is rerun? This module seems to be abusing the decorators a bit here...
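
The idea of a decorator that reruns a waiter when a known transient exception is raised can be sketched in a few lines. This is an illustration only, not the library's implementation: `retry_on`, `max_tries`, and `sleep_time` are hypothetical names chosen for this example.

```python
import functools
import time


def retry_on(*transient_exceptions, max_tries=5, sleep_time=0.0):
    """Rerun the wrapped function while it raises a listed exception.

    Hypothetical stand-in for a waiter decorator; not the library's code.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except transient_exceptions as exc:
                    # Expected while the service warms up: wait, then retry.
                    last_error = exc
                    time.sleep(sleep_time)
            raise TimeoutError(f"gave up after {max_tries} tries") from last_error
        return wrapper
    return decorator
```

Any exception type that is not listed propagates immediately, so only failures that are known to be temporary trigger a rerun.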

@grieve54706
Contributor Author

The exception is raised when we use the Trino client to connect to the container and run a query. We can't catch it.

        conn = connect(
            host=db.get_container_host_ip(),
            port=db.get_exposed_port(db.port),
            user="test",
        )
        cur = conn.cursor()
>       cur.execute("CREATE TABLE memory.default.orders AS SELECT * from tpch.tiny.orders")

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../.venv/lib/python3.11/site-packages/trino/dbapi.py:589: in execute
    self._iterator = iter(self._query.execute())
../.venv/lib/python3.11/site-packages/trino/client.py:829: in execute
    self._result.rows += self.fetch()
../.venv/lib/python3.11/site-packages/trino/client.py:849: in fetch
    status = self._request.process(response)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <trino.client.TrinoRequest object at 0x115a2c210>
http_response = <Response [200]>

    def process(self, http_response) -> TrinoStatus:
        if not http_response.ok:
            self.raise_response_error(http_response)
    
        http_response.encoding = "utf-8"
        response = http_response.json()
        if "error" in response:
>           raise self._process_error(response["error"], response.get("id"))
E           trino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="nodes is empty", query_id=20241118_033536_00001_tf8kg)

../.venv/lib/python3.11/site-packages/trino/client.py:621: TrinoQueryError

@grieve54706
Contributor Author

grieve54706 commented Nov 18, 2024

Here are the Trino logs from the container:

2024-11-18 11:46:00 2024-11-18T03:46:00.216Z    INFO    main    io.trino.server.Server  Server startup completed in 7.66s
2024-11-18 11:46:00 2024-11-18T03:46:00.216Z    INFO    main    io.trino.server.Server  ======== SERVER STARTED ========
2024-11-18 11:46:00 2024-11-18T03:46:00.844Z    INFO    dispatcher-query-3      io.trino.event.QueryMonitor     TIMELINE: Query 20241118_034600_00000_g6qa4 :: FINISHED :: elapsed 379ms :: planning 119ms :: waiting 26ms :: scheduling 216ms :: running 32ms :: finishing 12ms :: begin 2024-11-18T03:46:00.425Z :: end 2024-11-18T03:46:00.804Z
2024-11-18 11:46:01 2024-11-18T03:46:01.070Z    INFO    dispatcher-query-5      io.trino.event.QueryMonitor     TIMELINE: Query 20241118_034600_00001_g6qa4 :: FAILED (GENERIC_INTERNAL_ERROR) :: elapsed 197ms :: planning 197ms :: waiting 0ms :: scheduling 0ms :: running 0ms :: finishing 0ms :: begin 2024-11-18T03:46:00.871Z :: end 2024-11-18T03:46:01.068Z

You can see the SERVER STARTED log line; after it, the first query FINISHED but the second one FAILED.

However, the first query is just SELECT 1, which does not read from the tpch catalog. Maybe I can query tpch.tiny.nation instead and wait until that succeeds. WDYT?

deadline = time.time() + c.max_tries
while time.time() < deadline:
    try:
        cur = conn.cursor()
        cur.execute("SELECT * FROM tpch.tiny.nation LIMIT 1")
        cur.fetchall()
        return
    except Exception:
        time.sleep(c.sleep_time)

raise TimeoutError(f"Trino did not start within {c.max_tries:.3f} seconds")
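
A self-contained variant of the loop above might look like the sketch below. It rests on a few assumptions: `wait_for_tpch` is a hypothetical name, `conn` is any DB-API style connection, and the timeout is passed explicitly in seconds (the defaults are placeholders) rather than reusing `c.max_tries`, whose name suggests a try count rather than a duration.

```python
import time


def wait_for_tpch(conn, timeout=120.0, sleep_time=1.0):
    """Poll a tpch table until a query succeeds or `timeout` seconds pass.

    Hypothetical helper; the default values are placeholders.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            cur = conn.cursor()
            # Reads from tpch, so it only succeeds once the catalog is usable.
            cur.execute("SELECT * FROM tpch.tiny.nation LIMIT 1")
            cur.fetchall()
            return
        except Exception as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"Trino did not start within {timeout:.3f} seconds"
                ) from exc
            time.sleep(sleep_time)
```

Chaining the last failure onto the TimeoutError keeps the underlying error, such as "nodes is empty", visible in the traceback.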

@grieve54706
Contributor Author

Hi @alexanderankin, could you recheck this PR?
