Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(agents-api): Add /healthz endpoint #975

Merged
merged 4 commits into from
Dec 19, 2024
Merged

chore(agents-api): Add /healthz endpoint #975

merged 4 commits into from
Dec 19, 2024

Conversation

HamadaSalhab
Copy link
Contributor

@HamadaSalhab HamadaSalhab commented Dec 19, 2024

PR Type

Enhancement


Description

  • Added new health check endpoint /healthz to monitor service health
  • Health check verifies database connectivity by attempting to list agents
  • Returns "ok" status when healthy, error status with message when unhealthy
  • Configured Traefik gateway to route health check requests
  • Endpoint is accessible without API key authentication

Changes walkthrough 📝

Relevant files
Enhancement
__init__.py
Initialize healthz router module                                                 

agents-api/agents_api/routers/healthz/init.py

  • Created new healthz module with router and check_health imports
+3/-0     
check_health.py
Implement health check endpoint with database verification

agents-api/agents_api/routers/healthz/check_health.py

  • Added /healthz endpoint that checks database connectivity
  • Returns status "ok" if database is reachable
  • Returns error status with message if database check fails
  • +18/-0   
    router.py
    Create FastAPI router for health checks                                   

    agents-api/agents_api/routers/healthz/router.py

    • Created new APIRouter instance for health check endpoints
    +3/-0     
    web.py
    Register health check router in FastAPI app                           

    agents-api/agents_api/web.py

  • Imported healthz router module
  • Added healthz router to FastAPI application
  • +2/-0     
    Configuration changes
    traefik.yml.template
    Configure Traefik gateway for health check endpoint           

    gateway/traefik.yml.template

  • Added new route configuration for /api/healthz endpoint
  • Set priority and middleware for health check endpoint
  • +11/-2   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    Authentication Bypass:
    The /healthz endpoint is accessible without API key authentication. While this is common for health check endpoints, ensure this doesn't expose sensitive information about the system state and consider implementing rate limiting to prevent potential DoS attacks through repeated health checks.

    ⚡ Recommended focus areas for review

    Error Handling
    The health check catches all exceptions generically and returns them as error messages. Consider catching specific database exceptions and returning standardized error responses.

    Security Configuration
    The healthz router is included without any authentication dependencies, while other endpoints require API key authentication. Verify this is intentional and documented.

    Resource Usage
    The health check queries the agents table with a hardcoded UUID. Consider using a lighter database check that doesn't access production tables.

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add timeout handling to prevent health check from hanging indefinitely

    Add response timeout handling for the database check to prevent the health endpoint
    from hanging indefinitely.

    agents-api/agents_api/routers/healthz/check_health.py [10-14]

     try:
    -    # Check if the database is reachable
    -    agents = list_agents_query(
    -        developer_id=UUID("00000000-0000-0000-0000-000000000000"),
    -    )
    +    # Check if the database is reachable with timeout
    +    async with asyncio.timeout(5.0):
    +        agents = await list_agents_query(
    +            developer_id=UUID("00000000-0000-0000-0000-000000000000"),
    +        )
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding a timeout to the database health check is crucial to prevent the endpoint from hanging indefinitely, which could impact system monitoring and reliability. This is a critical operational improvement.

    9
    General
    Add proper response model and status code specifications for API endpoint

    Add proper response model type hints and HTTP status codes for both success and
    error scenarios.

    agents-api/agents_api/routers/healthz/check_health.py [8-9]

    -@router.get("/healthz", tags=["healthz"])
    -async def check_health() -> dict:
    +@router.get("/healthz", tags=["healthz"], response_model=HealthResponse, status_code=200)
    +async def check_health() -> HealthResponse:
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: Adding proper response model type hints and status codes improves API documentation and type safety, making the endpoint more maintainable and easier to integrate with.

    6

    Copy link

    qodo-merge-pro-for-open-source bot commented Dec 19, 2024

    CI Failure Feedback 🧐

    (Checks updated until commit da9dd5b)

    Action: Typecheck

    Failed stage: Typecheck [❌]

    Failed test name: agents_api.common.protocol.sessions

    Failure summary:

    The action failed during type checking with pytype due to a return type error in the
    get_chat_environment method in agents_api/common/protocol/sessions.py (line 115). The return value
    structure containing user data, settings, and tools does not match the expected return type
    annotation of the method.

    Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    1227:  [48/507] check agents_api.common.utils.cozo
    1228:  [49/507] check agents_api.activities.types
    1229:  [50/507] check agents_api.common.exceptions.tools
    1230:  [51/507] check agents_api.activities.excecute_api_call
    1231:  [52/507] check agents_api.common.protocol.tasks
    1232:  [53/507] check agents_api.dependencies.exceptions
    1233:  [54/507] check agents_api.common.utils.messages
    1234:  [55/507] check agents_api.common.protocol.sessions
    1235:  FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/protocol/sessions.pyi 
    1236:  /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.common.protocol.sessions.imports --module-name agents_api.common.protocol.sessions --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/protocol/sessions.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/common/protocol/sessions.py
    1237:  /home/runner/work/julep/julep/agents-api/agents_api/common/protocol/sessions.py:115:9: error: in get_chat_environment: bad return type [bad-return-type]
    ...
    
    1252:  "users": [user.model_dump() for user in self.users],
    1253:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1254:  "settings": settings,
    1255:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1256:  "tools": [tool.model_dump() for tool in tools],
    1257:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1258:  }
    1259:  ~~~~~~~~~
    1260:  For more details, see https://google.github.io/pytype/errors.html#bad-return-type
    ...
    
    1340:  [135/507] check tests.sample_tasks.__init__
    1341:  [136/507] check migrations.migrate_1733755642_transition_indices
    1342:  [137/507] check migrations.migrate_1704892503_tools
    1343:  [138/507] check agents_api.rec_sum.summarize
    1344:  [139/507] check agents_api.__init__
    1345:  [140/507] check agents_api.common.exceptions.agents
    1346:  [141/507] check migrations.migrate_1725153437_add_output_to_executions
    1347:  [142/507] check agents_api.common.exceptions.users
    1348:  ninja: build stopped: cannot make progress due to previous errors.
    1349:  Computing dependencies
    1350:  Generated API key since not set in the environment: 22775751936790361709703647853812
    1351:  Sentry DSN not found. Sentry will not be enabled.
    1352:  Analyzing 336 sources with 0 local dependencies
    1353:  Leaving directory '.pytype'
    1354:  ##[error]Process completed with exit code 1.
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

    Copy link
    Contributor

    @ellipsis-dev ellipsis-dev bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    👍 Looks good to me! Reviewed everything up to 62fb581 in 1 minute and 27 seconds

    More details
    • Looked at 91 lines of code in 5 files
    • Skipped 0 files when reviewing.
    • Skipped posting 2 drafted comments based on config settings.
    1. agents-api/agents_api/routers/healthz/check_health.py:13
    • Draft comment:
      Consider using a configuration or environment variable for the developer_id instead of a hardcoded UUID. This improves flexibility and maintainability.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable:
      This is a health check endpoint where the UUID value doesn't matter - it's just used to test if we can query the database. Moving a dummy test value to configuration would add unnecessary complexity. The all-zeros UUID is a common convention for test/dummy values. This isn't production code that needs to be configurable.
      Maybe there's a specific reason they'd want to test with a real developer ID in certain environments?
      The purpose is just to test database connectivity - any valid UUID would work. Using a well-known dummy UUID makes the intent clearer than hiding it in configuration.
      Delete the comment. Moving a dummy test UUID to configuration would add unnecessary complexity without benefit for this health check endpoint.
    2. agents-api/agents_api/routers/healthz/check_health.py:18
    • Draft comment:
      Add a newline at the end of the file for better compatibility with various tools and systems.
    • Reason this comment was not posted:
      Confidence changes required: 20%
      The check_health function in check_health.py does not include a newline at the end of the file. This is a minor issue but it's a good practice to include a newline at the end of files for compatibility with various tools and systems.

    Workflow ID: wflow_Ps8rCWUx6uMmLnZM


    You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

    developer_id=UUID("00000000-0000-0000-0000-000000000000"),
    )
    except Exception as e:
    return {"status": "error", "message": str(e)}

    Check warning

    Code scanning / CodeQL

    Information exposure through an exception Medium

    Stack trace information flows to this location and may be exposed to an external user.
    This autofix suggestion was applied.
    Show autofix suggestion Hide autofix suggestion

    Copilot Autofix AI 3 days ago

    To fix the problem, we need to ensure that detailed error information is not exposed to the user. Instead, we should log the detailed error message on the server side and return a generic error message to the user. This can be achieved by modifying the exception handling block to log the exception and return a generic error message.

    1. Import the logging module to enable logging of exceptions.
    2. Modify the exception handling block to log the exception using the logging module.
    3. Return a generic error message to the user.
    Suggested changeset 1
    agents-api/agents_api/routers/healthz/check_health.py

    Autofix patch

    Autofix patch
    Run the following command in your local git repository to apply this patch
    cat << 'EOF' | git apply
    diff --git a/agents-api/agents_api/routers/healthz/check_health.py b/agents-api/agents_api/routers/healthz/check_health.py
    --- a/agents-api/agents_api/routers/healthz/check_health.py
    +++ b/agents-api/agents_api/routers/healthz/check_health.py
    @@ -5,3 +5,3 @@
     from .router import router
    -
    +import logging
     
    @@ -15,3 +15,4 @@
         except Exception as e:
    -        return {"status": "error", "message": str(e)}
    +        logging.error("An error occurred while checking health: %s", str(e))
    +        return {"status": "error", "message": "An internal error has occurred."}
     
    EOF
    @@ -5,3 +5,3 @@
    from .router import router

    import logging

    @@ -15,3 +15,4 @@
    except Exception as e:
    return {"status": "error", "message": str(e)}
    logging.error("An error occurred while checking health: %s", str(e))
    return {"status": "error", "message": "An internal error has occurred."}

    Copilot is powered by AI and may make mistakes. Always verify output.
    @creatorrr creatorrr committed this autofix suggestion 3 days ago.
    Positive Feedback
    Negative Feedback

    Provide additional feedback

    Please help us improve GitHub Copilot by sharing more details about this comment.

    Please select one or more of the options
    creatorrr and others added 2 commits December 19, 2024 15:17
    Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
    @creatorrr creatorrr merged commit 3c6c86e into dev Dec 19, 2024
    3 checks passed
    @creatorrr creatorrr deleted the c/healthz-endpoint branch December 19, 2024 09:48
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants