Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(agents-api, memory-store): fix temporal workflows, queries, and migrations #994

Merged
merged 11 commits into from
Dec 26, 2024

Conversation

Ahmad-mtos
Copy link
Contributor

@Ahmad-mtos Ahmad-mtos commented Dec 26, 2024

PR Type

Bug fix, Enhancement


Description

Major fixes and enhancements to temporal workflows and task execution:

  • Fixed critical bugs in temporal workflow handling:

    • Corrected arguments processing in task execution workflow
    • Fixed continuous view aggregation in memory store
    • Resolved issues with task queries and transitions
  • Enhanced error handling and monitoring:

    • Added detailed error output in task execution
    • Improved transition handling with proper error states
    • Added health checks for memory store
  • Improved query performance and data consistency:

    • Enhanced task query with proper workflow/tools aggregation
    • Fixed execution transitions validation
    • Optimized continuous view aggregation
  • Added configuration improvements:

    • Added transition rate limiting configuration
    • Enhanced docker compose dependencies
    • Added proper health checks

Changes walkthrough 📝

Relevant files
Enhancement
transition_step.py
Simplify transition step creation parameters                         

agents-api/agents_api/activities/task_steps/transition_step.py

  • Removed unnecessary parameters from create_execution_transition
    function call
  • +0/-2     
    get_task.py
    Enhance task query with proper aggregation                             

    agents-api/agents_api/queries/tasks/get_task.py

  • Improved SQL query to properly aggregate workflows and tools
  • Added DISTINCT to avoid duplicates
  • Added tools to query results
  • +21/-11 
    create_task_execution.py
    Improve task execution error handling                                       

    agents-api/agents_api/routers/tasks/create_task_execution.py

  • Added task specification to execution input
  • Enhanced error handling with detailed error output
  • Added transition target information
  • +16/-0   
    Bug fix
    temporal.py
    Fix temporal workflow arguments handling                                 

    agents-api/agents_api/clients/temporal.py

  • Fixed incorrect arguments handling in run_task_execution_workflow
  • Added FIXME comment about wrong logic
  • Modified previous_inputs initialization
  • +4/-5     
    000013_executions_continuous_view.up.sql
    Fix executions continuous view aggregation                             

    memory-store/migrations/000013_executions_continuous_view.up.sql

  • Fixed continuous view aggregation
  • Added output to coalesce statement
  • Removed redundant transition_id from GROUP BY
  • +12/-12 

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    gitguardian bot commented Dec 26, 2024

    ⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

    Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

    🔎 Detected hardcoded secret in your pull request
    GitGuardian id GitGuardian status Secret Commit Filename
    14144715 Triggered Generic Password 15f5a89 memory-store/docker-compose.yml View secret
    🛠 Guidelines to remediate hardcoded secrets
    1. Understand the implications of revoking this secret by investigating where it is used in your code.
    2. Replace and store your secret safely. Learn here the best practices.
    3. Revoke and rotate this secret.
    4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

    To avoid such incidents in the future consider


    🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

    Copy link

    qodo-merge-pro-for-open-source bot commented Dec 26, 2024

    CI Failure Feedback 🧐

    (Checks updated until commit b5a8313)

    Action: Typecheck

    Failed stage: Typecheck [❌]

    Failed test name: pytype

    Failure summary:

    The pytype static type checker failed with multiple errors:

  • Type mismatch in search_docs_hybrid.py: The type annotation for embedding doesn't match the
    assignment type
  • Multiple errors in execute_integration.py:
    - Trying to access .id attribute on a potentially None
    value
    - get_tool_args_from_metadata object is not callable
  • Import error in check_health.py: Unable to find module router
  • Stray type comments in test_workflow_routes.py

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    1191:  [14/301] check agents_api.autogen.Agents
    1192:  [15/301] check agents_api.autogen.Tasks
    1193:  [16/301] check agents_api.dependencies.exceptions
    1194:  [17/301] check agents_api.common.utils.datetime
    1195:  [18/301] check agents_api.common.protocol.developers
    1196:  [19/301] check agents_api.common.protocol.agents
    1197:  [20/301] check agents_api.metrics.counters
    1198:  [21/301] check agents_api.queries.utils
    1199:  ERROR:pytype.matcher Invalid type: <class 'pytype.abstract.function.ParamSpecMatch'>
    1200:  [22/301] check agents_api.autogen.openapi_model
    1201:  [23/301] check agents_api.common.exceptions.__init__
    1202:  [24/301] check agents_api.routers.docs.router
    1203:  [25/301] check agents_api.exceptions
    1204:  [26/301] check agents_api.queries.developers.get_developer
    1205:  [27/301] check agents_api.queries.docs.search_docs_hybrid
    1206:  FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi 
    1207:  /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.queries.docs.search_docs_hybrid.imports --module-name agents_api.queries.docs.search_docs_hybrid --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py
    1208:  /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py:50:1: error: in <module>: Type annotation for embedding does not match type of assignment [annotation-type-mismatch]
    ...
    
    1305:  metadata_filter,
    1306:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1307:  search_language,
    1308:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1309:  ],
    1310:  ~~~~~~~~~~
    1311:  )
    1312:  ~~~~~
    1313:  For more details, see https://google.github.io/pytype/errors.html#annotation-type-mismatch
    ...
    
    1400:  [114/301] check agents_api.queries.executions.__init__
    1401:  [115/301] check agents_api.queries.developers.__init__
    1402:  [116/301] check agents_api.queries.entries.__init__
    1403:  [117/301] check agents_api.queries.agents.__init__
    1404:  [118/301] check agents_api.common.retry_policies
    1405:  [119/301] check agents_api.clients.integrations
    1406:  [120/301] check agents_api.activities.task_steps.get_value_step
    1407:  [121/301] check agents_api.activities.execute_integration
    1408:  FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/activities/execute_integration.pyi 
    1409:  /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.activities.execute_integration.imports --module-name agents_api.activities.execute_integration --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/activities/execute_integration.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/activities/execute_integration.py
    1410:  /home/runner/work/julep/julep/agents-api/agents_api/activities/execute_integration.py:27:15: error: in execute_integration: No attribute 'id' on None [attribute-error]
    1411:  In Optional[agents_api.autogen.openapi_model.TaskSpecDef]
    1412:  task_id = context.execution_input.task.id
    1413:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1414:  /home/runner/work/julep/julep/agents-api/agents_api/activities/execute_integration.py:29:24: error: in execute_integration: 'agents_api.queries.tools.get_tool_args_from_metadata' object is not callable [not-callable]
    1415:  merged_tool_args = get_tool_args_from_metadata(
    1416:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1417:  developer_id=developer_id, agent_id=agent_id, task_id=task_id, arg_type="args"
    1418:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1419:  )
    1420:  ~~~~~
    1421:  /home/runner/work/julep/julep/agents-api/agents_api/activities/execute_integration.py:33:25: error: in execute_integration: 'agents_api.queries.tools.get_tool_args_from_metadata' object is not callable [not-callable]
    1422:  merged_tool_setup = get_tool_args_from_metadata(
    1423:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1424:  developer_id=developer_id, agent_id=agent_id, task_id=task_id, arg_type="setup"
    1425:  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1426:  )
    1427:  ~~~~~
    1428:  For more details, see https://google.github.io/pytype/errors.html
    ...
    
    1494:  [187/301] check agents_api.queries.executions.get_paused_execution_token
    1495:  [188/301] check agents_api.rec_sum.data
    1496:  [189/301] check agents_api.rec_sum.generate
    1497:  [190/301] check agents_api.routers.sessions.exceptions
    1498:  [191/301] check tests.test_chat_routes
    1499:  [192/301] check agents_api.clients.worker.types
    1500:  [193/301] check agents_api.workflows.__init__
    1501:  [194/301] check tests.test_workflow_routes
    1502:  /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:65:1: error: : Stray type comment: object [ignored-type-comment]
    1503:  #   type: object~~~~~~~~~~~~~~~
    1504:  #   type: object
    1505:  /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:114:1: error: : Stray type comment: object [ignored-type-comment]
    1506:  #   type: object~~~~~~~~~~~~~~~
    1507:  #   type: object
    1508:  For more details, see https://google.github.io/pytype/errors.html#ignored-type-comment
    ...
    
    1521:  [207/301] check agents_api.rec_sum.__init__
    1522:  [208/301] check agents_api.worker.__init__
    1523:  [209/301] check agents_api.model_registry
    1524:  [210/301] check agents_api.common.exceptions.sessions
    1525:  [211/301] check agents_api.__init__
    1526:  [212/301] check tests.test_sessions
    1527:  [213/301] check agents_api.common.exceptions.agents
    1528:  [214/301] check agents_api.routers.healthz.check_health
    1529:  FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/routers/healthz/check_health.pyi 
    1530:  /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.routers.healthz.check_health.imports --module-name agents_api.routers.healthz.check_health --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/routers/healthz/check_health.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/routers/healthz/check_health.py
    1531:  /home/runner/work/julep/julep/agents-api/agents_api/routers/healthz/check_health.py:5:1: error: in <module>: Can't find module 'router'. [import-error]
    1532:  from .router import router
    1533:  ~~~~~~~~~~~~~~~~~~~~~~~~~~
    1534:  For more details, see https://google.github.io/pytype/errors.html#import-error
    ...
    
    1544:  [224/301] check agents_api.clients.worker.__init__
    1545:  [225/301] check agents_api.rec_sum.trim
    1546:  [226/301] check agents_api.routers.agents.delete_agent_tool
    1547:  [227/301] check agents_api.clients.worker.worker
    1548:  [228/301] check agents_api.rec_sum.entities
    1549:  [229/301] check agents_api.routers.agents.update_agent_tool
    1550:  [230/301] check agents_api.common.utils.debug
    1551:  [231/301] check agents_api.routers.agents.patch_agent_tool
    1552:  ninja: build stopped: cannot make progress due to previous errors.
    1553:  Computing dependencies
    1554:  Generated API key since not set in the environment: 23756738964773662262192138776881
    1555:  Sentry DSN not found. Sentry will not be enabled.
    1556:  Analyzing 291 sources with 0 local dependencies
    1557:  Leaving directory '.pytype'
    1558:  ##[error]Process completed with exit code 1.
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Data Handling

    The FIXME comment indicates wrong logic in arguments handling. The code modifies execution_input.arguments directly which could have side effects. Consider creating a copy before modification.

    # FIXME: This is wrong logic
    old_args = execution_input.arguments
    execution_input.arguments = await offload_if_large(old_args)
    
    previous_inputs: list[dict] = previous_inputs or [execution_input.arguments]
    Parameter Change

    The workflow run method signature changed to require start and previous_inputs parameters that were previously optional. This could break existing callers that don't provide these parameters.

    async def run(
        self,
        execution_input: ExecutionInput,
        start: TransitionTarget,
        previous_inputs: list,
    ) -> Any:
    Data Migration

    The continuous view aggregation changes could affect existing data. Need to verify data consistency after migration and ensure proper handling of existing records.

        time_bucket ('1 day', created_at) AS bucket,
        execution_id,
        last(transition_id, created_at) AS transition_id,
        count(*) AS total_transitions,
        state_agg(created_at, to_text(type)) AS state,
        max(created_at) AS created_at,
        last(type, created_at) AS type,
        last(step_definition, created_at) AS step_definition,
        last(step_label, created_at) AS step_label,
        last(current_step, created_at) AS current_step,
        last(next_step, created_at) AS next_step,
        last(output, created_at) AS output,
        last(task_token, created_at) AS task_token,
        last(metadata, created_at) AS metadata
    FROM

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Maintain backward compatibility by providing default values for optional parameters

    The workflow run method has required parameters start and previous_inputs but they
    were previously optional with default values. This breaking change could cause
    failures in existing code that doesn't provide these parameters.

    agents-api/agents_api/workflows/task_execution/init.py [140-145]

     async def run(
         self,
         execution_input: ExecutionInput,
    -    start: TransitionTarget,
    -    previous_inputs: list,
    +    start: TransitionTarget = TransitionTarget(workflow="main", step=0),
    +    previous_inputs: list | None = None,
     ) -> Any:
    +    previous_inputs = previous_inputs or [execution_input.arguments]
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: This is a critical suggestion as removing default parameter values is a breaking change that could cause runtime failures in existing code that relies on these defaults.

    8
    Add validation or safe access methods when making required fields optional to prevent null pointer exceptions

    The task field in ExecutionInput was changed from required to optional, but code
    accessing this field may not handle the None case, potentially causing
    NullPointerExceptions.

    agents-api/agents_api/common/protocol/tasks.py [140-143]

     class ExecutionInput(BaseModel):
         developer_id: UUID
         execution: Execution | None = None
         task: TaskSpecDef | None = None
    +    
    +    def get_task(self) -> TaskSpecDef:
    +        if self.task is None:
    +            raise ValueError("Task is required but was not provided")
    +        return self.task
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Important suggestion that addresses potential runtime errors by adding proper validation for a newly optional field that was previously required.

    7
    General
    Unclear FIXME comments should be either removed or properly documented with explanation and fix

    The FIXME comment indicates wrong logic but doesn't explain why. Either remove the
    comment if the logic is correct, or fix the underlying issue and document the fix.
    Unclear comments can mislead future developers.

    agents-api/agents_api/clients/temporal.py [99-101]

    -# FIXME: This is wrong logic
    +# Handle large arguments by offloading to blob storage if needed
     old_args = execution_input.arguments
     execution_input.arguments = await offload_if_large(old_args)
    • Apply this suggestion
    Suggestion importance[1-10]: 4

    Why: The suggestion correctly identifies an unclear FIXME comment that could confuse developers, but the impact is moderate since it's mainly about code documentation clarity rather than functionality.

    4

    Copy link
    Contributor

    @ellipsis-dev ellipsis-dev bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ❌ Changes requested. Reviewed everything up to b5a8313 in 2 minutes and 28 seconds

    More details
    • Looked at 846 lines of code in 20 files
    • Skipped 0 files when reviewing.
    • Skipped posting 3 drafted comments based on config settings.
    1. agents-api/agents_api/common/protocol/tasks.py:245
    • Draft comment:
      The include_remote parameter is marked as deprecated. Consider removing or replacing it to avoid confusion and potential maintenance issues.
    • Reason this comment was not posted:
      Confidence changes required: 50%
      In the prepare_execution_input function, the include_remote parameter is marked as deprecated with a FIXME comment. This indicates that the parameter should be removed or replaced in the future. However, the current implementation still uses it, which might lead to confusion or maintenance issues.
    2. agents-api/agents_api/queries/tasks/create_or_update_task.py:94
    • Draft comment:
      The ON CONFLICT clause for the tasks table updates the version by incrementing tasks.version + 1, which seems incorrect. The version should only be incremented when a new version is created, not on conflict. Consider revisiting this logic.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable:
      The version handling is complex here. The CTE calculates next_version as MAX(version) + 1. When inserting, it uses this next_version. If there's a conflict, it increments the version again. This double increment could be intentional to skip a version number on conflicts, ensuring unique versions even in race conditions. Without deeper knowledge of the system's versioning requirements, I can't be certain this is wrong.
      I might be missing important business logic around version numbering. Perhaps there's a requirement that conflict resolution should maintain sequential version numbers without gaps.
      While the versioning logic seems unusual, without clear evidence that it's incorrect and understanding of the system's versioning requirements, we shouldn't assume it's a bug.
      The comment makes an assumption about versioning requirements without strong evidence. The current implementation could be intentional for handling race conditions.
    3. agents-api/agents_api/queries/agents/create_or_update_agent.py:56
    • Draft comment:
      Updating canonical_name in the ON CONFLICT clause might lead to inconsistencies, as it is typically a unique identifier that should not change once set. Consider revisiting this logic.
    • Reason this comment was not posted:
      Decided after close inspection that this draft comment was likely wrong and/or not actionable:
      The code actually handles canonical_name carefully - it first tries to preserve the existing one through the CTE, and only generates a new one if needed. The ON CONFLICT clause isn't blindly overwriting it, it's using the same value that was just computed in the INSERT portion. This makes the comment's concern invalid - there won't be inconsistencies because the canonical_name logic is consistent between INSERT and UPDATE paths.
      I could be missing some business logic requirements around canonical_name immutability. There might be other parts of the system that depend on canonical_name never changing.
      Even if there were such requirements, the code already preserves existing canonical_names through the CTE, so this implementation is safe. The comment is raising a theoretical concern that the code already handles correctly.
      The comment should be deleted because it raises a concern that is already properly handled by the code's careful treatment of canonical_name values.

    Workflow ID: wflow_keWXJzUuAqLdRTxB


    Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

    agents-api/agents_api/clients/temporal.py Show resolved Hide resolved
    @whiterabbit1983 whiterabbit1983 merged commit 9b9bd73 into f/switch-to-pg Dec 26, 2024
    6 of 11 checks passed
    @whiterabbit1983 whiterabbit1983 deleted the x/fix-temporal-workflows branch December 26, 2024 19:41
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    4 participants