Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(agents-api, memory-store): Add migration for delete cascade + search docs fixes #1023

Merged
merged 2 commits into from
Jan 6, 2025

Conversation

Ahmad-mtos
Copy link
Contributor

@Ahmad-mtos Ahmad-mtos commented Jan 6, 2025

PR Type

Bug fix, Enhancement


Description

  • Added cascading delete behavior to fk_executions_task foreign key in memory-store.

  • Updated confidence default value to 0 in multiple search request models.

  • Fixed transformation logic for content and embeddings in get_doc and list_docs.

  • Adjusted confidence calculation in hybrid and embedding-based search queries.


Changes walkthrough 📝

Relevant files
Enhancement
6 files
Docs.py
Updated confidence default value in search request models
+2/-2     
Docs.py
Updated confidence default value in search request models
+2/-2     
000020_executions_task_cascade.down.sql
Added migration to remove cascading delete behavior           
+13/-0   
000020_executions_task_cascade.up.sql
Added migration to enable cascading delete behavior           
+14/-0   
models.tsp
Updated confidence default value in TypeSpec models           
+2/-2     
openapi-1.0.0.yaml
Updated confidence default value in OpenAPI spec                 
+2/-2     
Bug fix
4 files
get_doc.py
Fixed content and embeddings transformation logic               
+2/-2     
list_docs.py
Fixed content and embeddings transformation logic               
+2/-2     
search_docs_by_embedding.py
Adjusted confidence calculation in embedding-based search
+1/-1     
search_docs_hybrid.py
Adjusted confidence calculation in hybrid search                 
+1/-1     

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information


Important

Adds migration for cascading deletes in executions and updates document search confidence handling.

  • Database Migration:
    • Adds 000020_executions_task_cascade.up.sql to implement cascading deletes for executions table foreign key fk_executions_task.
    • Adds 000020_executions_task_cascade.down.sql to revert cascading delete changes.
  • Document Search:
    • Sets default confidence to 0 in HybridDocSearchRequest and VectorDocSearchRequest in Docs.py and models.tsp.
    • Adjusts confidence calculation in search_docs_by_embedding.py and search_docs_hybrid.py to use 1.0 - confidence.
  • Transform Functions:
    • Simplifies transform_get_doc() and transform_list_docs() in get_doc.py and list_docs.py by removing single-item extraction logic.

This description was created by Ellipsis for e1b406c. It will automatically update as commits are pushed.

Copy link
Contributor

@standard-input standard-input bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues flagged.
Standard Input can make mistakes. Check important info.

Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 Security concerns

SQL Injection:
The SQL migration files contain raw SQL queries with table and column references. While the migration files themselves don't show direct user input handling, ensure that any dynamic values passed to these queries in the application code are properly parameterized to prevent SQL injection attacks.

⚡ Recommended focus areas for review

Data Transformation

The removal of array index checking in content and embeddings transformation could lead to incorrect handling of single-element arrays. Verify that this change doesn't break existing functionality.

content = d["content"]

embeddings = d["embeddings"]
Confidence Calculation

The confidence calculation has been inverted (1.0 - confidence). Ensure this change aligns with the intended search behavior and scoring mechanism.

1.0 - confidence,

Copy link
Contributor

CI Failure Feedback 🧐

Action: Typecheck

Failed stage: Generate openapi code [❌]

Failed test name: ""

Failure summary:

The action failed because the tsp command was not found in the system. Specifically:

  • The script attempted to run tsp compile . in the typespec/ directory
  • The error "command not found" indicates that the TypeSpec compiler (tsp) is not installed or not in
    the system PATH
  • The process exited with code 127, which typically indicates a command not found error

  • Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    187:  Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.8/x64
    188:  Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.12.8/x64
    189:  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.12.8/x64/lib
    190:  ##[endgroup]
    191:  + set -e
    192:  + cd typespec/
    193:  + tsp compile .
    194:  scripts/generate_openapi_code.sh: line 10: tsp: command not found
    195:  ##[error]Process completed with exit code 127.
    ...
    
    197:  [command]/usr/bin/git version
    198:  git version 2.47.1
    199:  Temporarily overriding HOME='/home/runner/work/_temp/c3d3b6c0-43c6-4a73-a815-f3566874a9d7' before making global git config changes
    200:  Adding repository directory to the temporary git global config as a safe directory
    201:  [command]/usr/bin/git config --global --add safe.directory /home/runner/work/julep/julep
    202:  [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
    203:  [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
    204:  fatal: No url found for submodule path 'drafts/cozo' in .gitmodules
    205:  ##[warning]The process '/usr/bin/git' failed with exit code 128
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

    Copy link
    Contributor

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add defensive programming by safely accessing dictionary values to prevent runtime errors

    Add null check for 'content' and 'embeddings' before accessing them to prevent
    potential KeyError exceptions.

    agents-api/agents_api/queries/docs/get_doc.py [47-49]

    -content = d["content"]
    -embeddings = d["embeddings"]
    +content = d.get("content")
    +embeddings = d.get("embeddings")
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Using dict.get() instead of direct access is a good defensive programming practice that prevents KeyError exceptions if the keys are missing, making the code more robust and reliable.

    7

    Copy link
    Contributor

    @ellipsis-dev ellipsis-dev bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    👍 Looks good to me! Reviewed everything up to e1b406c in 1 minute and 48 seconds

    More details
    • Looked at 194 lines of code in 10 files
    • Skipped 0 files when reviewing.
    • Skipped posting 4 drafted comments based on config settings.
    1. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:77
    • Draft comment:
      Consider updating the function's documentation to reflect the change in how confidence is used (i.e., 1.0 - confidence). This will help maintain clarity for future developers.
    • Reason this comment was not posted:
      Confidence changes required: 50%
      The change from confidence to 1.0 - confidence in the search_docs_by_embedding function is a logical change that should be reflected in the function's documentation or comments to avoid confusion for future developers.
    2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:94
    • Draft comment:
      Consider updating the function's documentation to reflect the change in how confidence is used (i.e., 1.0 - confidence). This will help maintain clarity for future developers.
    • Reason this comment was not posted:
      Confidence changes required: 50%
      The change from confidence to 1.0 - confidence in the search_docs_hybrid function is a logical change that should be reflected in the function's documentation or comments to avoid confusion for future developers.
    3. agents-api/agents_api/queries/docs/get_doc.py:46
    • Draft comment:
      This function is identical to the existing transform_list_docs. Consider using that function instead to avoid duplication.

    • function transform_list_docs (list_docs.py)

    • Reason this comment was not posted:
      Comment was on unchanged code.

    4. agents-api/agents_api/queries/docs/list_docs.py:54
    • Draft comment:
      This function is identical to the existing transform_get_doc. Consider reusing that function instead of duplicating the code.

    • function transform_get_doc (get_doc.py)

    • Reason this comment was not posted:
      Marked as duplicate.

    Workflow ID: wflow_I4zB8AHZkoRwOyhd


    You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

    @Vedantsahai18 Vedantsahai18 merged commit 7c91669 into dev Jan 6, 2025
    16 of 20 checks passed
    @Vedantsahai18 Vedantsahai18 deleted the x/fk-delete-cascade branch January 6, 2025 16:44
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants