Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Data Collector API #314

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

magajh
Copy link
Contributor

@magajh magajh commented Jan 14, 2025

Description

This PR introduces a new Data Collector API. The purpose of this API is to enable Open edX instances to collect and send content-related data to a central service (e.g., the Shipyard API). This lays the foundation for the Hosting division to make data-driven decisions and improve the overall service offering.

The new API is part of the data-collector module, ensuring clear separation from existing APIs in the data directory for better modularity and future extensibility.


Rationale

  • Flexibility and Modularity: Encapsulating the new functionality in a separate module (data-collector) allows for easier maintenance and potential migration if required in the future.
  • Centralized Decision-Making: By collecting metrics like active users, course data, and usage patterns from Open edX instances, the Hosting division can prioritize resources, optimize support, and provide actionable insights to clients.
  • Adaptability: The implementation is flexible, allowing the destination endpoint (e.g., Shipyard API) and the query file to be passed dynamically, enabling integration with various systems or future requirements.

Key Features

  1. Authentication:

    • Supports authentication via:
      • A token specific to the GitHub Action workflow.
      • A JWT token for broader compatibility.
  2. API Design:

    • Endpoint: /eox-core/data-api/v1/collect-data/
    • Accepts the following payload:
      • query_file_content: Content of a YAML file containing SQL queries.
      • query_file_url: A URL pointing to a publicly accessible YAML file with queries.
      • destination_url: The URL to which the results will be sent.
    • Ensures only one of query_file_content or query_file_url is required.
  3. Async Task Execution:

    • A Celery async task (generate_report) processes the provided queries:
      • Executes the SQL queries against the edxapp database.
      • Posts the processed results to the provided destination_url.
  4. Validation:

    • Includes a serializer to validate the incoming payload, ensuring the required data is present.
  5. Logging and Error Handling:

    • Logs key events during the execution of the task for better monitoring and debugging.
    • Implements retries for the async task in case of transient failures.

How to Test

  1. Set up an Open edX instance with the eox-core plugin installed.
  2. Authenticate using a valid token.
  3. Use the following payload to test the API:
    {
        "query_file_content": "queries:\n  - name: 'Example Query'\n    query: 'SELECT COUNT(*) FROM auth_user;'",
        "destination_url": "https://example.com/api/v1/receive-data/"
    }
  4. Verify the following:
    • The async task executes the queries and posts the results to the destination URL.
    • Any errors or missing fields result in appropriate error messages.
    • Ensure the task logs provide sufficient information.

Notes

  • The new API does not persist data locally; it dynamically processes and forwards the results.
  • Future enhancements could include:
    • Better query caching or optimization.

@magajh magajh changed the title feat: add API to generate reports feat: add task to generate reports Jan 14, 2025
@magajh magajh force-pushed the maga/implement-task-to-generate-reports branch from 833d135 to fe44aaa Compare January 22, 2025 05:40
@magajh magajh force-pushed the maga/implement-task-to-generate-reports branch from d7ac416 to 12ed527 Compare January 22, 2025 10:44
@magajh magajh changed the title feat: add task to generate reports feat: add Data Collector API Jan 22, 2025
@magajh magajh marked this pull request as ready for review January 22, 2025 11:04
@magajh magajh requested a review from a team as a code owner January 22, 2025 11:04
Copy link
Member

@felipemontoya felipemontoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me there are serious concerns about the security of this feature. We have it on a test environment and I will do all my testing there before I requests changes. However I need to voice the concern that "as is" this must not be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants