Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Auto-refresh materialized view / covering index doesn't ingest all the data within the specified timeframe #789

Open
A-Gray-Cat opened this issue Oct 17, 2024 · 2 comments
Labels
bug Something isn't working Core:MV

Comments

@A-Gray-Cat
Copy link

What is the bug?
When I was creating an auto-refresh materialized view, the statement I used included a where clause that looks for all the logs in the past '1' day. However, I noticed that each refresh didn't ingest logs all the way back within the specified timeframe, and only partial logs were ingested.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a materialized view / covering index and set auto_refresh=true using a statement similar to this:
CREATE MATERIALIZED VIEW testing_auto_mv AS
    SELECT time_dt,
        actor,
        accountid,
        region,
        src_endpoint,
        api,
        http_request,
        is_mfa,
        class_uid,
        resources
     FROM securitylake.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0
     WHERE time_dt BETWEEN CURRENT_TIMESTAMP - INTERVAL '1' DAY AND CURRENT_TIMESTAMP
        AND accountid in ('{account id}')
        AND region = 'us-east-1' 
WITH ( auto_refresh = true, refresh_interval = '15 Minute', checkpoint_location = 's3://{bucket name}/AWSLogs/checkpoint_test')
  1. Create a materialized view and set auto_refresh=false using the same statement:
CREATE MATERIALIZED VIEW testing_manual_mv AS
    SELECT time_dt,
        actor,
        accountid,
        region,
        src_endpoint,
        api,
        http_request,
        is_mfa,
        class_uid,
        resources
     FROM securitylake.amazon_security_lake_glue_db_us_east_1.amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0
     WHERE time_dt BETWEEN CURRENT_TIMESTAMP - INTERVAL '1' DAY AND CURRENT_TIMESTAMP
        AND accountid in ('{account id}')
        AND region = 'us-east-1' 
WITH ( auto_refresh = false)

And run REFRESH testing_manual_mv

  1. Run the following SQL query against both MVs you just created, and compare the earliest timestamp:
SELECT time_dt FROM flint_index ORDER BY time_dt LIMIT 1

You can also check the difference of number of documents that are ingested in both MVs.

What is the expected behavior?
The auto-refresh MV should ingest all the logs within the specified timeframe

What is your host/environment?

  • OS: [e.g. iOS]
  • Version 2.13
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@A-Gray-Cat A-Gray-Cat added bug Something isn't working untriaged labels Oct 17, 2024
@dai-chen
Copy link
Collaborator

Could you help quick check the oldest time_dt ingested in MV data? Just wonder if any time zone issue here.

@A-Gray-Cat
Copy link
Author

A-Gray-Cat commented Oct 17, 2024

@dai-chen Yes. The MV was created around 12:09PM PDT, and the oldest time_dt value in the MV is 2024-10-17 13:07:31 in UTC I would assume.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Core:MV
Projects
None yet
Development

No branches or pull requests

2 participants