Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SL-2777] Reproduce error with conversion metric and missing group by #1492

Open
Jstein77 opened this issue Oct 30, 2024 · 21 comments
Open

[SL-2777] Reproduce error with conversion metric and missing group by #1492

Jstein77 opened this issue Oct 30, 2024 · 21 comments
Assignees
Labels
Low priority Created by Linear-GitHub Sync Medium Created by Linear-GitHub Sync Metricflow Created by Linear-GitHub Sync
Milestone

Comments

@Jstein77
Copy link
Contributor

@courtney.holcomb said:

We've been seeing this error come up in the logs a lot lately, and I was finally able to repro. It's triggered by querying a conversion metric with a where filter but with no group by. The where filter is only applied to one of the measures, so then we aren't able to join the two aggregated measures. I'm not super familiar with conversion metric behavior. What do we want the experience to be in that scenario to fix this bug? cc @willymwai.deng @jordan.stein

[August 21st, 2024 11:30 AM] diego.fernandez: Hey all, we've seen this error a few times in MFS recently:
All parent nodes should have the same set of linkable instances since all values are coalesced.
They all seem to come from the same account, so maybe it's a configuration error on their side, but it shows up as errors on our side <https://dbtlabsmt.datadoghq.com/logs?query=%40extra.job.error.message%3A"All parent nodes should have the same set of linkable instances since all values are coalesced."&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&context_event=AZFy6QYvAAC7NuVb965U8ACb&fromUser=false&graphType=flamegraph&historicalData=true&hostGroup=*&lens=Performance&messageDisplay=inline&page=1&panelFrom=1724209200000&panelStorageType=hot&panelTo=1724210400...

From SyncLinear.com | SL-2777

@Jstein77 Jstein77 added Low priority Created by Linear-GitHub Sync Medium Created by Linear-GitHub Sync Metricflow Created by Linear-GitHub Sync labels Oct 30, 2024
@Jstein77
Copy link
Contributor Author

As a follow up:

  • Add validators when someone adds a filter on a conversion measure.

@Jstein77
Copy link
Contributor Author

we do not have anything that tracks what type of metric is being used. We could see if it gets logged so that we can make that calculation on DD

@Jstein77
Copy link
Contributor Author

Do we have any visibility into how often conversion metrics are being queried, and what % hit this error? I checked in our snowflake tables but didn't see anything. I'm trying to prioritize this bug.

@Jstein77
Copy link
Contributor Author

Commented on the PR. I don't know if we can quick-fix this, but if you've got ideas I'd be happy to be wrong about that.

@Jstein77
Copy link
Contributor Author

That error isn't happening in the pushdown optimizer, it's happening on plan conversion. A query of this form apparently doesn't convert to a sql query plan:

    parsed_query = query_parser.parse_and_validate_query(
        metric_names=("visit_buy_conversion_rate_7days",),
        group_by_names=("metric_time", "user__home_state_latest"),
        where_constraint=PydanticWhereFilter(where_sql_template="{{ Dimension('visit__referrer_id') }} = '123456'"),
    )

@Jstein77
Copy link
Contributor Author

Hmm, I made that change and it seems to break the predicate pushdown tests, but other tests seems to be fine

FAILED tests_metricflow/query_rendering/test_predicate_pushdown_rendering.py::test_conversion_metric_query_filters - RuntimeError: Expected exactly one matching instance for LinklessEntitySpec(element_name='user', entity_links=()) in instance set, but found: []. All entity instances: ()

@tomkit.lento mind lending a hand? 👀

@Jstein77
Copy link
Contributor Author

This only addresses the bug from Courtney's post and also the issue 1210, but 1199 is a separate issue

@Jstein77
Copy link
Contributor Author

Cause this base_measure_recipe.required_local_linkable_specs contains the specs needed to do the joins to get the converted events, but it contains the specs from the where filter too since it was built via __get_required_and_extraneous_linkable_specs and in the final aggregate measures node, we should just use the queried_specs as we don't want any of the specs from the filter to be there which is causing that bug

@Jstein77
Copy link
Contributor Author

So trying to remember the context of all this again, but it seems like you can repro this by filtering with a dim that exists in the base measure's semantic model, but that dimension isn't in the group by? I'm looking at the error, it seems like that error is coming from when we cross join the agg'd base measure set and the base measure set filtered by only converted where the base measure set filtered by converted rows
seems to have the linkable element (ie., the dimension) so it's erroring out during the cross join rendering since it should not have the spec from the filter.

I did some testing, it seems like this fixes that bug? #1381

@Jstein77
Copy link
Contributor Author

More details here:

#1199

@Jstein77
Copy link
Contributor Author

@jordan.stein no, we would not.

@Jstein77
Copy link
Contributor Author

Taking a look 👀

@Jstein77
Copy link
Contributor Author

I thought if we filter on the base measure set, then join in the conversion set we should only be matching users in the base measure set so wouldn't we be safe from false conversion matches?

@Jstein77
Copy link
Contributor Author

I can also repro if i run dbt sl query --metrics mql_to_seller_conversion_rate_base --where "{{Dimension('mql__origin')}} = 'direct_traffic'". In this case mql__origin is from the same semantic model as the base measure

@Jstein77
Copy link
Contributor Author

If you only apply to the filter to the base measure you may get inappropriate conversion matches.

@Jstein77
Copy link
Contributor Author

I feel like that was intentional but maybe I'm misremembering? @willymwai.deng would know

@Jstein77
Copy link
Contributor Author

I think we should apply the filter to the base measure, similar to how to do it for categorical dimensions. i.e The filter only gets applied here to limit the base measure set.

 (
    SELECT
      metric_time__day
      , SUM(mqls) AS mqls
    FROM (
      SELECT
        DATE_TRUNC('day', first_contact_date) AS metric_time__day
        , CASE WHEN mql_id IS NOT NULL THEN 1 ELSE 0 END AS mqls
      FROM ANALYTICS.dbt_jstein.olist_marketing_qualified_leads olist_mqls_src_10000
    ) subq_2
    WHERE metric_time__day > '2010-01-01'
    GROUP BY
      metric_time__day
  ) subq_4

@Jstein77
Copy link
Contributor Author

Yep that's what I would expect

@Jstein77
Copy link
Contributor Author

I get the error when filtering for metric time

Encountered an error error querying against the semantic layer: All join data sets should have the same set of linkable instances as the from dataset since all values are coalesced.
From dataset instance set: InstanceSet(measure_instances=(MeasureInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_mqls', element_name='mqls'),), associated_columns=(ColumnAssociation(column_name='mqls', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=MeasureSpec(element_name='mqls', non_additive_dimension_spec=None, fill_nulls_with=None), aggregation_state=AggregationState.COMPLETE),), dimension_instances=(), time_dimension_instances=(), entity_instances=(), group_by_metric_instances=(), metric_instances=(), metadata_instances=())
Join dataset instance sets: [InstanceSet(measure_instances=(MeasureInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_closed_deals', element_name='sellers'),), associated_columns=(ColumnAssociation(column_name='sellers', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=MeasureSpec(element_name='sellers', non_additive_dimension_spec=None, fill_nulls_with=None), aggregation_state=AggregationState.COMPLETE),), dimension_instances=(), time_dimension_instances=(TimeDimensionInstance(defined_from=(SemanticModelElementReference(semantic_model_name='olist_mqls', element_name='ds'),), associated_columns=(ColumnAssociation(column_name='metric_time__day', single_column_correlation_key=SingleColumnCorrelationKey(PYDANTIC_BUG_WORKAROUND=True)),), spec=TimeDimensionSpec(element_name='metric_time', entity_links=(), time_granularity=TimeGranularity.DAY, date_part=None, aggregation_state=None)),), entity_instances=(), group_by_metric_instances=(), metric_instances=(), metadata_instances=())]

Polling for queried result set
(cloud-env) jordanstein@Jordan-Stein jaffle-sl-template % dbt sl query --metrics mql_to_seller_conversion_rate_base --where "{{TimeDimension('metric_time')}} > '2010-01-01'"

@Jstein77
Copy link
Contributor Author

Actually, wait, I’m thinking of something else. It is a bit odd that it’s only time filters.

But the linked GH issue has a repro case in it, and that uses a metric_time filter as well.

@Jstein77
Copy link
Contributor Author

It should only happen with metric_time, for Reasons

@Jstein77 Jstein77 added this to the v0.207.0 milestone Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Low priority Created by Linear-GitHub Sync Medium Created by Linear-GitHub Sync Metricflow Created by Linear-GitHub Sync
Projects
None yet
Development

No branches or pull requests

2 participants