
Add lint option to only output models with score under some threshold #71

Open
lhz opened this issue Aug 14, 2024 · 6 comments · May be fixed by #77

Comments

@lhz

lhz commented Aug 14, 2024

We are mostly interested in knowing which models are not up to standard, and that can be hard to see at a glance when the project has many models and most of them already have a gold badge.

It would be very useful to have an option for the lint command that causes it to only output models whose score is below the first threshold (or, more flexibly, below a custom threshold given as an argument to the option).

Related to this, it would also be useful to output only the rules that were not satisfied, but I'm unsure whether that should be a separate option or a direct result of the one suggested.

@lhz lhz changed the title Add lint option to only output models whose score is under some threshold Add lint option to only output models with score under some threshold Aug 14, 2024
@jochemvandooren
Contributor

Hey @lhz, thanks for opening this issue! I agree, this is something that needs to be added. We are planning to work on it soon, but you are of course welcome to contribute as well.

I think the most important features to include are:

- An option to only show failing models
- An option to only show failing rules within a model

These two options can then be combined to get: all models with all their rules, all models with only their failing rules, failing models with only their failing rules, and failing models with all their rules. (The last option will probably never be used though 😁 )

@japborst
Member

As a workaround, you could also achieve this by using --format json and jq.

For a threshold that's e.g. 7.7:

dbt-score lint --format json | jq -r '.models | to_entries[] | select(.value.score < 7.7) | .key'

or, to get all failing models:

dbt-score lint --format json | jq -r '.models | to_entries[] | select(.value.pass|not) | .key'

@thomend

thomend commented Sep 29, 2024

Hi there :)
I first came across this project at PyData Amsterdam during @matthieucan's talk and immediately liked it. I'd be very happy to help with this issue in the next 1-2 weeks, if there is a need for it?

With regard to "Only show failing models": I think this could be achieved with something along the lines of:

            ...
            # only emit the per-model output when the model's score is below
            # the failure threshold, or when the failing-only filter is off
            if (
                self.scores[model].value < self._scorer._config.fail_any_model_under
                or not self._scorer._config.only_show_failing
            ):
                self._formatter.model_evaluated(
                    model, self.results[model], self.scores[model]
                )
            ...

I forked the repo and already tried that out quickly (added an --only_show_failing flag in the client). Let me know if you need any support on this; I would be happy to expand it to cover the use cases described above and subsequently create a PR.

@matthieucan
Contributor

Hi @thomend, I'm very glad you enjoyed the talk!
And we would love to have this feature, thank you for showing interest.

I believe it might make more sense to implement this logic directly within human_readable_formatter.py? That way the JSON formatter would always receive the event, which seems sensible.

I also think "showing only failing models" and, within a model, "showing only failing rules" could be implemented together. I'd argue this should be the default, with a CLI option to "show all". Does that make sense to you?
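
To illustrate the idea, here is a rough, hypothetical sketch (not actual dbt-score code): the model_evaluated() call shape follows the snippet above, while config.show_all and the per-rule passed flag are assumptions.

# Hypothetical sketch only, not dbt-score's real formatter.
# Assumed: config.show_all (set from a "show all" CLI option) and a
# per-rule result.passed flag; score.value and fail_any_model_under
# follow the snippet quoted earlier in this thread.
class HumanReadableFormatter:
    def __init__(self, config):
        self._config = config

    def model_evaluated(self, model, results, score):
        model_fails = score.value < self._config.fail_any_model_under
        if not model_fails and not self._config.show_all:
            return  # hide passing models unless "show all" was requested
        print(f"{model.name}: {score.value}")
        for rule, result in results.items():
            if result.passed and not self._config.show_all:
                continue  # hide passing rules unless "show all" was requested
            print(f"  {rule}: {result}")

With something like this, the CLI would mainly need to thread a single "show all" flag through to the formatter's config.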

@thomend

thomend commented Oct 2, 2024

Hi @matthieucan, thank you for getting back to me. Yes indeed, this makes sense. I have created a draft PR covering this here: #77

Let me know if you agree with the overall approach and I will mark it as "Ready for Review". I followed your outline and only added a --show_all option. I opted to have all violated rules shown (even if the model itself is not failing). In my opinion, a user should see all violated rules at all times, not only when a specific model fails. This is important, for instance, if the project is failing while most models still have good scores (in that case the user is probably interested in more than just the failing models).

Is that what you had in mind?

@matthieucan
Contributor

Amazing @thomend, this is indeed what I meant. Thank you for your PR 💪
While we might want to implement more complex tweaking in the future, I'm very much in favor of starting simple and iterating if needed, so let's go with that!
