
Add lint option to only output models with score under some threshold #71

Open
lhz opened this issue Aug 14, 2024 · 6 comments · May be fixed by #77

Comments

@lhz

lhz commented Aug 14, 2024

We are mostly interested in knowing which models are not up to standard, and that can be hard to see at a glance when the project has many models and most of them already have a gold badge.

It would be very useful to have an option for the lint command that causes it to only output models whose score is below the first threshold (or, more flexibly, below a custom threshold given as an argument to the option).

Related to this, it would also be useful to output only the rules that were not satisfied, but I'm unsure whether that should be a separate option or a direct result of the one suggested.

@lhz lhz changed the title Add lint option to only output models whose score is under some threshold Add lint option to only output models with score under some threshold Aug 14, 2024
@jochemvandooren
Contributor

Hey @lhz, thanks for opening this issue! I agree, this is something that needs to be added. We are planning to work on it soon, but you are of course welcome to contribute as well.

I think the most important features to include are:

- An option to only show failing models
- An option to only show failing rules within a model

These two options can then be combined to get: all models with all their rules, all models with only their failing rules, failing models with only their failing rules, and failing models with all their rules. (The last option will probably never be used though 😁 )

@japborst
Member

As a workaround, you could also achieve this by using --format json and jq.

For a threshold that's e.g. 7.7:

dbt-score lint --format json | jq -r '.models | to_entries[] | select(.value.score < 7.7) | .key'

or, to get all failing models:

dbt-score lint --format json | jq -r '.models | to_entries[] | select(.value.pass|not) | .key'

@thomend

thomend commented Sep 29, 2024

Hi there :)
I first came across this project at PyData Amsterdam during @matthieucan's talk and immediately liked it. I'd be very happy to help with this issue in the next 1-2 weeks, if there is a need for it?

With regard to "Only show failing models": I think this could be achieved with something along the lines of:

            ...
            # only emit the per-model output when the model's score is below
            # the failure threshold, or when the failing-only filter is off
            if (
                self.scores[model].value < self._scorer._config.fail_any_model_under
                or not self._scorer._config.only_show_failing
            ):
                self._formatter.model_evaluated(
                    model, self.results[model], self.scores[model]
                )
            ...

I forked the repo and already tried that out quickly (added an --only_show_failing flag in the client). Let me know if you need any support on this; I would be happy to expand it to cover the use cases described above and subsequently create a PR.

@matthieucan
Contributor

Hi @thomend, I'm very glad you enjoyed the talk!
And we would love to have this feature, thank you for showing interest.

I believe it might make more sense to implement this logic directly within human_readable_formatter.py? That way the JSON formatter would always receive the event, which seems sensible.

I also think "showing only failing models" and, within a model, "showing only failing rules" could be implemented together. I'd argue this should be the default, with a CLI option to "show all". Does that make sense to you?
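
To illustrate the idea, here is a rough, hypothetical sketch (not actual dbt-score code): the model_evaluated() call shape follows the snippet above, while config.show_all and the per-rule passed flag are assumptions.

# Hypothetical sketch only, not dbt-score's real formatter.
# Assumed: config.show_all (set from a "show all" CLI option) and a
# per-rule result.passed flag; score.value and fail_any_model_under
# follow the snippet quoted earlier in this thread.
class HumanReadableFormatter:
    def __init__(self, config):
        self._config = config

    def model_evaluated(self, model, results, score):
        model_fails = score.value < self._config.fail_any_model_under
        if not model_fails and not self._config.show_all:
            return  # hide passing models unless "show all" was requested
        print(f"{model.name}: {score.value}")
        for rule, result in results.items():
            if result.passed and not self._config.show_all:
                continue  # hide passing rules unless "show all" was requested
            print(f"  {rule}: {result}")

With something like this, the CLI would mainly need to thread a single "show all" flag through to the formatter's config.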

@thomend

thomend commented Oct 2, 2024

Hi @matthieucan, thank you for getting back to me. Yes indeed, this makes sense. I have created a draft PR covering this here: #77

Let me know if you agree with the overall approach and I will mark it as "Ready for Review". I followed your outline and only added a --show_all option. I opted to have all violated rules shown (even if the model itself is not failing). In my opinion, a user should see all violated rules at all times, not only when a specific model fails. This is important, for instance, if the project is failing while most models still have good scores (in that case the user is probably interested in more than just the failing models).

Is that what you had in mind?

@matthieucan
Contributor

Amazing @thomend, this is indeed what I meant. Thank you for your PR 💪
While we might want to implement more complex tweaking in the future, I'm very much in favor of starting simple and iterating if needed, so let's go with that!
