[CT-2059] [Feature] Syntax to restrict selection to the current package #6891
Comments
+1 to this. A very concrete use case is a vim plugin I am writing where I’d like to provide the user with the ability to quickly jump to any model within their project, however dbt ls shows all models including those in packages. The JSON keys don’t provide a great way for differentiating between packaged and internal models. Other vendors may also benefit, for example, when building a dbt integration, being able to display the models for users can be simplified with this option. |
Love it :) and would love to have you contribute this @gwenwindflower! Should this be a special value of the
$ dbt ls --select package:this
$ dbt ls --select package:root
That's assuming, of course, that no one names their project/package |
The code for the dbt-core/core/dbt/graph/selector_methods.py Lines 295 to 300 in ccb4fa2
We don't have access to the full |
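As a rough illustration of the idea under discussion, the sketch below shows how a package selector could treat a special value as an alias for the root project. This is not dbt-core's code: `Node`, `select_by_package`, and the choice of `this` as the alias are hypothetical stand-ins, and the real selector method works against the manifest rather than a plain list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical alias; the thread is still debating the actual name.
ROOT_ALIAS = "this"


@dataclass(frozen=True)
class Node:
    """Minimal stand-in for a manifest node (not dbt's real class)."""
    name: str
    package_name: str


def select_by_package(
    nodes: Iterable[Node], selector: str, root_project: str
) -> Iterator[Node]:
    """Yield nodes whose package matches the selector, resolving the alias first."""
    target = root_project if selector == ROOT_ALIAS else selector
    for node in nodes:
        if node.package_name == target:
            yield node


# Hypothetical project and package names, for illustration only.
nodes = [Node("orders", "jaffle_shop"), Node("fivetran_log", "fivetran_utils")]
selected = [n.name for n in select_by_package(nodes, "this", root_project="jaffle_shop")]
print(selected)
```

The key design point is that the alias is resolved before any comparison, so the rest of the matching logic does not need to know the alias exists.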
Jumping on the brainstorming bandwagon, here's some ideas for names (some taken from this 😉 ):
If needed/desired, we could add some special symbol like
(I'm guessing we wouldn't actually use Alternatively, we could introduce some other selection shorthand that means the same thing as
It would be nice if it were somehow impossible to not collide with a project name! We could restrict the possible names to ones that wouldn't be valid project names anyways. Or we could make our choice a reserved keyword and disallow projects from taking on that name (which would be a breaking change). |
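To make the collision concern concrete, here is a small sketch that checks whether a candidate keyword could also be a legal project name. The regex is an assumption about how project names are validated (identifier-like: letters, digits, and underscores, not starting with a digit), and the `.` and `$this` candidates are hypothetical examples rather than names proposed in this thread.

```python
import re

# Assumption: project names must look like identifiers.
# This pattern is illustrative and may not match dbt's actual validation.
PROJECT_NAME_RE = re.compile(r"^[^\d\W]\w*$")


def can_collide(keyword: str) -> bool:
    """Return True if the keyword would also be accepted as a project name."""
    return PROJECT_NAME_RE.fullmatch(keyword) is not None


# "this" and "root" come from the thread; "." and "$this" are hypothetical.
collisions = {kw: can_collide(kw) for kw in ["this", "root", ".", "$this"]}
print(collisions)
```

Under that assumption, any keyword containing a character outside letters, digits, and underscores could never collide with a project name, while plain words would need to be reserved.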
Something to keep in mind during any implementation: #6598 / #6599 (not reviewed/merged yet) adds the ability to use Unix-style
while still retaining these:
So we'd want the two features and PRs to be compatible with each other. |
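One way the two features might coexist is sketched below, assuming the Unix-style matching in those PRs means shell-style wildcards of the kind Python's `fnmatch` module handles. The function `package_matches` and the `this` alias are hypothetical; the point is only that the alias is checked as an exact value before any wildcard matching happens.

```python
from fnmatch import fnmatchcase


def package_matches(
    package_name: str, selector: str, root_project: str, alias: str = "this"
) -> bool:
    """Match a package against a selector that is either the alias or a wildcard pattern."""
    if selector == alias:
        # The alias is an exact keyword, never treated as a pattern.
        return package_name == root_project
    # fnmatchcase keeps matching case-sensitive on every platform.
    return fnmatchcase(package_name, selector)


# Hypothetical package names, for illustration only.
checks = [
    package_matches("fivetran_utils", "fivetran_*", root_project="jaffle_shop"),
    package_matches("jaffle_shop", "this", root_project="jaffle_shop"),
    package_matches("this_is_a_package", "this", root_project="jaffle_shop"),
]
print(checks)
```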
at first blush i think
|
I won't lie, this did pique my interest:
But I'm not convinced that shorter & cleverer is better here, versus something explicit & easily understandable. |
I do wonder if making a package boolean available as part of the JSON output might be a more appropriate solution? Some type of “internal_package: true” key/value could then be parsed by the user through jq or a programming language as needed. Including the package name could also be very helpful so clients don’t have to also parse the dbt_project.yml file. In an ideal world, dbt ls --output=json would have two additional keys:
This doesn’t prevent the work described above for selecting internal packages, but it could be a useful addition for integrations and clients. |
(@PedramNavid What time is it where you are?) There is already the ability to include the
$ dbt ls --output json --output-keys name,resource_type,package_name
{"name": "my_model", "resource_type": "model", "package_name": "test"}
{"name": "my_seed", "resource_type": "seed", "package_name": "test"}
(The CLI syntax of So if you know your internal/root package's name, it's easy enough to filter on that already. I also see how it would be handy to have that as a (new) boolean flag. |
I’m in Singapore right now on an 11-hour layover, only 7pm! Been traveling through many time zones so time is in perpetual flux. Ah, package name is handy. Then a way to get the package name from dbt might almost be enough to solve this (for me anyway). Maybe just the dbt project name as an output key, then I could easily filter for models where project name==package name? |
Bon voyage ! So, today, the options would be:
Future (!!)
Just for fun, while you're laying over: We're working on a Python API for Does this excite you? Terrify you? Genuinely curious.
# my-dbt-project-dir/script.py
import json
import pprint
from dbt.cli.main import dbtRunner
from dbt.contracts.graph.manifest import Manifest
manifest: Manifest = None
success: bool = None
list_results: list[str]
# TODO: disable other stdout logging
dbt = dbtRunner()
manifest, success = dbt.invoke(["parse"])
root_project_name = manifest.metadata.project_name
# at this point, you could just loop over manifest.nodes
# but let's imagine you want to use 'dbt list'
list_results, success = dbt.invoke(["list", "--output", "json", "--output-keys", "name,resource_type,package_name"])
results_json = [json.loads(res) for res in list_results]
filtered = [res for res in results_json if res["package_name"] == root_project_name]
pp = pprint.PrettyPrinter()
pp.pprint(filtered)
|
i have a feeling our boy P-Money is using lua for his nvim plug so a python api might not do the trick. @PedramNavid for present-day constraints, might i suggest using yq instead (might already be on your radar but it was new to me last week) -- makes parsing YAML very jq-like and it handles a bunch of formats! |
true! that would be an invalid project name right? or we could safely make it one? i do feel icky about the possibility of some company called "This Cosmetics" making a project called "this", and i also don't like the idea of outlawing 'this' as a project name. |
I love the Python API, but Winnie has it right, my integration is lua so I’m dependent on the command line. I’ve also seen integrations built using JS/TypeScript so being able to access more detailed information via stdout is a bit more flexible. Both jq/yq are great solutions but hard for me to use for a project i want to distribute widely. Fortunately vim comes with a json parser and I can use grep/sed to extract the project name from the dbt_project.yml so I’m not blocked here at all! |
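The grep/sed approach to pulling the project name out of dbt_project.yml can be approximated with a single regular expression. The sketch below is written in Python only for illustration (the plugin itself is Lua), the function name and sample file contents are hypothetical, and a real YAML parser would be more robust against unusual formatting.

```python
import re
from typing import Optional

# Matches a top-level (unindented) `name:` key, with optional quotes
# around the value and an optional trailing comment.
NAME_RE = re.compile(
    r"""^name:[ \t]*['"]?([A-Za-z_]\w*)['"]?[ \t]*(?:#.*)?$""",
    re.MULTILINE,
)


def project_name_from_yaml_text(text: str) -> Optional[str]:
    """Return the top-level project name, or None if no match is found."""
    match = NAME_RE.search(text)
    return match.group(1) if match else None


# Hypothetical dbt_project.yml contents.
sample = """\
name: 'jaffle_shop'
version: '1.0.0'
models:
  jaffle_shop:
    name: not_this_one
"""

project_name = project_name_from_yaml_text(sample)
print(project_name)
```

Anchoring on an unindented `name:` is what keeps nested keys with the same name from matching.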
mmm yea right right, i ran into this problem with some codegen stuff i was building (re the portability of jq/yq). i don't like the boolean as it feels a lot for not much info, but i wouldn't hate the idea of a i do want the CLI to have this functionality regardless of the specific tooling use case. |
WIP branch https://github.com/dbt-labs/dbt-core/tree/winnie/this-package-selector trying to figure out how to get the newly added appreciate if anybody has any ideas or context on that though! once that's working i think the rest of it is a pleasant cruise. |
@jtcohen6 i think i'm hitting a wall, from what i can understand in my very, very limited understanding -- i think |
@gwenwindflower I was running into that too! (While testing the commit I linked above.) Confusing, but—I figured out it was a thing with partial parsing (the existing manifest doesn't have the |
Hi @gwenwindflower & @jtcohen6, this feature request seems to be a bit stale, but I recently raised a similar issue (#8954) and @dbeatty10 redirected me here, so I wanted to bump it with a few comments. As far as I can see, the core discussion here is about the proper syntax for selecting nodes from the current package. That's definitely a step forward, but I wanted to put a slightly different approach on the table:
I see two main benefits of such an approach:
This way you can run simple commands like That's also related to another point raised in the above issue:
As @dbeatty10 replied there - currently model access modifiers are only applied for If that proposal doesn't resonate with you, I have two more ideas:
|
If anyone wants to implement the narrow version of this issue as-is, winnie had a WIP branch here that you could take a peek at and create your own branch from. Most of the diff is reformatting import statements. From a quick peek, these are the sections with changed logic: |
@barton996 Thank you for opening the PR! @aranke and I started reviewing today, and we wanted to revisit the question of whether this syntax should be:
After talking it through, we like
@barton996 I will leave a comment on the PR suggesting this syntax change, if it makes sense to you! |
@jtcohen6 quite an edge-case, but... in that scenario you would need to restrict I would personally vote for |
@jaklan Definitely considered (in comments above) - I think the added clarity outweighs the edge case (risk of collision with installed package named For the ideas further up -
|
@jtcohen6 thanks for the link.
Well, that's a bit disappointing, especially because of the reason I mentioned above:
With no way to enforce |
Is this your first time submitting a feature request?
Describe the feature
There are many use cases for not running the models in your packages.
dbt ls
of the models in the project to an external tool
Unfortunately there's no easy way that I'm aware of to filter to just the models in the project without hardcoding the selects or excludes for various projects. Particularly as the mesh grows, I can see this becoming more of an issue, and a way to 'just run this project's models' becoming really helpful.
I would love to implement something like:
dbt build --select this.*
or perhaps
dbt ls --exclude external
or
dbt run --internal-only / dbt run -i
-- i haven't thought of the ideal syntax, but hopefully you get the idea.
Describe alternatives you've considered
You can exclude all the packages by name in YAML selectors, or you can select
[project_name].*
but this isn't very convenient, particularly when working programmatically on top of the CLI, such as with editor extensions that would need to parse the project YAML to get the package name.
Who will this benefit?
This will primarily benefit people using lots of Fivetran packages, a multi-project mesh structure, or some other configuration that involves lots of models contributed by packages that you don't necessarily want to constantly re-run when you're developing.
Are you interested in contributing this feature?
Yes! Very much. Dying to contribute to the CLI in some way.
Anything else?
No response