Add project parameter #7

Open
crusaderky opened this issue Jul 16, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@crusaderky
Contributor

crusaderky commented Jul 16, 2021

Users very frequently want to drop some of the keys from the MongoDB documents server-side, before they are loaded.

Please add a project: dict[str, Any] = None parameter to read_mongo.
The projection must be applied after the two $match stages. Please include a test that would fail if the order were reversed.
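
A minimal sketch of the requested ordering, not dask-mongo's actual implementation. The helper `build_pipeline` and its argument names are hypothetical, and it assumes that `read_mongo` issues a user-supplied `$match` followed by an internal per-partition `$match`. The toy `run_pipeline` evaluator is only there to show why the order matters without a live server; it handles equality matches and inclusion projections and nothing else.

```python
from typing import Any, Optional


def build_pipeline(
    match: dict[str, Any],
    partition_match: dict[str, Any],
    project: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Return aggregation stages with $project placed after both $match stages."""
    pipeline: list[dict[str, Any]] = [{"$match": match}, {"$match": partition_match}]
    if project:
        pipeline.append({"$project": project})
    return pipeline


def run_pipeline(docs, pipeline):
    """Toy evaluator: equality-only $match, inclusion-only $project."""
    for stage in pipeline:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(k in d and d[k] == v for k, v in cond.items())]
        elif "$project" in stage:
            # An inclusion projection keeps _id unless it is explicitly excluded.
            keep = {k for k, v in stage["$project"].items() if v} | {"_id"}
            docs = [{k: v for k, v in d.items() if k in keep} for d in docs]
    return docs


docs = [
    {"_id": 1, "a": 10, "b": "x", "c": True},
    {"_id": 2, "a": 20, "b": "y", "c": False},
]

# Match on "c", then on the partition's _id, then keep only "a".
pipeline = build_pipeline({"c": True}, {"_id": 1}, project={"a": 1})
result = run_pipeline(docs, pipeline)

# Projecting first drops "c", so the later match on "c" finds nothing.
wrong_order = [pipeline[2], pipeline[0], pipeline[1]]
wrong_result = run_pipeline(docs, wrong_order)
```

A test along these lines fails when the stages are swapped, because the match has to filter on a key that the projection removes.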

@ncclementi ncclementi added the enhancement New feature or request label Oct 12, 2021
@joej

joej commented Sep 19, 2023

?

@ShaneHarvey

ShaneHarvey commented Sep 19, 2023

To add some more context, @crusaderky is asking for the ability to configure the $project aggregation stage to limit the amount of data that dask needs to load: https://www.mongodb.com/docs/manual/reference/operator/aggregation/project/

One workaround for now would be to create a view (https://www.mongodb.com/docs/manual/core/views/) that applies the projection and then call read_mongo on the view:

# db is a pymongo Database handle; the view applies the projection server-side
db.create_collection("collProjected", viewOn="coll", pipeline=[{"$project": {"a": 1, "b": 1}}])
res = read_mongo("db", "collProjected", ...)
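
If the set of fields varies, the view definition can be generated rather than written by hand. This is a rough sketch under assumptions: `projected_view_options` is a hypothetical helper, not part of dask-mongo or pymongo, and it only builds the keyword options that would be passed to `create_collection`.

```python
from typing import Any


def projected_view_options(source: str, fields: list[str]) -> dict[str, Any]:
    """Build keyword options for a read-only view that keeps only `fields`."""
    projection = {name: 1 for name in fields}
    return {"viewOn": source, "pipeline": [{"$project": projection}]}


options = projected_view_options("coll", ["a", "b"])
# With a pymongo Database handle `db` (assumed to exist), this would be:
#     db.create_collection("collProjected", **options)
```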

4 participants