Chaining data loaders #332
From #325 (comment):
> I've been working on a small project to get some first-hand experience with the CLI. In it I'm downloading a zip file from some hobbyist website, extracting a couple dozen text files, and then running them through a custom parser. Parsing them takes about 30 seconds, which is a bit longer than I want to do in the markdown file. I'm iterating on the parser itself, so I'm re-running it every few minutes. The options I see for my case are: […]
>
> From this, I have two wish-list items. One is chained data loaders; the other is incremental data loaders that can somehow stream their results into the client. I don't really know how that would work, and it's probably better suited for Notebooks anyways. I can sympathize with wanting data loaders to be in any language, but it is very jarring to go from writing my code in a JS fenced code block and having it work easily, to working in a […] Perhaps once we have the server-based data loader workflow that Mike mentioned, we could then wrap that in a FileAttachment facade that makes it feel just like it does in Markdown files.
@mythmon FileAttachment supports streaming, see https://observablehq.com/@observablehq/streaming-shapefiles
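For reference, a minimal sketch of that pattern in a page or notebook cell: consuming a file chunk by chunk via FileAttachment's `stream()` method, as in the linked notebook (the file name and the per-chunk handling here are illustrative).

```js
// Illustrative sketch only: consume a large file incrementally with
// FileAttachment#stream(), which resolves to a ReadableStream of Uint8Array
// chunks (the file name is hypothetical).
const stream = await FileAttachment("large-dataset.bin").stream();
const reader = stream.getReader();
let bytes = 0;
while (true) {
  const {done, value} = await reader.read();
  if (done) break;
  bytes += value.length; // hand each chunk to an incremental parser here
}
```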
There's an example of this use case in the chess bump chart example: any change to the data transformation in the data loader requires a full download of all of the data.
This seems to point to something similar to having a dependency graph, like the targets 📦 in R.
Tangentially related to #918. |
I'm not seeing this implemented in the chess bump example. Am I missing something?
It's not implemented in the chess bump example. The example is a case where implementing this feature would improve the data loader(s), if Framework gained this feature.
Gotcha. Here's my use case, for anyone interested. I'd like Data Loader 1 to be a Python script that downloads a dataframe from S3, transforms the data with some filter-y tricks, and then writes out a JSON file that's ready to serve. Then Data Loader 2 would be a Node.js script that would open that very large JSON file, build a D3 graphic in a canvas object, and then write out a PNG file that could ultimately be served by the static site.
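A sketch of what that second step might look like as a Node.js data loader today: the loader writes its PNG to stdout, which is how Framework data loaders produce files. The intermediate JSON path, the field names, and the chart itself are illustrative, and since chaining isn't supported, the JSON is assumed to already exist on disk.

```js
// chart.png.js — sketch of the hypothetical "Data Loader 2": read the JSON that
// the Python loader would have written, draw with D3 onto a node-canvas, and
// emit a PNG on stdout. Assumes an ESM project and the npm "canvas" package.
import {readFile} from "node:fs/promises";
import {createCanvas} from "canvas"; // npm: canvas
import * as d3 from "d3";

const rows = JSON.parse(await readFile("data/filtered.json", "utf8"));

const width = 960;
const height = 500;
const canvas = createCanvas(width, height);
const context = canvas.getContext("2d");

// Hypothetical fields d.x and d.y; scales map them to pixel coordinates.
const x = d3.scaleLinear(d3.extent(rows, (d) => d.x), [20, width - 20]);
const y = d3.scaleLinear(d3.extent(rows, (d) => d.y), [height - 20, 20]);

context.fillStyle = "steelblue";
for (const d of rows) {
  context.beginPath();
  context.arc(x(d.x), y(d.y), 2, 0, 2 * Math.PI);
  context.fill();
}

process.stdout.write(canvas.toBuffer("image/png"));
```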
Suppose we want to (1) download a dataset from an API and (2) analyze it. Currently a single data loader must do both, and it will run the download again whenever we update the analysis code.
Ideally we'd separate this into two chained data loaders that would still be live: if a page relies on 2, which relies on 1, editing 1 would tell the page to require the analysis again, which in turn would trigger a new download. But editing 2 would only re-run the analysis, not the download.
This would also make it easier to generate several files from a common (and slow) download.
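Until chaining exists, one workaround (not the requested feature) is a single data loader that caches the slow download itself, so that editing the analysis only re-runs the analysis. The cache path, URL, and analysis step below are all illustrative.

```js
// analysis.json.js — illustrative workaround, not the chained-loader design:
// cache the slow download inside one data loader so that editing the analysis
// code only re-runs the analysis against the cached copy.
import {existsSync} from "node:fs";
import {mkdir, readFile, writeFile} from "node:fs/promises";

const cache = ".cache/raw-dataset.json"; // hypothetical cache location

if (!existsSync(cache)) {
  const response = await fetch("https://example.com/api/dataset"); // slow download
  if (!response.ok) throw new Error(`download failed: ${response.status}`);
  await mkdir(".cache", {recursive: true});
  await writeFile(cache, await response.text());
}

const raw = JSON.parse(await readFile(cache, "utf8"));

// "Analysis" step: whatever transformation the page actually needs.
const result = raw.filter((d) => d.value != null);

process.stdout.write(JSON.stringify(result));
```

Deleting the cache file forces a fresh download; unlike the chained design, editing the "download" half does not automatically propagate to the page.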