Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To Disk Output Plugin #22

Merged
merged 13 commits into from
Nov 9, 2023
Merged

To Disk Output Plugin #22

merged 13 commits into from
Nov 9, 2023

Conversation

nreinicke
Copy link
Collaborator

Adds a new output plugin "to_disk" that writes results to an output file rather than returning them and holding them in memory.

Updates all output plugins to return a vector of json values rather than a single value and then flattens the result.

When implementing this, I realized our previous convention of using "_file" to find input files to normalize was to ambiguous since we don't want to normalize the new key "output_file" for this plugin. So, the new proposal is to use the postfix "_input_file" to denote a file that should be normalized.

Closes #19

Copy link
Collaborator

@robfitzgerald robfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks as expected. good catch on removing the file write from CompassApp.run.

but, this solution could lead to out-of-order writes from multiple threads. i did a quick google search and found this stacko question. i believe the answer has the right idea for us, which is that we need to hold an Arc<Mutex<File>> in the output plugin and write to that in the plugin.

edit: i'm actually not sure if i'm correct. does fs::writeln! actually manage concurrency? i'm reading around now...

update: yes, there is no guarantee of async locks on file writes, so we will need some kind of solution to prevent concurrent out-of-order writes. that might be the example above, or, maybe also a RwLock<File> could be a good pick since it handles the lock management under-the-hood.

@nreinicke
Copy link
Collaborator Author

this solution could lead to out-of-order writes from multiple threads.

yeah good catch, the old solution also had the downside of opening the file for each write whereas your proposed solution just opens it once, much better. Just pushed up a new commit.

Copy link
Collaborator

@robfitzgerald robfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work. anecdotally just did a run with 22 queries and a parallelism of 5 and the resulting output file looks correct:

discy-disc.json

thanks!

brain dump on the defaults

noting that we haven't added this output plugin to the default TOMLs generated by our generate_routee_dataset function in the python API. i think that seems right for that use case, just figured i'd say something in case we should talk about it. i think it's right, but i also think we should plan to add information to our documentation.

i also don't think this is necessarily the best ergonomics for people who want to produce the output file, since the filename is specified in the TOML. it doesn't seem right that the user would need to stop CompassApp each time they want to write to a new output location, modify the TOML, and then re-load. perhaps we stew on this and think of an easier way to let users specify the output file location that doesn't feel like a hack.

@nreinicke
Copy link
Collaborator Author

perhaps we stew on this and think of an easier way to let users specify the output file location that doesn't feel like a hack.

Yeah that makes sense to me, I agree that it presents some weirdness ergonomically and so we should mull over some options to make it easier to use.

i also think we should plan to add information to our documentation.

Yeah I'll take a quick pass at just adding that in now so we at least have something.

@nreinicke nreinicke merged commit 116133e into main Nov 9, 2023
4 checks passed
@nreinicke nreinicke deleted the ndr/batch-run branch November 9, 2023 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CompassApp option to write responses to file system
2 participants