-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
To Disk Output Plugin #22
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks as expected. good catch on removing the file write from CompassApp.run.
but, this solution could lead to out-of-order writes from multiple threads. i did a quick google search and found this stacko question. i believe the answer has the right idea for us, which is that we need to hold an Arc<Mutex<File>>
in the output plugin and write to that in the plugin.
edit: i'm actually not sure if i'm correct. does fs::writeln! actually manage concurrency? i'm reading around now...
update: yes, there is no guarantee of async locks on file writes, so we will need some kind of solution to prevent concurrent out-of-order writes. that might be the example above, or, maybe also a RwLock<File>
could be a good pick since it handles the lock management under-the-hood.
yeah good catch, the old solution also had the downside of opening the file for each write whereas your proposed solution just opens it once, much better. Just pushed up a new commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work. anecdotally just did a run with 22 queries and a parallelism of 5 and the resulting output file looks correct:
thanks!
brain dump on the defaults
noting that we haven't added this output plugin to the default TOMLs generated by our generate_routee_dataset function in the python API. i think that seems right for that use case, just figured i'd say something in case we should talk about it. i think it's right, but i also think we should plan to add information to our documentation.
i also don't think this is necessarily the best ergonomics for people who want to produce the output file, since the filename is specified in the TOML. it doesn't seem right that the user would need to stop CompassApp each time they want to write to a new output location, modify the TOML, and then re-load. perhaps we stew on this and think of an easier way to let users specify the output file location that doesn't feel like a hack.
Yeah that makes sense to me, I agree that it presents some weirdness ergonomically and so we should mull over some options to make it easier to use.
Yeah I'll take a quick pass at just adding that in now so we at least have something. |
Adds a new output plugin "to_disk" that writes results to an output file rather than returning them and holding them in memory.
Updates all output plugins to return a vector of json values rather than a single value and then flattens the result.
When implementing this, I realized our previous convention of using
"_file"
to find input files to normalize was to ambiguous since we don't want to normalize the new key"output_file"
for this plugin. So, the new proposal is to use the postfix"_input_file"
to denote a file that should be normalized.Closes #19