Improve dmod.client and update CLI #424
Conversation
Force-pushed from 6f55b6d to 8d29ffe.
Force-pushed from 8d29ffe to 5caf4a3.
Thanks @robertbartel! I have a few fairly minor comments that more or less boil down to API parameter choices. There is one bug that will need to be addressed, but other than that, this should be pretty easy to get through. Thanks!
```python
async def upload_to_dataset(self, dataset_name: str, paths: List[Path]) -> bool:
    chunk_size = 1024
    ...
    with source.open('r') as file:
```
I think we need to open this for reading bytes, right? Since we don't know what kind of item is being uploaded.
Hmm ... yeah, I believe you are correct, and as such we should do similarly for writes, as you suggested above. But this actually becomes a larger undertaking. All the links in the chain will need to be modified to accommodate it, from `DataTransmitMessage.data` to the declaration and implementations of `_DatasetManager.add_data()` (and associated helper methods).
It probably makes sense to do this as part of #432. What do you think?
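For illustration, here is a minimal sketch of what chunked, binary-mode reading could look like. It is a simplified free-function variant of the `upload_to_dataset` method from the diff above, and the `send_chunk` callback is a hypothetical stand-in for the actual transmit logic, not DMOD's API.

```python
from pathlib import Path
from typing import Awaitable, Callable, List


async def upload_to_dataset(dataset_name: str, paths: List[Path],
                            send_chunk: Callable[[str, bytes], Awaitable[None]],
                            chunk_size: int = 1024) -> bool:
    """Read each file in binary mode and hand fixed-size chunks to a transmit callback."""
    for source in paths:
        # 'rb' rather than 'r': the uploaded item may be binary (e.g., NetCDF), not text
        with source.open('rb') as file:
            while chunk := file.read(chunk_size):
                await send_chunk(dataset_name, chunk)
    return True
```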
Ping again here also. This probably relates to the earlier item and my old follow-up questions.
Yeah, I agree. This is a larger undertaking and not directly related, so it should be in its own PR.
I opened #486 to track changing `DataTransmitMessage`'s `data` field to a more appropriate type. Off the top of my head, I'm not sure if there are other models that will also need to be changed.
It didn't occur to me back in September, but (unless I'm missing something) we can't actually do this because it will break JSON serialization.
Yeahhhhh, you are right. I forgot about that. Does that mean you can only upload and download Unicode-encoded data at the moment? I don't recall seeing any logic to base64/base85-encode binary data.
Just as a sanity check, if we decode and then encode binary data using the "wrong" encoding, but it's the same encoding on both ends, won't we get the same data we started with?
Assuming more discussion and work is needed, I suggest that we defer this problem. A solution will probably be complex, and we can find workarounds as needed to get datasets into the system (worst case: temporarily opening up MinIO externally and uploading directly).
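As a concrete illustration of that sanity check (using latin-1, which maps every byte value one-to-one and so round-trips losslessly; a strict UTF-8 decode of arbitrary bytes can instead raise an error):

```python
import os

raw = os.urandom(1024)                  # arbitrary binary data
text = raw.decode('latin-1')            # "wrong" encoding, but byte-preserving
assert text.encode('latin-1') == raw    # same encoding on both ends returns the original bytes
```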
> Just as a sanity check, if we decode and then encode binary data using the "wrong" encoding, but it's the same encoding on both ends, won't we get the same data we started with?
Yeah! You are right. I may be blowing this out of proportion, and it might not be a problem after all. I thought that to transmit any data to and from the request service (and therefore the data service), it must be in JSON and therefore in some Unicode flavor (e.g., UTF-8). So, for example, if I wanted to upload a NetCDF forcing file to a dataset, I think I would first need to encode and chunk the data using base64 or base85, package that up in JSON, and send it along, right? I agree that we should move this discussion elsewhere and just focus on getting the PR through lol. Sorry for trying to derail us 😅!
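To make the base64-and-chunk idea concrete, here is a rough sketch of one way it could work; the single-key message shape is hypothetical and not the actual DataTransmitMessage schema.

```python
import base64
import json
from pathlib import Path
from typing import Iterable, Iterator


def binary_chunks_as_json(source: Path, chunk_size: int = 1024) -> Iterator[str]:
    """Yield JSON-safe messages carrying base64-encoded chunks of a binary file."""
    with source.open('rb') as file:
        while chunk := file.read(chunk_size):
            # base64 turns raw bytes into ASCII text that survives JSON serialization
            yield json.dumps({'data': base64.b64encode(chunk).decode('ascii')})


def reassemble(messages: Iterable[str]) -> bytes:
    """Decode the chunks back into the original bytes on the receiving side."""
    return b''.join(base64.b64decode(json.loads(m)['data']) for m in messages)
```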
""" | ||
|
||
def __init__(self, *args, **kwargs): | ||
super().__init__(*args, **kwargs) | ||
#super().__init__(*args, **kwargs) |
Minor: could be removed.
Force-pushed from 5caf4a3 to f39d89e.
Force-pushed from 3c08c84 to a1d10d3.
Force-pushed from 48cb083 to 0ac3a65.
Force-pushed from dca9d6d to 2b38710.
Force-pushed from 3c431ff to e0c9ed9.
Force-pushed from cd66f76 to c0bec91.
Trying to get this one moving again after it has sat for a while. I think there are some outstanding follow-up questions, and possible tasks resulting from those discussions, but everything else is settled. @aaraney, when you are able, can you confirm all the other conversations can be marked resolved?
@robertbartel, thanks for revitalizing this! I'd forgotten about it and let it get stale. I've resolved the relevant comments and contributed to the existing conversations to move this forward. We are getting really close, just a few minor things.
Force-pushed from c0bec91 to 8e6e3bd.
Replacing ngen-job-specific, deprecated client classes with more general JobClient.
Adding more help detail for various subcommands, and renaming some subcommands to be more intuitive.
Replacing pyyaml with Pydantic for ClientConfig changes, and bumping to latest communication package version.
Co-authored-by: Austin Raney <[email protected]>
Fixing bug with misnamed initial variable used when setting up the first message to send before the look-ahead sending loop.
Having usage wrap raised exceptions and raise new ones that include the message of the former, rather than crudely pass the name of the user function to DataServiceClient._process_request().
Rename get_dataset_items method in DataServiceClient to get_dataset_item_names.
Fixing mismatched strings between CLI prepared args and supported action strings in DmodClient data_service_action() func.
Commenting out seemingly unneeded SchedulerClient init in RequestService.
Ensuring ConnectionContextClient async_recv() uses an async context manager (see the sketch after this commit list).
Adding support to package __main__ routine for turning on remote debugging when properly configured.
Re-add DEFAULT_CLIENT_CONFIG_BASENAME definition (that was still being used) and refactor an error message.
Updating Dockerfile for app_server image to create a client config file using the new JSON syntax, complete with optional debugging config if env is set appropriately.
Fixing methods involved in making requests and processing responses to support collections of response types, rather than only a single, to account for scenarios like data transfer.
Updating classes to properly use async context management in transport client when appropriate.
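Regarding the ConnectionContextClient async_recv() commit above, here is a generic sketch of the async-context-manager pattern it refers to, assuming the `websockets` library; the class body is illustrative only and not DMOD's actual implementation (a real client would typically reuse one connection rather than open one per receive).

```python
from contextlib import asynccontextmanager

import websockets


class ConnectionContextClient:
    """Illustrative minimal client; names and structure are assumptions, not DMOD's real API."""

    def __init__(self, endpoint_uri: str):
        self._endpoint_uri = endpoint_uri

    @asynccontextmanager
    async def _connection(self):
        # Open the websocket for the duration of the block and guarantee it is closed on exit
        async with websockets.connect(self._endpoint_uri) as websocket:
            yield websocket

    async def async_recv(self):
        # The async context manager ensures cleanup even if recv() raises
        async with self._connection() as websocket:
            return await websocket.recv()
```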
Force-pushed from 8e6e3bd to 51f583d.
The only thing different was manually resolving a conflict for a version bump to the communication package, and the result of that ended up the same. As such, I'm going to bypass waiting for another approval this time.
General improvements to the CLI, as well as updates to bring it in line with more recent DMOD changes to serialization, communication client classes, and user config representation.
Note that this PR depends on #409 and should remain in draft status until that is merged.