Describe the feature you'd like
Support for the Parquet MIME type in the SageMaker toolkit. For example, the README of this repo includes an example default_input_fn():
def default_input_fn(self, input_data, content_type, context=None):
    """A default input_fn that can handle JSON, CSV and NPZ formats.
    Args:
        input_data: the request payload serialized in the content_type format
        content_type: the request content_type
        context (obj): the request context (default: None).
    Returns: input_data deserialized into torch.FloatTensor or torch.cuda.FloatTensor depending if cuda is available.
    """
    return decoder.decode(input_data, content_type)
Looking into decoder.decode, I see the following MIME types are supported:
It should not be too hard to add Parquet here. Parquet is a columnar data file format commonly used with large datasets, and it is already supported in other SageMaker services, for example Autopilot.
How would this feature be used? Please describe.
It would reduce storage and data I/O costs and speed up processing.
Describe alternatives you've considered
CSV is the standard, but it's a much less efficient way to store, read and write column-oriented data.
Additional context