Describe the feature you'd like
I would like to propose a new processor implementation, PythonScriptProcessor. It should offer the same features as FrameworkProcessor (such as source_dir support and automatic installation of requirements) but without requiring an estimator class (estimator_cls). The goal is to let users run Python-based parallel processing jobs more flexibly, without coupling them to an estimator.
How would this feature be used? Please describe.
This feature would simplify and enhance the experience for users who need to run Python scripts as processing jobs on SageMaker, particularly for use cases that do not require a pre-configured estimator.
Example use cases:
Running lightweight Python-based data preprocessing or postprocessing scripts on SageMaker Processing jobs.
Performing custom parallel processing tasks, such as batch transformations or distributed computations, where the user’s focus is only on script execution.
The new PythonScriptProcessor would allow users to:
Specify the script directory (source_dir).
Have dependencies installed automatically from requirements.txt (or through another package manager).
Submit a Python script directly for processing without the overhead of setting up an estimator.
This removes the need to over-engineer solutions for straightforward Python tasks and aligns with the existing usability patterns of the SDK.
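To make the requested call shape concrete, here is a minimal sketch. The class below is a hypothetical stand-in written for this issue, not something that exists in the SageMaker Python SDK. Its constructor arguments loosely mirror ScriptProcessor, and the `plan` method is an invented dry-run helper that only describes the job instead of submitting it. The default instance type is an arbitrary placeholder.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PythonScriptProcessor:
    """Hypothetical stand-in for the proposed class; NOT part of the SageMaker SDK."""

    role: str
    image_uri: str
    instance_count: int = 1
    instance_type: str = "ml.m5.xlarge"  # placeholder default, an assumption

    def plan(self, code: str, source_dir: str) -> dict:
        """Describe the job that would be submitted, without calling any AWS API."""
        src = Path(source_dir)
        if not (src / code).is_file():
            raise FileNotFoundError(f"{code!r} not found in {source_dir!r}")
        return {
            "entry_point": code,
            "source_dir": str(src),
            # Requirements are installed only if the file ships with the sources.
            "install_requirements": (src / "requirements.txt").is_file(),
            "instance_count": self.instance_count,
            "instance_type": self.instance_type,
        }
```

With something along these lines, a call such as `PythonScriptProcessor(role=..., image_uri=...).plan(code="preprocess.py", source_dir="src")` would need no estimator class at all. A real implementation would expose a `run` method that uploads `source_dir` and starts the processing job.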
Describe alternatives you've considered
Custom Implementation: Writing custom implementations for dependency handling and script execution within a processing job is repetitive, error-prone, and not aligned with the SageMaker Python SDK's existing abstraction patterns.
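For reference, the kind of wrapper this alternative implies looks roughly like the sketch below. It assumes the job's sources are already available in a local directory inside the container. The function names and the injectable `run` parameter are illustrative choices, not SDK features.

```python
import subprocess
import sys
from pathlib import Path


def bootstrap_commands(source_dir: str, entry_point: str) -> list:
    """Return the commands a hand-written wrapper would run, in order."""
    src = Path(source_dir)
    commands = []
    requirements = src / "requirements.txt"
    if requirements.is_file():
        # Install dependencies first, using the interpreter that runs the job.
        commands.append(
            [sys.executable, "-m", "pip", "install", "-r", str(requirements)]
        )
    # Then execute the user's script.
    commands.append([sys.executable, str(src / entry_point)])
    return commands


def bootstrap(source_dir: str, entry_point: str, run=subprocess.check_call) -> None:
    """Run each command in turn; `run` is injectable so the logic can be tested."""
    for command in bootstrap_commands(source_dir, entry_point):
        run(command)
```

Every team that needs this ends up maintaining its own variant, which is the repetition the proposed processor would remove.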
Additional context
The introduction of PythonScriptProcessor would fill the usability gap between ScriptProcessor and FrameworkProcessor, providing a clean and consistent user experience for Python-based processing tasks. It would also reduce friction for users performing data preparation, transformation, or parallel tasks that don’t require model estimators.
This enhancement would make the SDK more intuitive and better suited for broader workflows in SageMaker.