Implements PythonScriptProcessor #4968

Open
renato-farias opened this issue Dec 17, 2024 · 0 comments

Describe the feature you'd like

I would like to propose a new processor, PythonScriptProcessor. It should offer the same features as FrameworkProcessor (such as source_dir support and automatic installation of requirements) but without requiring an Estimator. The goal is to let users run Python-based parallel processing jobs more flexibly, without coupling them to an estimator.

How would this feature be used? Please describe.

This feature would simplify running Python scripts as processing jobs on SageMaker, particularly for use cases that do not require a pre-configured estimator.

Example use cases:

- Running lightweight Python-based data preprocessing or postprocessing scripts as SageMaker Processing jobs.
- Performing custom parallel processing tasks, such as batch transformations or distributed computations, where the user's focus is only on script execution.

The new PythonScriptProcessor would allow users to:

- Specify the script directory (source_dir).
- Automatically install dependencies from requirements.txt or through other package managers.
- Submit a Python script directly for processing without the overhead of setting up an estimator.

This removes the need to over-engineer solutions for straightforward Python tasks and aligns with the existing usability patterns of the SDK.
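
To make the three capabilities above concrete, here is a minimal sketch of the bootstrap script such a processor might generate inside the container: unpack source_dir, install requirements.txt if present, then run the user's script. The function name `build_bootstrap_script`, the container path, and the archive name are assumptions made for illustration; none of them are part of the SDK or of this proposal.

```python
import shlex
from pathlib import PurePosixPath

# Assumed container location of the uploaded code (illustrative only).
CODE_DIR = PurePosixPath("/opt/ml/processing/input/code")


def build_bootstrap_script(entry_point: str, archive_name: str = "sourcedir.tar.gz") -> str:
    """Return a bash script that unpacks source_dir, installs deps, and runs the script.

    Hypothetical helper: a sketch of what PythonScriptProcessor could do internally.
    """
    if not entry_point.endswith(".py"):
        raise ValueError("entry_point must be a Python file")
    lines = [
        "#!/bin/bash",
        "set -e  # a non-zero exit code marks the processing job as failed",
        f"cd {CODE_DIR}",
        f"tar -xzf {shlex.quote(archive_name)}",
        "# Install dependencies only when the user shipped a requirements.txt",
        "if [[ -f requirements.txt ]]; then",
        "    python -m pip install -r requirements.txt",
        "fi",
        # Forward all job arguments to the user's script unchanged.
        f'python {shlex.quote(entry_point)} "$@"',
    ]
    return "\n".join(lines) + "\n"


script = build_bootstrap_script("preprocess.py")
print(script)
```

Generating the wrapper as plain text keeps the user's script free of any installation logic, which is the main point of the request.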

Describe alternatives you've considered

Custom Implementation: Writing custom implementations for dependency handling and script execution within a processing job is repetitive, error-prone, and not aligned with the SageMaker Python SDK's existing abstraction patterns.
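
As an illustration of that repetition, the snippet below sketches the kind of dependency-handling boilerplate a user might otherwise paste at the top of every processing script. The helper name `install_requirements` is hypothetical, and the sketch assumes pip is available in the container image.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def install_requirements(source_dir: str) -> bool:
    """Install source_dir/requirements.txt if it exists; return True when pip ran."""
    requirements = Path(source_dir) / "requirements.txt"
    if not requirements.is_file():
        return False
    # check=True raises CalledProcessError so a failed install fails the job.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )
    return True


# Demonstration on an empty directory: nothing to install, so pip is not invoked.
with tempfile.TemporaryDirectory() as empty_dir:
    print(install_requirements(empty_dir))
```

Every script that needs extra packages would have to carry some variant of this, which is the duplication a built-in processor would remove.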

Additional context

The introduction of PythonScriptProcessor would fill the usability gap between ScriptProcessor and FrameworkProcessor, providing a clean and consistent user experience for Python-based processing tasks. It would also reduce friction for users performing data preparation, transformation, or parallel tasks that don’t require model estimators.

This enhancement would make the SDK more intuitive and better suited for broader workflows in SageMaker.
