feat: Set workers through env variable, improved defaults #15

Merged
merged 3 commits into from
Jul 15, 2024
52 changes: 32 additions & 20 deletions README.md
@@ -39,34 +39,46 @@
uvicorn fastmlx:app --reload --workers 0
```

### Running with Multiple Workers (Parallel Processing)
> [!WARNING]
> The `--reload` flag should not be used in production. It is only intended for development purposes.

For improved performance and parallel processing capabilities, you can specify the number of worker processes. This is particularly useful for handling multiple requests simultaneously.
### Running with Multiple Workers (Parallel Processing)

```bash
fastmlx --workers 4
```
or
For improved performance and parallel processing capabilities, you can specify either the absolute number of worker processes or the fraction of CPU cores to use. This is particularly useful for handling multiple requests simultaneously.

```bash
uvicorn fastmlx:app --workers 4
```
You can also set the `FASTMLX_NUM_WORKERS` environment variable to specify the number of workers or the fraction of CPU cores to use. `workers` defaults to 2 if not passed explicitly or set via the environment variable.

Replace `4` with the desired number of worker processes. The optimal number depends on your system's resources and the specific requirements of your application.
In order of precedence (highest to lowest), the number of workers is determined by the following:
- Explicitly passed as a command-line argument
  - `--workers 4` will set the number of workers to 4
  - `--workers 0.5` will set the number of workers to half the number of CPU cores available (minimum of 1)
- Set via the `FASTMLX_NUM_WORKERS` environment variable
- Default value of 2

Notes:
- The `--reload` flag is useful during development as it automatically reloads the server when code changes are detected. However, it should not be used in production.
- When using multiple workers, the `--reload` flag is not compatible and should be omitted.
- The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
To use all available CPU cores, set the value to **1.0**.

### Considerations for Multi-Worker Setup
Example:
```bash
fastmlx --workers 4
```
or

```bash
uvicorn fastmlx:app --workers 4
```

> [!NOTE]
> - The `--reload` flag is not compatible with multiple workers.
> - The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
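
The precedence rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `resolve_workers` and its parameters (`cli_value`, `env`, `cpu_count`) are hypothetical names chosen for this example and are not part of FastMLX; only the `FASTMLX_NUM_WORKERS` variable, the default of 2, and the "fraction of cores, minimum of 1" rule come from the README text.

```python
import os
from typing import Mapping, Optional, Union


def resolve_workers(
    cli_value: Optional[Union[int, float]] = None,
    env: Optional[Mapping[str, str]] = None,
    cpu_count: Optional[int] = None,
    default: int = 2,
) -> int:
    """Hypothetical helper: CLI value, then env variable, then default."""
    env = os.environ if env is None else env
    # os.cpu_count() may return None, so fall back to a single core.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)

    value = cli_value
    if value is None:
        raw = env.get("FASTMLX_NUM_WORKERS")
        if raw:
            try:
                value = int(raw)      # e.g. "4" is an absolute count
            except ValueError:
                value = float(raw)    # e.g. "0.5" is a fraction of the cores
    if value is None:
        return default
    if isinstance(value, float):
        # A float is a fraction of the available cores, never below 1.
        return max(1, int(cores * value))
    return value


# On an 8-core machine (cpu_count passed explicitly for reproducibility):
print(resolve_workers(4, env={}, cpu_count=8))                                # 4
print(resolve_workers(0.5, env={"FASTMLX_NUM_WORKERS": "3"}, cpu_count=8))    # 4
print(resolve_workers(None, env={"FASTMLX_NUM_WORKERS": "3"}, cpu_count=8))   # 3
print(resolve_workers(None, env={}, cpu_count=8))                             # 2
```

Passing `env` and `cpu_count` explicitly keeps the sketch deterministic; in practice both would come from the running process.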

### Considerations for Multi-Worker Setup

1. **Stateless Application**: Ensure your FastMLX application is stateless, as each worker process operates independently.
2. **Database Connections**: If your app uses a database, make sure your connection pooling is configured to handle multiple workers.
3. **Resource Usage**: Monitor your system's resource usage to find the optimal number of workers for your specific hardware and application needs. Additionally, you can remove any unused models using the delete model endpoint.
4. **Load Balancing**: When running with multiple workers, incoming requests are automatically load-balanced across the worker processes.
1. **Stateless Application**: Ensure your FastMLX application is stateless, as each worker process operates independently.
2. **Database Connections**: If your app uses a database, make sure your connection pooling is configured to handle multiple workers.
3. **Resource Usage**: Monitor your system's resource usage to find the optimal number of workers for your specific hardware and application needs. Additionally, you can remove any unused models using the delete model endpoint.
4. **Load Balancing**: When running with multiple workers, incoming requests are automatically load-balanced across the worker processes.

By leveraging multiple workers, you can significantly improve the throughput and responsiveness of your FastMLX application, especially under high load conditions.
By leveraging multiple workers, you can significantly improve the throughput and responsiveness of your FastMLX application, especially under high load conditions.

3. **Making API Calls**

44 changes: 43 additions & 1 deletion fastmlx/fastmlx.py
@@ -88,6 +88,27 @@ class ChatCompletionResponse(BaseModel):
app = FastAPI()


# Custom type function
def int_or_float(value):
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value} is not an int or float")


def calculate_default_workers(workers: int = 2) -> int:
    """Calculate the default number of workers based on environment variable."""
    if num_workers_env := os.getenv("FASTMLX_NUM_WORKERS"):
        try:
            workers = int(num_workers_env)
        except ValueError:
            workers = max(1, int(os.cpu_count() * float(num_workers_env)))
    return workers


# Add CORS middleware
def setup_cors(app: FastAPI, allowed_origins: List[str]):
    app.add_middleware(
@@ -274,9 +295,30 @@ def run():
        default=False,
        help="Enable auto-reload of the server. Only works when 'workers' is set to None.",
    )
    parser.add_argument("--workers", type=int, default=2, help="Number of workers")

    parser.add_argument(
        "--workers",
        type=int_or_float,
        # Call the helper: argparse stores a callable default as-is and never invokes it.
        default=calculate_default_workers(),
        help="""Number of workers. Overrides the `FASTMLX_NUM_WORKERS` env variable.
        Can be either an int or a float.
        If an int, it will be the number of workers to use.
        If a float, number of workers will be this fraction of the number of CPU cores available, with a minimum of 1.
        Defaults to the `FASTMLX_NUM_WORKERS` env variable if set and to 2 if not.
        To use all available CPU cores, set it to 1.0.

        Examples:
        --workers 1 (will use 1 worker)
        --workers 1.0 (will use all available CPU cores)
        --workers 0.5 (will use half the number of CPU cores available)
        --workers 0.0 (will use 1 worker)""",
    )

    args = parser.parse_args()

    if isinstance(args.workers, float):
        args.workers = max(1, int(os.cpu_count() * args.workers))

    setup_cors(app, args.allowed_origins)

    import uvicorn
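
To check the examples listed in the `--workers` help text, the parsing path can be exercised in isolation. The sketch below copies `int_or_float` from the diff and wraps it in a hypothetical `parse_workers` function (not part of FastMLX) that takes the core count as a parameter, so the results do not depend on the machine running it. The fixed `default=2` is a simplification for this example; the PR derives the default from `FASTMLX_NUM_WORKERS`.

```python
import argparse
import os


def int_or_float(value):
    """Parse a CLI string as an int if possible, otherwise as a float."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value} is not an int or float")


def parse_workers(argv, cpu_count=None):
    """Hypothetical wrapper: parse --workers and turn a fraction into a count."""
    parser = argparse.ArgumentParser()
    # Simplified default for this sketch only.
    parser.add_argument("--workers", type=int_or_float, default=2)
    args = parser.parse_args(argv)

    # os.cpu_count() may return None, so fall back to a single core.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if isinstance(args.workers, float):
        return max(1, int(cores * args.workers))
    return args.workers


# With 8 cores assumed:
print(parse_workers(["--workers", "1"], cpu_count=8))    # 1
print(parse_workers(["--workers", "1.0"], cpu_count=8))  # 8
print(parse_workers(["--workers", "0.5"], cpu_count=8))  # 4
print(parse_workers(["--workers", "0.0"], cpu_count=8))  # 1
```

Note that `1` and `1.0` mean different things: the first is parsed as an int (one worker), the second as a float (all cores), which is why the int conversion is attempted first.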