feat: Set workers through env variable, improved defaults (#15)
* feat: Set `workers` through env variable, improved defaults

* feat: Address review comments, Update README.md

* chore: Addressed review comments
SiddhantSadangi authored Jul 15, 2024
1 parent 24e1945 commit 61b495e
Showing 2 changed files with 75 additions and 21 deletions.
52 changes: 32 additions & 20 deletions README.md
@@ -39,34 +39,46 @@
uvicorn fastmlx:app --reload --workers 0
```

### Running with Multiple Workers (Parallel Processing)
> [!WARNING]
> The `--reload` flag should not be used in production. It is only intended for development purposes.
For improved performance and parallel processing capabilities, you can specify the number of worker processes. This is particularly useful for handling multiple requests simultaneously.
### Running with Multiple Workers (Parallel Processing)

```bash
fastmlx --workers 4
```
or
For improved performance and parallel processing capabilities, you can specify either the absolute number of worker processes or the fraction of CPU cores to use. This is particularly useful for handling multiple requests simultaneously.

```bash
uvicorn fastmlx:app --workers 4
```
You can also set the `FASTMLX_NUM_WORKERS` environment variable to specify the number of workers or the fraction of CPU cores to use. If `workers` is neither passed explicitly nor set via the environment variable, it defaults to 2.

Replace `4` with the desired number of worker processes. The optimal number depends on your system's resources and the specific requirements of your application.
In order of precedence (highest to lowest), the number of workers is determined by the following:
- Explicitly passed as a command-line argument
  - `--workers 4` will set the number of workers to 4
  - `--workers 0.5` will set the number of workers to half the number of CPU cores available (minimum of 1)
- Set via the `FASTMLX_NUM_WORKERS` environment variable
- Default value of 2

Notes:
- The `--reload` flag is useful during development as it automatically reloads the server when code changes are detected. However, it should not be used in production.
- When using multiple workers, the `--reload` flag is not compatible and should be omitted.
- The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
To use all available CPU cores, set the value to **1.0**.
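
The precedence rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the FastMLX implementation: `resolve_workers`, `parse_number`, and `to_worker_count` are hypothetical names, and the CPU count is passed in explicitly so the results are reproducible.

```python
from typing import Mapping, Optional, Union

Number = Union[int, float]


def to_worker_count(value: Number, cpu_count: int) -> int:
    """Ints are absolute counts; floats are a fraction of the CPU cores (minimum 1)."""
    if isinstance(value, float):
        return max(1, int(cpu_count * value))
    return value


def parse_number(text: str) -> Number:
    """Parse "4" as an int and "0.5" as a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def resolve_workers(
    cli_value: Optional[Number],
    env: Mapping[str, str],
    cpu_count: int,
    default: int = 2,
) -> int:
    """Hypothetical helper: CLI argument first, then FASTMLX_NUM_WORKERS, then the default."""
    if cli_value is not None:
        return to_worker_count(cli_value, cpu_count)
    env_value = env.get("FASTMLX_NUM_WORKERS")
    if env_value:
        return to_worker_count(parse_number(env_value), cpu_count)
    return default


if __name__ == "__main__":
    env = {"FASTMLX_NUM_WORKERS": "0.75"}
    print(resolve_workers(4, env, cpu_count=8))     # command-line value wins
    print(resolve_workers(None, env, cpu_count=8))  # falls back to the env variable
    print(resolve_workers(None, {}, cpu_count=8))   # falls back to the default
```

Passing the environment and core count as arguments is a design choice made for this sketch only; it keeps the function free of global state and easy to test.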

Example:
```bash
fastmlx --workers 4
```
or

```bash
uvicorn fastmlx:app --workers 4
```

> [!NOTE]
> - The `--reload` flag is not compatible with multiple workers.
> - The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
### Considerations for Multi-Worker Setup

1. **Stateless Application**: Ensure your FastMLX application is stateless, as each worker process operates independently.
2. **Database Connections**: If your app uses a database, make sure your connection pooling is configured to handle multiple workers.
3. **Resource Usage**: Monitor your system's resource usage to find the optimal number of workers for your specific hardware and application needs. Additionally, you can remove any unused models using the delete model endpoint.
4. **Load Balancing**: When running with multiple workers, incoming requests are automatically load-balanced across the worker processes.

By leveraging multiple workers, you can significantly improve the throughput and responsiveness of your FastMLX application, especially under high load conditions.

3. **Making API Calls**

44 changes: 43 additions & 1 deletion fastmlx/fastmlx.py
@@ -88,6 +88,27 @@ class ChatCompletionResponse(BaseModel):
app = FastAPI()


# Custom type function
def int_or_float(value):
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value} is not an int or float")
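
A quick way to see how a custom `type` callable such as `int_or_float` behaves is to run it through a throwaway parser. The sketch below re-declares the function so that it is self-contained; `parse_workers` and the sample arguments are hypothetical and exist only for illustration.

```python
import argparse


def int_or_float(value: str):
    """Return an int when possible, else a float, else signal a usage error."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value} is not an int or float")


def parse_workers(argv):
    """Hypothetical helper: parse only the --workers option from argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--workers", type=int_or_float, default=2)
    return parser.parse_args(argv).workers


if __name__ == "__main__":
    for argv in (["--workers", "4"], ["--workers", "0.5"], ["--workers", "1.0"], []):
        value = parse_workers(argv)
        print(argv, "->", type(value).__name__, value)
```

Because the type is preserved, `--workers 1` (an int) and `--workers 1.0` (a float) reach the rest of the program as different kinds of value, which is what lets the later code treat one as a count and the other as a fraction.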


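def calculate_default_workers(workers: int = 2) -> int:
    """Return the worker count from `FASTMLX_NUM_WORKERS`, or `workers` if it is unset."""
    if num_workers_env := os.getenv("FASTMLX_NUM_WORKERS"):
        try:
            # An integer value is an absolute worker count.
            workers = int(num_workers_env)
        except ValueError:
            # Otherwise treat it as a fraction of the CPU cores (minimum 1).
            # os.cpu_count() may return None, so fall back to 1 core.
            workers = max(1, int((os.cpu_count() or 1) * float(num_workers_env)))
    return workers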
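
One behavior worth knowing: if `FASTMLX_NUM_WORKERS` holds a non-numeric value such as `auto`, the `float(...)` call in the fallback branch raises `ValueError`. A more forgiving variant, sketched here under the assumption that silently falling back to the default is acceptable, could look like the following. `safe_default_workers` is a hypothetical name, and the environment is passed in as a mapping to keep the example testable.

```python
import math
import os
from typing import Mapping, Optional


def safe_default_workers(
    env: Optional[Mapping[str, str]] = None,
    default: int = 2,
) -> int:
    """Hypothetical variant that ignores unusable FASTMLX_NUM_WORKERS values."""
    env = os.environ if env is None else env
    raw = env.get("FASTMLX_NUM_WORKERS")
    if not raw:
        return default
    try:
        return int(raw)  # an integer is an absolute worker count
    except ValueError:
        pass
    try:
        fraction = float(raw)
    except ValueError:
        return default  # e.g. "auto": fall back instead of crashing
    if not math.isfinite(fraction):
        return default  # "nan" and "inf" parse as floats but cannot be converted
    # os.cpu_count() can return None, so assume a single core in that case.
    return max(1, int((os.cpu_count() or 1) * fraction))
```

Whether to fall back quietly or fail loudly is a judgment call; failing at startup, as the committed code does, makes a misconfigured environment easier to notice.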


# Add CORS middleware
def setup_cors(app: FastAPI, allowed_origins: List[str]):
    app.add_middleware(
@@ -274,9 +295,30 @@ def run():
        default=False,
        help="Enable auto-reload of the server. Only works when 'workers' is set to None.",
    )
    parser.add_argument("--workers", type=int, default=2, help="Number of workers")

    parser.add_argument(
        "--workers",
        type=int_or_float,
        # Call the function so the default is a number, not the function object.
        default=calculate_default_workers(),
        help="""Number of workers. Overrides the `FASTMLX_NUM_WORKERS` env variable.
        Can be either an int or a float.
        If an int, it will be the number of workers to use.
        If a float, number of workers will be this fraction of the number of CPU cores available, with a minimum of 1.
        Defaults to the `FASTMLX_NUM_WORKERS` env variable if set and to 2 if not.
        To use all available CPU cores, set it to 1.0.
        Examples:
        --workers 1 (will use 1 worker)
        --workers 1.0 (will use all available CPU cores)
        --workers 0.5 (will use half the number of CPU cores available)
        --workers 0.0 (will use 1 worker)""",
    )

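    args = parser.parse_args()

    if isinstance(args.workers, float):
        # A float is a fraction of the available CPU cores (minimum 1 worker).
        # os.cpu_count() may return None, so fall back to 1 core.
        args.workers = max(1, int((os.cpu_count() or 1) * args.workers))

    setup_cors(app, args.allowed_origins)

    import uvicorn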
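
The float-to-count conversion truncates toward zero and has no upper bound, which leads to a few results that are easy to overlook. The sketch below reproduces the expression with a fixed core count. `workers_from_fraction` and its `cores` parameter are hypothetical, introduced for reproducibility; the committed code calls `os.cpu_count()` instead.

```python
def workers_from_fraction(fraction: float, cores: int) -> int:
    """Same expression as in run(): scale, truncate, then clamp to at least 1."""
    return max(1, int(cores * fraction))


if __name__ == "__main__":
    cores = 8  # assumed core count for the demonstration
    for fraction in (0.0, 0.1, 0.5, 0.99, 1.0, 2.0):
        print(f"--workers {fraction} -> {workers_from_fraction(fraction, cores)}")
```

With 8 cores, `0.99` yields 7 workers rather than 8, and `2.0` yields 16, so `--workers 2` (two workers) and `--workers 2.0` (twice the core count) mean very different things.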