feat: Address review comments, Update README.md
SiddhantSadangi committed Jul 12, 2024
1 parent f5e4c9a commit 8c3a7b5
Showing 2 changed files with 66 additions and 24 deletions.
52 changes: 31 additions & 21 deletions README.md
@@ -39,36 +39,46 @@
uvicorn fastmlx:app --reload --workers 0
```

### Running with Multiple Workers (Parallel Processing)
> [!WARNING]
> The `--reload` flag should not be used in production. It is only intended for development purposes.
For improved performance and parallel processing capabilities, you can specify the number of worker processes. This is particularly useful for handling multiple requests simultaneously.
### Running with Multiple Workers (Parallel Processing)

```bash
fastmlx --workers 4
```
or
For improved performance and parallel processing capabilities, you can specify either the absolute number of worker processes or the fraction of CPU cores to use. This is particularly useful for handling multiple requests simultaneously.

```bash
uvicorn fastmlx:app --workers 4
```
You can also set the `FASTMLX_NUM_WORKERS` environment variable to specify the number of workers or the fraction of CPU cores to use. `workers` defaults to 2 if not passed explicitly or set via the environment variable.

Replace `4` with the desired number of worker processes. The optimal number depends on your system's resources and the specific requirements of your application.
In order of precedence (highest to lowest), the number of workers is determined by the following:
- Explicitly passed as a command-line argument
- `--workers 4` will set the number of workers to 4
- `--workers 0.5` will set the number of workers to half the number of CPU cores available (minimum of 1)
- Set via the `FASTMLX_NUM_WORKERS` environment variable
- Default value of 2

You can also set the `FASTMLX_NUM_WORKERS` environment variable to specify the number of workers. `workers` defaults to 2 or the number of CPU cores minus 4, whichever is higher, if not passed explicitly or set via the environment variable.
To use all available CPU cores, set the value to **1.0**.
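
The precedence rules above can be sketched as a small helper. This is a minimal illustration rather than the code from this commit: `choose_workers_setting` and its parameters are hypothetical names, and the environment is passed in as a plain mapping so the example runs without touching the real process environment.

```python
def choose_workers_setting(cli_value, environ, default=2):
    """Pick the workers setting by precedence: CLI, then env var, then default.

    Hypothetical helper for illustration only. Returns an int (absolute
    count) or a float (fraction of CPU cores), the two forms the README
    describes.
    """
    # 1. An explicit command-line value always wins.
    if cli_value is not None:
        return cli_value

    # 2. Otherwise fall back to FASTMLX_NUM_WORKERS if it is set and non-empty.
    env_value = environ.get("FASTMLX_NUM_WORKERS")
    if env_value:
        try:
            return int(env_value)
        except ValueError:
            # Not an integer string, so treat it as a fraction.
            # Non-numeric text still raises ValueError here.
            return float(env_value)

    # 3. Finally, use the documented default.
    return default


print(choose_workers_setting(4, {"FASTMLX_NUM_WORKERS": "8"}))       # CLI wins: 4
print(choose_workers_setting(None, {"FASTMLX_NUM_WORKERS": "0.5"}))  # env fraction: 0.5
print(choose_workers_setting(None, {}))                              # default: 2
```

A float result still has to be turned into a worker count; the README states that a fraction is multiplied by the number of CPU cores, with a minimum of 1.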

#### Notes:
- The `--reload` flag is useful during development as it automatically reloads the server when code changes are detected. However, it should not be used in production.
- When using multiple workers, the `--reload` flag is not compatible and should be omitted.
- The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
Example:
```bash
fastmlx --workers 4
```
or

```bash
uvicorn fastmlx:app --workers 4
```

> [!NOTE]
> - `--reload` flag is not compatible with multiple workers
> - The number of workers should typically not exceed the number of CPU cores available on your machine for optimal performance.
### Considerations for Multi-Worker Setup

1. **Stateless Application**: Ensure your FastMLX application is stateless, as each worker process operates independently.
2. **Database Connections**: If your app uses a database, make sure your connection pooling is configured to handle multiple workers.
3. **Resource Usage**: Monitor your system's resource usage to find the optimal number of workers for your specific hardware and application needs. Additionally, you can remove any unused models using the delete model endpoint.
4. **Load Balancing**: When running with multiple workers, incoming requests are automatically load-balanced across the worker processes.

By leveraging multiple workers, you can significantly improve the throughput and responsiveness of your FastMLX application, especially under high load conditions.

3. **Making API Calls**

38 changes: 35 additions & 3 deletions fastmlx/fastmlx.py
@@ -88,6 +88,16 @@ class ChatCompletionResponse(BaseModel):
app = FastAPI()


# Custom type function
def int_or_float(value):
    for type_ in (int, float):
        try:
            return type_(value)
        except ValueError:
            continue
    raise argparse.ArgumentTypeError(f"{value} is not an int or float")
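
As a rough illustration of how a custom `type` callable like this behaves under `argparse`, the self-contained sketch below copies the function and wires it to a throwaway parser. The parser, its `prog` name, and its single `--workers` option are stand-ins for this example only, not FastMLX's real argument setup.

```python
import argparse


def int_or_float(value):
    """Return value as an int if possible, otherwise as a float."""
    for type_ in (int, float):
        try:
            return type_(value)
        except ValueError:
            continue
    raise argparse.ArgumentTypeError(f"{value} is not an int or float")


# Throwaway parser for the demo; not the real FastMLX parser.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--workers", type=int_or_float, default=2)

as_int = parser.parse_args(["--workers", "4"]).workers
as_float = parser.parse_args(["--workers", "1.0"]).workers
print(type(as_int).__name__, as_int)
print(type(as_float).__name__, as_float)

# A non-numeric value makes argparse report a usage error and exit.
try:
    parser.parse_args(["--workers", "many"])
except SystemExit as exc:
    print("argparse exited with code", exc.code)
```

Because `int` is tried first, `"1"` and `"1.0"` parse to different types, which is what lets the caller tell an absolute worker count from a fraction of the CPU cores.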


# Add CORS middleware
def setup_cors(app: FastAPI, allowed_origins: List[str]):
    app.add_middleware(
@@ -274,15 +284,37 @@ def run():
        default=False,
        help="Enable auto-reload of the server. Only works when 'workers' is set to None.",
    )

    _default_workers = 2
    if _num_workers_env := os.getenv("FASTMLX_NUM_WORKERS"):
        try:
            _default_workers = int(_num_workers_env)
        except ValueError:
            _default_workers = max(1, int(os.cpu_count() * float(_num_workers_env)))

    parser.add_argument(
        "--workers",
        type=int,
        default=os.getenv("FASTMLX_NUM_WORKERS", max(2, os.cpu_count() - 4)),
        type=int_or_float,
        default=_default_workers,
        help="""Number of workers. Overrides the `FASTMLX_NUM_WORKERS` env variable.
        Defaults to the `FASTMLX_NUM_WORKERS` env variable if set, or to 2 or the number of CPU cores available minus 4, whichever is higher.""",
        Can be either an int or a float.
        If an int, it will be the number of workers to use.
        If a float, number of workers will be this fraction of the number of CPU cores available, with a minimum of 1.
        Defaults to the `FASTMLX_NUM_WORKERS` env variable if set and to 2 if not.
        To use all available CPU cores, set it to 1.0.
        Examples:
        --workers 1 (will use 1 worker)
        --workers 1.0 (will use all available CPU cores)
        --workers 0.5 (will use half the number of CPU cores available)
        --workers 0.0 (will use 1 worker)""",
    )

    args = parser.parse_args()

    if isinstance(args.workers, float):
        args.workers = max(1, int(os.cpu_count() * args.workers))
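
The float-to-count conversion can be tried in isolation. The sketch below is a variant written under stated assumptions, not the commit's code: `workers_from_setting` is a hypothetical name, the core count is passed in so the results are reproducible, and it adds a fallback for `os.cpu_count()` returning `None`, which the standard library documents as possible.

```python
import os


def workers_from_setting(setting, cpu_count=None):
    """Convert a workers setting into a concrete worker count.

    Hypothetical helper: an int is used as given, while a float is treated
    as a fraction of the CPU cores, with a minimum of one worker.
    """
    if not isinstance(setting, float):
        return setting
    if cpu_count is None:
        # os.cpu_count() may return None; assume a single core in that case.
        cpu_count = os.cpu_count() or 1
    return max(1, int(cpu_count * setting))


# Pretend the machine has 8 cores so the numbers are predictable.
for setting in (1, 1.0, 0.5, 0.0):
    print(setting, "->", workers_from_setting(setting, cpu_count=8))
```

With 8 cores assumed, the four settings from the help text map to 1, 8, 4, and 1 workers respectively.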

    setup_cors(app, args.allowed_origins)

    import uvicorn