FastAPI and Ollama Integration Demo

This project demonstrates how to integrate FastAPI with Ollama, a tool for running and managing AI models. It showcases three main functionalities:

Streaming Responses: Receive and display raw streaming responses from the Ollama API.
Formatted Responses: Aggregate and format streaming responses into a cohesive output.
Complete JSON Responses: Handle and display complete JSON responses from the Ollama API.
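
To make the difference between raw streaming and formatted output concrete, here is a minimal sketch of how streamed chunks might be aggregated. It assumes the stream is newline-delimited JSON in which each object carries a text fragment in a "response" field and a "done" flag, which is the shape Ollama's generate endpoint emits. The function name aggregate_stream is hypothetical and is not taken from app.py.

```python
import json
from typing import Iterable


def aggregate_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments of a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the generation
    return "".join(parts)


# Simulated stream; a real one would come from an HTTP response iterator.
sample = [
    b'{"response": "Hello", "done": false}',
    b'{"response": ", world", "done": false}',
    b'{"response": "", "done": true}',
]
print(aggregate_stream(sample))
```

Keeping the aggregation in a pure function like this makes it easy to check without a running model.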

Dependencies

Required Software

Python: Ensure you have Python 3.7 or later installed on your system. You can download Python from the official Python website.
FastAPI: A modern, fast (high-performance) web framework for building APIs with Python.
Requests: A simple HTTP library for Python.
Ollama: A tool for running AI models locally.

Installation

FastAPI and Requests: You can install FastAPI and Requests using pip: pip install fastapi requests
Ollama: Follow the instructions on the Ollama GitHub repository to install Ollama. Ollama itself does not bundle models, so the llama3.1 model must be downloaded separately after installation.

The model can be downloaded through Ollama’s CLI: ollama pull llama3.1
Note that pip install ollama installs only the Ollama Python client library, not the Ollama server that this project talks to over HTTP.

Files

app.py: Defines a FastAPI application with endpoints for generating raw and formatted responses from the Ollama API.
send_request.py: A command-line script to send requests to the FastAPI server and print responses. It supports both raw and formatted responses.
demo_script.py: Demonstrates how to use the send_request function to retrieve streaming, formatted, and complete JSON responses.

Usage

Clone the Repository:

git clone https://github.com/darcyg32/fastapi-ollama-demo.git
cd fastapi-ollama-demo

Set Up a Virtual Environment:

python -m venv venv
source venv/bin/activate

Install Dependencies:

pip install -r requirements.txt

Running the FastAPI Server

Start the FastAPI server:

uvicorn app:app --reload

The server will be available at http://localhost:8000.

Sending Requests

Using the Command-Line Script: You can use send_request.py to interact with the FastAPI server:

python send_request.py <model> <prompt> [stream] [formatted]

- <model>: The name of the model to use (e.g., llama3.1).
- <prompt>: The prompt to send to the model.
- [stream]: Optional True/False value that enables streaming (default is False).
- [formatted]: Optional True/False value that requests a formatted response (default is False).

Example:

python send_request.py llama3.1 "Write a haiku." True True

Using the Demo Script: Run demo_script.py to see examples of streaming, formatted, and complete JSON responses:

python demo_script.py
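
The positional arguments that send_request.py accepts could be parsed along the following lines. This is only a sketch under stated assumptions: the helper names parse_bool and parse_args are hypothetical, and the actual script may handle its arguments differently.

```python
from typing import List, Tuple


def parse_bool(value: str) -> bool:
    """Interpret strings such as "True", "true", or "1" as True."""
    return value.strip().lower() in {"true", "1", "yes"}


def parse_args(argv: List[str]) -> Tuple[str, str, bool, bool]:
    """Parse <model> <prompt> [stream] [formatted] from an argument list."""
    if len(argv) < 2:
        raise SystemExit(
            "usage: send_request.py <model> <prompt> [stream] [formatted]"
        )
    model, prompt = argv[0], argv[1]
    # Both optional values default to False when omitted.
    stream = parse_bool(argv[2]) if len(argv) > 2 else False
    formatted = parse_bool(argv[3]) if len(argv) > 3 else False
    return model, prompt, stream, formatted


# Hypothetical invocation mirroring the documented example.
print(parse_args(["llama3.1", "Write a haiku.", "True", "True"]))
```

An explicit string comparison is used because bool("False") evaluates to True in Python, so a plain bool() conversion would silently enable both options.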

Using cURL:

Get Raw Streaming Response Example:

curl -X POST "http://localhost:8000/generate" -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku.",
  "stream": true
}'

Get Formatted Response Example:

curl -X POST "http://localhost:8000/generate_formatted" -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku.",
  "stream": false
}'

Get Complete JSON Response Example:

curl -X POST "http://localhost:8000/generate" -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku.",
  "stream": false
}'
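
For readers who prefer Python over cURL, the same requests can be constructed programmatically. The sketch below uses the standard library's urllib rather than the Requests library the project depends on, purely so that it runs without extra packages. The helper name build_request is hypothetical; the endpoint paths and JSON fields are the ones shown in the cURL examples.

```python
import json
import urllib.request


def build_request(
    endpoint: str, model: str, prompt: str, stream: bool
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL calls."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": stream}
    ).encode("utf-8")
    return urllib.request.Request(
        f"http://localhost:8000/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("generate", "llama3.1", "Write a haiku.", stream=False)
print(req.get_method(), req.full_url)

# Sending requires the FastAPI server and Ollama to be running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Passing "generate_formatted" as the endpoint would target the formatted route instead.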

Additional Notes

Ensure that Ollama is properly configured and running locally on http://localhost:11434. Update the URL in app.py if your Ollama instance is hosted elsewhere.
The FastAPI server and Ollama must be running simultaneously to process requests successfully.
For more details on FastAPI and Requests, refer to their respective documentation:
- FastAPI Documentation
- Requests Documentation
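
Instead of editing the URL in app.py by hand, one option is to read it from the environment. The following is a minimal sketch, not the project's actual configuration: the environment variable name OLLAMA_URL and the helper ollama_endpoint are assumptions made for illustration, while /api/generate is Ollama's generate endpoint.

```python
import os

# Default matches the local address mentioned in the notes above.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_endpoint(path: str = "/api/generate") -> str:
    """Return the full Ollama URL, honoring a hypothetical OLLAMA_URL override."""
    base = os.environ.get("OLLAMA_URL", DEFAULT_OLLAMA_URL)
    # Strip a trailing slash so the joined URL never contains "//api".
    return base.rstrip("/") + path


print(ollama_endpoint())
```

With this approach, pointing the server at a remote Ollama instance would only require setting the variable before starting uvicorn.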

System Specifications

For reference, this project was developed and tested on the following hardware:

Processor: AMD Ryzen 5 5600X 6-Core
GPU: NVIDIA GeForce RTX 3060 Ti
RAM: 32 GB
Operating System: Ubuntu/WSL on Windows 11
Storage: 2 TB SSD

These specifications were sufficient for running the FastAPI server and Ollama integration demo. If you encounter any performance issues or have different specifications, you may need to adjust your setup accordingly.

License

This project is licensed under the MIT License - see the LICENSE file for details.
