Skip to content

Latest commit

 

History

History
306 lines (245 loc) · 8.98 KB

api_endpoint.md

File metadata and controls

306 lines (245 loc) · 8.98 KB
myst
html_meta
title description keywords
AutoRAG - Deploy API endpoint
Learn how to deploy optimized RAG pipeline to Flask API server in AutoRAG
AutoRAG,RAG,RAG deploy,RAG API,Flask

API endpoint

Running API server

As mentioned in the tutorial, you can run api server as follows:

from autorag.deploy import ApiRunner
import nest_asyncio

nest_asyncio.apply()

runner = ApiRunner.from_yaml('your/path/to/pipeline.yaml', project_dir='your/project/directory')
runner.run_api_server()

or

from autorag.deploy import ApiRunner
import nest_asyncio

nest_asyncio.apply()

runner = ApiRunner.from_trial_folder('/your/path/to/trial_dir')
runner.run_api_server()
autorag run_api --trial_dir /trial/dir/0 --host 0.0.0.0 --port 8000

Use NGrok Tunnel for public access

For accessing the API server from the public, you can use the NGrok tunnel service. It automatically creates ngrok tunnel to your local server.

You can see the logs of the public URL like below:

INFO     [api.py:199] >> Public API URL:          api.py:199
         https://8a31-14-52-132-205.ngrok-free.app

This is the URL to your local server, so use it as the host at request.

Endpoints

1. /v1/run (POST)

  • Summary: Run a query and get generated text with retrieved passages.
  • Request Body:
    • Content Type: application/json
    • Schema:
      • Properties:
        • query (string, required): The query string.
        • result_column (string, optional): The result column name. Default is generated_texts.
  • Responses:
    • 200 OK:
      • Content Type: application/json
      • Schema:
        • Properties:
          • result (string or array of strings): The result text or list of texts.
          • retrieved_passage (array of objects): List of retrieved passages.
            • Properties:
              • content (string): The content of the passage.
              • doc_id (string): Document ID.
              • score (string): The relevance score of the retrieved passage.
              • filepath (string, nullable): File path.
              • file_page (integer, nullable): File page number.
              • start_idx (integer, nullable): Start index.
              • end_idx (integer, nullable): End index.

2. /v1/retrieve (POST)

This API endpoint allows developers to retrieve documents based on a specified query. It will ignore generator and prompt maker, only return retrieved passages.

The request must include a JSON object with the following structure:

{
  "query": "your query string here"
}

Parameters

  • query (string, required): The search string used to retrieve documents.

Example Request

{
  "query": "latest trends in AI"
}

Success Response

HTTP Status Code: 200 OK

Response Body

Properties

  • passages (array): An array of documents retrieved based on the query.
    • doc_id (string): The unique identifier for each document.
    • content (string): The content of the retrieved document.
    • score (number, float): The relevance score of the retrieved document.
    • filepath (string, optional): The file path of the document.
    • file_page (integer, optional): The page number of the document.
    • start_idx (integer, optional): The start index of the retrieved passage from the parsed data.
    • end_idx (integer, optional): The end index of the retrieved passage from the parsed data.

Example Response

{
  "passages": [
    {
      "doc_id": "doc123",
      "content": "Artificial Intelligence is transforming industries.",
      "score": 0.98,
      "filepath": "path/to/file",
      "file_page": 2,
      "start_idx": 100,
      "end_idx": 150
    },
    {
      "doc_id": "doc456",
      "content": "The future of AI includes advancements in machine learning.",
      "score": 0.92,
      "filepath": null,
      "file_page": null,
      "start_idx": null,
      "end_idx": null
    }
  ]
}

3. /v1/stream (POST)

  • Summary: Stream generated text with retrieved passages.
  • Description: This endpoint streams the generated text line by line. The retrieved_passage is sent first, followed by the result streamed incrementally.
  • Request Body:
    • Content Type: application/json
    • Schema:
      • Properties:
        • query (string, required): The query string.
        • result_column (string, optional): The result column name. Default is generated_texts.
  • Responses:
    • 200 OK:
      • Content Type: text/event-stream
      • Schema:
        • Properties:
          • type (generated_text or retrieved_passage): If it is generated_text, you can see only the generated text. If it is retrieved_passage, you can see the retrieved passage and passage_index.
          • generated_text (string): The generated text from the generator (LLM). The result of the RAG system.
          • retrieved_passage (object): Retrieved passage.
            • Properties:
              • content (string): The content of the passage.
              • doc_id (string): Document ID.
              • score (string): The relevance score of the retrieved passage.
              • filepath (string, nullable): File path.
              • file_page (integer, nullable): File page number.
              • start_idx (integer, nullable): Start index.
              • end_idx (integer, nullable): End index.
          • passage_index (integer): Index of the retrieved passage.

4. /version (GET)

  • Summary: Get the API version.
  • Description: Returns the current version of the API as a string.
  • Responses:
    • 200 OK:
      • Content Type: application/json
      • Schema:
        • Properties:
          • version (string): The version of the API.

API client usage example

Certainly! Below, I'll provide both Python sample code using the requests library and a curl command for each of the API endpoints described in the OpenAPI specification.

Python Sample Code

First, ensure you have the requests library installed. You can install it using pip if you haven't already:

pip install requests

Here's the Python client code for each endpoint:

import requests
from autorag.utils.util import decode_multiple_json_from_bytes

# Base URL of the API
BASE_URL = "http://example.com:8000"  # Replace with the actual base URL of the API

def run_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/run"
    payload = {
        "query": query,
        "result_column": result_column
    }
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

def stream_query(query, result_column="generated_texts"):
    url = f"{BASE_URL}/v1/stream"
    payload = {
        "query": query,
        "result_column": result_column
    }
    with requests.Session() as session:
      response = session.post(url, json=payload, stream=True)
      retrieved_passages = [] # This will store retrieved passages

      # Check if the request was successful
      if response.status_code == 200:
          # Process the streaming response
          for i, chunk in enumerate(response.iter_content(chunk_size=None)):
              if chunk:
                  data_list = decode_multiple_json_from_bytes(chunk)
                  for data in data_list:
                      if data["type"] == "retrieved_passage":
                          retrieved_passages.append(data["retrieved_passage"])
                      else:
                          print(data["generated_text"], end="") # Stream the generated texts
      else:
          print(f"Request failed with status code: {response.status_code}")
          print(f"Response content: {response.text}")

def get_version():
    url = f"{BASE_URL}/version"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        response.raise_for_status()

# Example usage
if __name__ == "__main__":
    # Run a query
    result = run_query("example query")
    print("Run Query Result:", result)

    # Stream a query
    print("Stream Query Result:")
    stream_query("example query")

    # Get API version
    version = get_version()
    print("API Version:", version)

curl Commands

Here are the equivalent curl commands for each endpoint:

/v1/run (POST)

curl -X POST "http://example.com/v1/run" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query", "result_column": "generated_texts"}'

/v1/stream (POST)

curl -X POST "http://example.com/v1/stream" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query", "result_column": "generated_texts"}' \
     --no-buffer

/v1/retrieve (POST)

curl -X POST "http://example.com/v1/retrieve" \
     -H "Content-Type: application/json" \
     -d '{"query": "example query"}'

/version (GET)

curl -X GET "http://example.com/version"