`ListSQLDatabaseTool`: Improve Agent Table Selection via Metadata Descriptions When Calling `sql_db_list_tables` #29662

Culpable · 2025-02-07T14:04:57Z

Culpable
Feb 7, 2025

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

Description

There is a design limitation in the SQL database tools workflow that leads to incorrect table selection and query generation. As documented in the SQL QA Tutorial, the current workflow forces agents to make decisions with insufficient information:

Agent calls sql_db_list_tables which returns only raw table names
Agent must choose which table to query based solely on these names
Only after selecting a table can it use sql_db_schema to get detailed information

Without table descriptions, the agent often selects an incorrect table. For example, when a query mentions "playlists", the agent might choose the Playlist table even though the correct data resides in the junction table PlaylistTrack. This problematic sequence typically follows these steps:

Calls sql_db_list_tables and receives:
"Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"
Sees the question about "playlists" and chooses the Playlist table
Only after querying the schema does it realise that the Playlist table only contains playlist names and metadata
It then has to start over using the PlaylistTrack table instead

Motivation

Expected Behaviour
The agent should receive basic table descriptions during the initial listing phase to make an informed decision about which table to query. For example, the desired output from sql_db_list_tables should be:

Playlist: Contains playlist names and metadata only
PlaylistTrack: Junction table mapping songs to the playlists they appear in
Track: Individual songs and their details

Actual Behaviour
sql_db_list_tables returns only raw table names without context:

"Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"

This forces the agent to make uninformed choices before accessing detailed table information via sql_db_schema.

Proposal (If applicable)

Proposed Solution
Enhance SQLDatabase to accept table summaries separately from detailed schema information. This solution adds validation, improved type hints, and a fallback to extract summaries from detailed schema info. It separates brief summaries from detailed schema information, maintains backward compatibility, and reduces unnecessary API calls while improving accuracy.

from typing import Dict, Optional
from sqlalchemy.engine import Engine

class SQLDatabase:
    def __init__(
        self,
        engine: Engine,
        table_summaries: Optional[Dict[str, str]] = None,  # New parameter for brief descriptions
        custom_table_info: Optional[Dict[str, str]] = None,  # Detailed schema information
        **kwargs
    ):
        """
        Initialise SQLDatabase with both brief summaries and detailed info.
        
        :param engine: SQLAlchemy engine instance
        :param table_summaries: Brief descriptions for table listing
        :param custom_table_info: Detailed schema information
        :raises ValueError: If table_summaries contains invalid table names
        """
        self._engine = engine
        self._table_summaries = table_summaries or {}
        self._custom_table_info = custom_table_info or {}
        
        # ... rest of initialisation ...
        # Assume self._all_tables is set during initialisation with all valid table names.
        if hasattr(self, '_all_tables'):
            invalid_tables = set(self._table_summaries.keys()) - self._all_tables
            if invalid_tables:
                raise ValueError(
                    f"table_summaries contains invalid tables: {invalid_tables}"
                )

    def _extract_summary_from_custom_info(self, table: str) -> Optional[str]:
        """
        Extract the first line summary from custom table info.
        
        :param table: Name of the table
        :return: First line of custom info or None if not found
        """
        if table not in self._custom_table_info:
            return None
        info = self._custom_table_info[table]
        first_line = info.split('\n')[0].strip()
        return first_line if first_line else None

    def get_usable_table_names_with_summaries(self) -> str:
        """
        Get names of usable tables with their brief descriptions.
        
        Attempts to retrieve descriptions in the following order:
          1. From table_summaries if provided.
          2. From the first line of custom_table_info if available.
          3. Falls back to just the table name if no description is found.
        
        :return: Newline-separated list of "table: description" pairs.
        """
        tables = self.get_usable_table_names()
        descriptions = []
        for table in tables:
            summary = self._table_summaries.get(table)
            if not summary:
                summary = self._extract_summary_from_custom_info(table)
            if summary:
                descriptions.append(f"{table}: {summary}")
            else:
                descriptions.append(table)
        return "\n".join(descriptions)

# Usage Example

db = SQLDatabase.from_uri(
    "sqlite:///Chinook.db",
    include_tables=[
        'Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 
        'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track'
    ],
    table_summaries={
        "Playlist": "Stores playlist names and metadata only",
        "PlaylistTrack": "Junction table mapping songs to playlists",
        "Track": "Individual songs and their details"
    },
    custom_table_info={
        "Playlist": """
Table: Playlist Records
Purpose: Stores playlist metadata including names and creation dates
Columns:
    - PlaylistId (PRIMARY KEY)
    - Name
    - CreatedDate
""",
        # ... other detailed schema information can be added here
    }
)

# Modification to ListSQLDatabaseTool

class ListSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
    name: str = "sql_db_list_tables"
    description: str = "Input is an empty string, output is a list of tables with brief descriptions."
    
    def _run(
        self,
        tool_input: str = "",
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """
        Get a list of tables with their summaries.
        
        :param tool_input: Ignored, maintained for compatibility.
        :param run_manager: Optional callback manager.
        :return: Newline-separated list of tables with descriptions.
        """
        return self.db.get_usable_table_names_with_summaries()

# Example Code from Original Scenario

from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain.chat_models import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Initialise database with Chinook example from the tutorial
db = SQLDatabase.from_uri(
    "sqlite:///Chinook.db",
    include_tables=[
        'Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 
        'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track'
    ]
)

# Initialise agent components
llm = ChatOpenAI(temperature=0)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

# Create agent
agent = create_react_agent(llm, tools, system_message)

# Example question where table selection is critical
question = "How many playlists contain the song 'Yellow Submarine'?"

# This question requires PlaylistTrack (junction table mapping songs to playlists)
# However, the agent often incorrectly chooses Playlist first because:
# - Playlist sounds more relevant (it's in the question)
# - Without table descriptions, it doesn't know that:
#   * Playlist only contains playlist names and metadata
#   * PlaylistTrack is the junction table needed to answer this question

result = agent.invoke({
    "messages": [{"role": "user", "content": question}]
})

# The agent typically follows this problematic sequence:
# 1. Calls sql_db_list_tables
#    Returns: "Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"
# 2. Sees question about "playlists" and chooses Playlist table
# 3. Only after querying schema realises Playlist table only has playlist names
# 4. Has to start over using PlaylistTrack table instead

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ListSQLDatabaseTool`: Improve Agent Table Selection via Metadata Descriptions When Calling `sql_db_list_tables` #29662

{{title}}

Replies: 0 comments

Select a reply

ListSQLDatabaseTool: Improve Agent Table Selection via Metadata Descriptions When Calling sql_db_list_tables #29662

Culpable Feb 7, 2025

Checked

Feature request

Description

Motivation

Proposal (If applicable)

Replies: 0 comments

`ListSQLDatabaseTool`: Improve Agent Table Selection via Metadata Descriptions When Calling `sql_db_list_tables` #29662

Culpable
Feb 7, 2025