You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
Description
There is a design limitation in the SQL database tools workflow that leads to incorrect table selection and query generation. As documented in the SQL QA Tutorial, the current workflow forces agents to make decisions with insufficient information:
Agent calls sql_db_list_tables which returns only raw table names
Agent must choose which table to query based solely on these names
Only after selecting a table can it use sql_db_schema to get detailed information
Without table descriptions, the agent often selects an incorrect table. For example, when a query mentions "playlists", the agent might choose the Playlist table even though the correct data resides in the junction table PlaylistTrack. This problematic sequence typically follows these steps:
Calls sql_db_list_tables and receives: "Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"
Sees the question about "playlists" and chooses the Playlist table
Only after querying the schema does it realise that the Playlist table only contains playlist names and metadata
It then has to start over using the PlaylistTrack table instead
Motivation
Expected Behaviour
The agent should receive basic table descriptions during the initial listing phase to make an informed decision about which table to query. For example, the desired output from sql_db_list_tables should be:
This forces the agent to make uninformed choices before accessing detailed table information via sql_db_schema.
Proposal (If applicable)
Proposed Solution
Enhance SQLDatabase to accept table summaries separately from detailed schema information. This solution adds validation, improved type hints, and a fallback to extract summaries from detailed schema info. It separates brief summaries from detailed schema information, maintains backward compatibility, and reduces unnecessary API calls while improving accuracy.
fromtypingimportDict, Optionalfromsqlalchemy.engineimportEngineclassSQLDatabase:
def__init__(
self,
engine: Engine,
table_summaries: Optional[Dict[str, str]] =None, # New parameter for brief descriptionscustom_table_info: Optional[Dict[str, str]] =None, # Detailed schema information**kwargs
):
""" Initialise SQLDatabase with both brief summaries and detailed info. :param engine: SQLAlchemy engine instance :param table_summaries: Brief descriptions for table listing :param custom_table_info: Detailed schema information :raises ValueError: If table_summaries contains invalid table names """self._engine=engineself._table_summaries=table_summariesor {}
self._custom_table_info=custom_table_infoor {}
# ... rest of initialisation ...# Assume self._all_tables is set during initialisation with all valid table names.ifhasattr(self, '_all_tables'):
invalid_tables=set(self._table_summaries.keys()) -self._all_tablesifinvalid_tables:
raiseValueError(
f"table_summaries contains invalid tables: {invalid_tables}"
)
def_extract_summary_from_custom_info(self, table: str) ->Optional[str]:
""" Extract the first line summary from custom table info. :param table: Name of the table :return: First line of custom info or None if not found """iftablenotinself._custom_table_info:
returnNoneinfo=self._custom_table_info[table]
first_line=info.split('\n')[0].strip()
returnfirst_lineiffirst_lineelseNonedefget_usable_table_names_with_summaries(self) ->str:
""" Get names of usable tables with their brief descriptions. Attempts to retrieve descriptions in the following order: 1. From table_summaries if provided. 2. From the first line of custom_table_info if available. 3. Falls back to just the table name if no description is found. :return: Newline-separated list of "table: description" pairs. """tables=self.get_usable_table_names()
descriptions= []
fortableintables:
summary=self._table_summaries.get(table)
ifnotsummary:
summary=self._extract_summary_from_custom_info(table)
ifsummary:
descriptions.append(f"{table}: {summary}")
else:
descriptions.append(table)
return"\n".join(descriptions)
# Usage Exampledb=SQLDatabase.from_uri(
"sqlite:///Chinook.db",
include_tables=[
'Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice',
'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track'
],
table_summaries={
"Playlist": "Stores playlist names and metadata only",
"PlaylistTrack": "Junction table mapping songs to playlists",
"Track": "Individual songs and their details"
},
custom_table_info={
"Playlist": """Table: Playlist RecordsPurpose: Stores playlist metadata including names and creation datesColumns: - PlaylistId (PRIMARY KEY) - Name - CreatedDate""",
# ... other detailed schema information can be added here
}
)
# Modification to ListSQLDatabaseToolclassListSQLDatabaseTool(BaseSQLDatabaseTool, BaseTool):
name: str="sql_db_list_tables"description: str="Input is an empty string, output is a list of tables with brief descriptions."def_run(
self,
tool_input: str="",
run_manager: Optional[CallbackManagerForToolRun] =None,
) ->str:
""" Get a list of tables with their summaries. :param tool_input: Ignored, maintained for compatibility. :param run_manager: Optional callback manager. :return: Newline-separated list of tables with descriptions. """returnself.db.get_usable_table_names_with_summaries()
# Example Code from Original Scenariofromlangchain_community.utilitiesimportSQLDatabasefromlangchain_community.agent_toolkitsimportSQLDatabaseToolkitfromlangchain.chat_modelsimportChatOpenAIfromlanggraph.prebuiltimportcreate_react_agent# Initialise database with Chinook example from the tutorialdb=SQLDatabase.from_uri(
"sqlite:///Chinook.db",
include_tables=[
'Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice',
'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track'
]
)
# Initialise agent componentsllm=ChatOpenAI(temperature=0)
toolkit=SQLDatabaseToolkit(db=db, llm=llm)
tools=toolkit.get_tools()
# Create agentagent=create_react_agent(llm, tools, system_message)
# Example question where table selection is criticalquestion="How many playlists contain the song 'Yellow Submarine'?"# This question requires PlaylistTrack (junction table mapping songs to playlists)# However, the agent often incorrectly chooses Playlist first because:# - Playlist sounds more relevant (it's in the question)# - Without table descriptions, it doesn't know that:# * Playlist only contains playlist names and metadata# * PlaylistTrack is the junction table needed to answer this questionresult=agent.invoke({
"messages": [{"role": "user", "content": question}]
})
# The agent typically follows this problematic sequence:# 1. Calls sql_db_list_tables# Returns: "Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"# 2. Sees question about "playlists" and chooses Playlist table# 3. Only after querying schema realises Playlist table only has playlist names# 4. Has to start over using PlaylistTrack table instead
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Checked
Feature request
Description
There is a design limitation in the SQL database tools workflow that leads to incorrect table selection and query generation. As documented in the SQL QA Tutorial, the current workflow forces agents to make decisions with insufficient information:
sql_db_list_tables
which returns only raw table namessql_db_schema
to get detailed informationWithout table descriptions, the agent often selects an incorrect table. For example, when a query mentions "playlists", the agent might choose the
Playlist
table even though the correct data resides in the junction tablePlaylistTrack
. This problematic sequence typically follows these steps:sql_db_list_tables
and receives:"Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"
Playlist
tablePlaylist
table only contains playlist names and metadataPlaylistTrack
table insteadMotivation
Expected Behaviour
The agent should receive basic table descriptions during the initial listing phase to make an informed decision about which table to query. For example, the desired output from
sql_db_list_tables
should be:Actual Behaviour
sql_db_list_tables
returns only raw table names without context:"Album,Artist,Customer,Employee,Genre,Invoice,InvoiceLine,MediaType,Playlist,PlaylistTrack,Track"
This forces the agent to make uninformed choices before accessing detailed table information via
sql_db_schema
.Proposal (If applicable)
Proposed Solution
Enhance
SQLDatabase
to accept table summaries separately from detailed schema information. This solution adds validation, improved type hints, and a fallback to extract summaries from detailed schema info. It separates brief summaries from detailed schema information, maintains backward compatibility, and reduces unnecessary API calls while improving accuracy.Beta Was this translation helpful? Give feedback.
All reactions