Releases: Dataherald/dataherald
v1.0.3
Release Notes for Version 1.0.3
What's New
1. New features
- Added Redshift support c8e55a2
- Added multi-schema support for db connections; it currently works only with Postgres, BigQuery, Snowflake, and Databricks d4d6f4e
2. Improvements and fixes
- Fixed `uri` validation for db connections f40ac0e
- Fixed the fallback and confidence score 30f5226
- Fixed the observation code blocks a11d1f1
- Fixed refresh endpoint error handling 15b6d46
- Fixed malformed SQL queries in intermediate steps 828c64d
- The sql-generation endpoint now raises an error when it receives invalid SQL fbd96ea
v1.0.2
Release Notes for Version 1.0.2
What's New
1. New features
- Adds Astra vector store support 6f39892
- Adds MS SQL Server support 078c17d
- Adds Streaming endpoint to show intermediate steps 1205d8a
- Adds support for Pinecone serverless 7906f03
- Adds intermediate steps in the SQL Generation response 3dbd483
- Adds a LangSmith metadata param (`langsmith_metadata`) to make filtering easier cf88a1b
- Stores the db dialect when a db connection is created 809ac31
2. Improvements and fixes
- Adds logs when a request fails 09f65c6
- Adds descriptions to the new agent faf07de
- Fixes malformed LLM output 4190b4d
- Documents error codes e94c788
- Fixes the running query forever issue cfb1d5b
- Fixes the error parsing handler 8751410
- Adds ClickHouse HyperLogLog support to improve scanning 61a92c9
- Fixes SQL generation 5160e8d
- Fixes the background scanner process running in parallel 88ee8fa
- Fixes error handling for golden SQL additions 8efb00f
3. Migration Script
- Purpose: To facilitate a smooth transition from version 1.0.1 to version 1.0.2, we've introduced a migration script.
- Data Modifications: The script performs the following actions:
  - Decrypts the `uri` column of every db connection.
  - Executes a regex method to retrieve the db dialect.
  - Stores the `dialect` column in the `database_connections` Mongo collection.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.populate_dialect_db_connection
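For reference, the dialect lookup can be sketched roughly as follows. This is a hypothetical illustration only: the decryption helper, field names, and database name are assumptions, not the actual implementation in dataherald.scripts.populate_dialect_db_connection.

```python
# Hypothetical sketch of how a dialect could be derived from a connection URI
# and stored back on each db connection document. Field names, the database
# name, and the decrypt() stub are assumptions for illustration.
import re
from pymongo import MongoClient

def decrypt(value: str) -> str:
    # Placeholder for the application's real decryption helper.
    return value

def extract_dialect(uri: str) -> str:
    # e.g. "postgresql+psycopg2://user:pass@host/db" -> "postgresql"
    match = re.match(r"^([A-Za-z0-9]+)", uri)
    return match.group(1) if match else "unknown"

client = MongoClient("mongodb://localhost:27017")
connections = client["dataherald"]["database_connections"]

for connection in connections.find({}):
    uri = decrypt(connection["connection_uri"])
    connections.update_one(
        {"_id": connection["_id"]},
        {"$set": {"dialect": extract_dialect(uri)}},
    )
```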
v1.0.1
Release Notes for Version 1.0.1
What's New
1. New features
- Added ClickHouse support d494fed
- MariaDB/MySQL support officially added and documented. 7b86ad3
- Added a refresh endpoint (`POST /api/v1/table-descriptions/refresh`) to fetch the table names from a specified database and store them in the `table-description` Mongo collection. This improves response time when querying the table-description list endpoint (`GET /api/v1/table-descriptions`). 28b8130 (See the example after this list.)
- Implemented error codes for better error handling. Errors now respond with a 400 HTTP status code. 2c70f16
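As a quick illustration, the refresh endpoint can be called like this. This is a minimal sketch: the base URL, request body shape, and placeholder id are assumptions.

```python
# Hypothetical call to the refresh endpoint; the base URL and the shape of
# the request body are assumptions for illustration.
import requests

BASE_URL = "http://localhost"
DB_CONNECTION_ID = "<db_connection_id>"  # placeholder

# Refresh the stored table descriptions for a database connection.
refresh = requests.post(
    f"{BASE_URL}/api/v1/table-descriptions/refresh",
    json={"db_connection_id": DB_CONNECTION_ID},
)
refresh.raise_for_status()

# Listing table descriptions is now faster because the names are pre-stored.
tables = requests.get(
    f"{BASE_URL}/api/v1/table-descriptions",
    params={"db_connection_id": DB_CONNECTION_ID},
).json()
print(len(tables), "table descriptions")
```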
2. Changes and fixes
- Reduced SSH fields in requests by utilizing the `connection_uri` field. 64ceb6e
- Updated LLM with the latest models. dd440f2
- Expanded functionality to allow SSH connections on different ports. 1a5a2be
- Improved performance for the scanning endpoint (`POST /api/v1/table-descriptions/sync-schemas`). 435884e
3. Migration Script
- You don't need to update the data if you're already using the stable 1.0.0 version; you can simply pull these changes.
v1.0.0
Release Notes for Version 1.0.0
What's New
1. New Resources, Attributes, and Endpoints
- Finetuning: One of our exciting new features is automatically finetuning GPT-family models on your golden question/SQL pairs. (See the example after this list.)
  - `POST /api/v1/finetuning`: Creates a finetuning job on your golden question/SQL pairs. The only required parameter is the `db_connection_id`, and you can optionally specify which golden question/SQL pairs to use for the finetuning process.
  - `GET /api/v1/finetuning/{finetuning_id}`: Retrieves the status of the finetuning process; once the status is SUCCEEDED you can use the model for SQL generation.
  - `POST /api/v1/finetuning/{finetuning_id}/cancel`: Cancels the finetuning job if you no longer need it.
  - `GET /api/v1/finetuning`: Lists all of the finetuned models for a given `db_connection_id`.
  - `DELETE /api/v1/finetuning/{finetuning_id}`: Deletes a given finetuned model from the finetunings collection.
- Metadata: All resources now include a `metadata` attribute, allowing you to store additional information for internal purposes. Soon, GET list endpoints will support filtering based on metadata fields.
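To illustrate the finetuning flow described above, here is a minimal sketch. Only db_connection_id is documented as required; the base URL, the response fields, and the optional body fields are assumptions.

```python
# Hypothetical finetuning workflow; everything except the endpoint paths and
# the db_connection_id parameter is an illustrative assumption.
import requests

BASE_URL = "http://localhost"

# 1. Create a finetuning job on the golden question/SQL pairs.
job = requests.post(
    f"{BASE_URL}/api/v1/finetuning",
    json={"db_connection_id": "<db_connection_id>"},
).json()

# 2. Check the status; once it is SUCCEEDED the model can be used for
#    SQL generation.
status = requests.get(f"{BASE_URL}/api/v1/finetuning/{job['id']}").json()
print(status)

# 3. Cancel the job if it is no longer needed.
requests.post(f"{BASE_URL}/api/v1/finetuning/{job['id']}/cancel")

# List all finetuned models for a db connection.
models = requests.get(
    f"{BASE_URL}/api/v1/finetuning",
    params={"db_connection_id": "<db_connection_id>"},
).json()
```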
2. Resource and Endpoint Changes
- Renaming `questions` to `prompts`: The entity has been renamed to `Prompt`, and the collection is now called `prompts`. You can use the following endpoints to interact with this resource:
  - `GET /api/v1/prompts`: List all existing prompts.
  - `POST /api/v1/prompts`: Create a new prompt.
  - `GET /api/v1/prompts/{prompt_id}`: Retrieve a specific prompt.
  - `PUT /api/v1/prompts/{prompt_id}`: Update the metadata for a prompt.
- Splitting `responses` into `sql_generations` and `nl_generations`: The previous `responses` resource has been divided into `sql_generations` and `nl_generations`. You can work with them as follows:
  - `POST /api/v1/prompts/{prompt_id}/sql-generations`: Create a sql-generation from an existing prompt.
  - `POST /api/v1/prompts/sql-generations`: Create a new prompt and a sql-generation.
  - `GET /api/v1/prompts/sql-generations`: List sql-generations.
  - `GET /api/v1/sql-generations/{sql_generation_id}`: Retrieve a specific sql-generation.
  - `PUT /api/v1/sql-generations/{sql_generation_id}`: Update the metadata for a sql-generation.
  - `GET /api/v1/sql-generations/{sql_generation_id}/execute`: Execute the created SQL and retrieve the result.
  - `GET /api/v1/sql-generations/{sql_generation_id}/csv-file`: Execute the created SQL and generate a CSV file using the result.
  - `POST /api/v1/sql-generations/{sql_generation_id}/nl-generations`: Create an nl-generation from an existing sql-generation.
  - `POST /api/v1/prompts/{prompt_id}/sql-generations/nl-generations`: Create a sql-generation and an nl-generation from an existing prompt.
  - `POST /api/v1/prompts/sql-generations/nl-generations`: Create a prompt, sql-generation, and nl-generation.
  - `GET /api/v1/nl-generations`: List all nl-generations.
  - `GET /api/v1/nl-generations/{nl_generation_id}`: Retrieve a specific nl-generation.
  - `PUT /api/v1/nl-generations/{nl_generation_id}`: Update the metadata for an nl-generation.
- Renaming `golden_records` to `golden_sqls`: We've updated the name for all endpoints, entities, and collections.
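Putting the renamed resources together, a typical end-to-end flow looks roughly like this. The endpoint paths come from the list above; the base URL and the request/response fields are assumptions for illustration.

```python
# Hypothetical prompt -> sql-generation -> nl-generation flow; request and
# response field names are illustrative assumptions.
import requests

BASE_URL = "http://localhost"

# Create a prompt.
prompt = requests.post(
    f"{BASE_URL}/api/v1/prompts",
    json={
        "text": "What was the average rent price last year?",
        "db_connection_id": "<db_connection_id>",
        "metadata": {"source": "release-notes-example"},
    },
).json()

# Create a sql-generation from the prompt.
sql_generation = requests.post(
    f"{BASE_URL}/api/v1/prompts/{prompt['id']}/sql-generations",
    json={"metadata": {}},
).json()

# Execute the generated SQL and retrieve the result.
rows = requests.get(
    f"{BASE_URL}/api/v1/sql-generations/{sql_generation['id']}/execute"
).json()

# Create an nl-generation from the sql-generation.
nl_generation = requests.post(
    f"{BASE_URL}/api/v1/sql-generations/{sql_generation['id']}/nl-generations",
    json={"metadata": {}},
).json()
print(nl_generation)
```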
3. Migration Script
- Purpose: To facilitate a smooth transition from version 0.0.5 to version 1.0.0, we've introduced a migration script.
- Data Modifications: The script performs the following actions:
  - Renames the `golden_records` collection to `golden_sqls`.
  - Converts all related data types from `ObjectId` to strings.
  - Updates table descriptions by changing the "SYNCHRONIZED" status to "SCANNED" and "NOT_SYNCHRONIZED" to "NOT_SCANNED".
  - Utilizes the existing `questions` collection to create the `prompts` collection.
  - Converts the `responses` collection into the `sql_generations` and `nl_generations` collections.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v006_to_v100
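Two of the simpler steps of this migration, the collection rename and the status update, can be sketched like this. This is only an illustration with pymongo and an assumed database name; the real script in dataherald.scripts.migrate_v006_to_v100 does considerably more (rebuilding the prompts, sql_generations, and nl_generations collections and converting ObjectIds).

```python
# Simplified sketch of two steps of the v1.0.0 migration; the database name
# is an assumption and the real script performs additional work.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["dataherald"]

# Rename the golden_records collection to golden_sqls.
if "golden_records" in db.list_collection_names():
    db["golden_records"].rename("golden_sqls")

# Rename table description statuses.
db["table_descriptions"].update_many(
    {"status": "SYNCHRONIZED"}, {"$set": {"status": "SCANNED"}}
)
db["table_descriptions"].update_many(
    {"status": "NOT_SYNCHRONIZED"}, {"$set": {"status": "NOT_SCANNED"}}
)
```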
We hope that these changes enhance your experience with our platform. If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.
v0.0.6
What's Changed
1. Changes in the POST /api/v1/responses endpoint:
If the `sql_query` body parameter is not set, the response is regenerated. This process generates new values for `sql_query`, `sql_result`, and `response`.
2. Introducing the generate_csv flag:
The `generate_csv` flag is a parameter that allows the generation of a CSV file populated with the `sql_query_result` rows. This parameter can be set in both the POST /api/v1/responses and POST /api/v1/questions endpoints.
- If the file is created, the response will include the field `csv_file_path`. For example: "csv_file_path": "s3://k2-core/c6ddccfc-f355-4477-a2e7-e43f77e31bbb.csv"
- Additionally, if the `generate_csv` flag is set to `True`, the `sql_query_result` will return `NULL` when it contains more than 50 rows.
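As an illustration, the flag can be passed along with a question like this. This is a hypothetical request: the placement of generate_csv in the body and the other field names are assumptions.

```python
# Hypothetical question request with CSV generation enabled; field names
# other than generate_csv are illustrative assumptions.
import requests

response = requests.post(
    "http://localhost/api/v1/questions",
    json={
        "question": "How many customers signed up last month?",
        "db_connection_id": "<db_connection_id>",
        "generate_csv": True,
    },
).json()

# With more than 50 rows, sql_query_result comes back NULL and the data is
# available at csv_file_path instead.
print(response.get("csv_file_path"))
```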
3. Configure S3 Credentials:
- You have the flexibility to set your S3 credentials to store the CSV files within the POST /api/v1/database-connections endpoint as follows:
"file_storage": {
"name": "string",
"access_key_id": "string",
"secret_access_key": "string",
"region": "string",
"bucket": "string"
}
- If S3 credentials are not specified within the `db_connection`, the system will use the S3 credentials from your environment variables, as set in your `.env` file.
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.5
What's Changed
1. Endpoint Update
- Affected Endpoints: The changes impact two API endpoints:
  - POST /api/v1/database-connections: This endpoint is used to create a database connection.
  - PUT /api/v1/database-connections/{db_connection_id}: This endpoint is used to update a database connection.
- Change Description: The `llm_credentials` object in these endpoints has been replaced with the `llm_api_key` field, which only accepts strings as its value. In other words, the `llm_credentials` field has been removed and replaced with a simpler `llm_api_key` field that can only hold string values. This change provides a more straightforward approach to managing API keys and credentials within the system.
2. Migration Script
- Purpose: A migration script has been introduced to assist users in smoothly transitioning their data from version 0.0.4 to version 0.0.5.
- Data Modification: This script operates on the database_connections collection and performs the following action: it replaces the llm_credentials field with the llm_api_key field, but only if the llm_credentials field is populated. In other words, if there is data in the llm_credentials field, the script transfers it to the new llm_api_key field.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v004_to_v005
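Conceptually, the field swap the script performs looks something like the following pymongo sketch. The database name and the assumed shape of the old llm_credentials object are illustrative; the real logic lives in dataherald.scripts.migrate_v004_to_v005.

```python
# Minimal sketch of the llm_credentials -> llm_api_key swap; the shape of
# the old llm_credentials object is an assumption for illustration.
from pymongo import MongoClient

connections = MongoClient("mongodb://localhost:27017")["dataherald"]["database_connections"]

# Only documents where llm_credentials is populated are touched.
for doc in connections.find({"llm_credentials": {"$ne": None}}):
    credentials = doc["llm_credentials"]
    api_key = credentials.get("api_key") if isinstance(credentials, dict) else credentials
    connections.update_one(
        {"_id": doc["_id"]},
        {"$set": {"llm_api_key": api_key}, "$unset": {"llm_credentials": ""}},
    )
```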
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.4
What's Changed f57fde5
1. Endpoint Renaming
We have streamlined our API endpoints for better consistency and clarity:
Renamed Endpoints:
- POST /api/v1/nl-query-responses is now POST /api/v1/responses.
- POST /api/v1/question is now POST /api/v1/questions.
2. Endpoint Removal
In this version, we have removed the following endpoint:
- PATCH /api/v1/nl-query-responses/{query_id}.
Note: Responses resources are now immutable, so you can only create new responses and not update existing ones.
3. MongoDB Collection and Field Renaming
To improve consistency and readability, we have renamed MongoDB collection and field names:
Collection Name Changes:
- nl_questions collection has been renamed to questions.
- nl_query_responses collection has been renamed to responses.
Field Name Changes (within the responses collection):
- nl_question_id has been renamed to question_id.
- nl_response has been renamed to response.
4. Use of ObjectId for Foreign Keys
To enhance data integrity and relationships, we have transitioned to using ObjectId types for foreign keys, providing stronger data typing.
5. Migration Script
We've created a migration script to help you smoothly transition your data from version 0.0.3 to version 0.0.4. This script updates collection names, field names, and foreign keys data type to ObjectId. To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v003_to_v004
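The kind of transformation this script performs can be sketched as follows. This is a simplified pymongo illustration; the exact fields and collections the real dataherald.scripts.migrate_v003_to_v004 script touches may differ.

```python
# Simplified sketch of the v0.0.4 migration: rename collections, rename
# fields, and convert string foreign keys to ObjectId. The database name and
# the exact fields converted are assumptions for illustration.
from bson import ObjectId
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["dataherald"]

# Collection renames.
if "nl_questions" in db.list_collection_names():
    db["nl_questions"].rename("questions")
if "nl_query_responses" in db.list_collection_names():
    db["nl_query_responses"].rename("responses")

# Field renames within the responses collection.
db["responses"].update_many(
    {}, {"$rename": {"nl_question_id": "question_id", "nl_response": "response"}}
)

# Convert string foreign keys to ObjectId (question_id shown as an example).
for doc in db["responses"].find({"question_id": {"$type": "string"}}):
    db["responses"].update_one(
        {"_id": doc["_id"]},
        {"$set": {"question_id": ObjectId(doc["question_id"])}},
    )
```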
Upgrade Instructions:
To upgrade to Version 0.0.4, follow these steps:
- Ensure you have Docker Compose installed.
- Pull the latest version of the application.
- Run the provided migration script as shown above.
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.3
What's Changed
1. Validate Database Connection Requests 5937b35
- When a database connection is created or updated, it now attempts to establish a connection.
- If the connection is successfully established, it is stored, and a 200 response is returned.
- In case of failure, a 400 error response is generated.
2. Add LLM Credentials to Database Connection Endpoints 2d9e873
- With the latest update, when creating or updating a database connection, you have the option to set LLM credentials. This allows you to use different keys for different connections.
3. SSH Connection Update a66f7d8
- We have discontinued the use of the `private_key_path` field for SSH connections.
- Instead, we now utilize the `path_to_credentials_file` to specify the path to the SSH private key file.
4. Enhanced Table Scanning with Background Tasks fdc3bb7
- We have implemented background tasks for asynchronous table scanning.
- The endpoint name has been updated from `/api/v1/table-descriptions/scan` to `/api/v1/table-descriptions/sync-schemas`.
- This enhancement ensures that even if the process operates slowly, potentially taking several minutes, the HTTP response remains consistently fast and responsive.
5. Returns Scanned Tables and Not Scanned Tables 9e2d119
- The `/api/v1/table-descriptions` endpoint makes a db connection to retrieve all the table names and checks which tables have been scanned in order to generate a response.
- The status can be:
  - NOT_SYNCHRONIZED if the table has not been scanned
  - SYNCHRONIZING while the sync-schemas process is running
  - DEPRECATED if there is a row in our table-descriptions collection that is no longer in the database, probably because the table/view was deleted or renamed
  - SYNCHRONIZED when we have scanned the table
  - FAILED if anything failed during the sync-schemas process; the error_message field stores the error.
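A rough illustration of the flow: trigger a schema sync, then poll the table descriptions and inspect their statuses. The request body, query parameters, and response fields shown are assumptions.

```python
# Hypothetical schema-sync flow; request and response field names are
# illustrative assumptions.
import requests

BASE_URL = "http://localhost"
DB_CONNECTION_ID = "<db_connection_id>"  # placeholder

# Trigger the background scan; this returns quickly even for large schemas.
requests.post(
    f"{BASE_URL}/api/v1/table-descriptions/sync-schemas",
    json={"db_connection_id": DB_CONNECTION_ID},
)

# Poll the table descriptions and check each status.
for table in requests.get(
    f"{BASE_URL}/api/v1/table-descriptions",
    params={"db_connection_id": DB_CONNECTION_ID},
).json():
    # status is one of NOT_SYNCHRONIZED, SYNCHRONIZING, SYNCHRONIZED,
    # DEPRECATED, or FAILED (see the list above).
    print(table.get("table_name"), table.get("status"), table.get("error_message"))
```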
6. Migration Script from v0.0.2 to v0.0.3 9e2d119
- This script facilitates the transition from version v0.0.2 to v0.0.3 by performing the following essential task:
In the table_descriptions collection, it updates the status field to the value SYNCHRONIZED.
To execute the script, simply run the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v002_to_v003
v0.0.2
What's Changed
1. RESTful Endpoint Names and Swagger Grouping
We have made significant changes to our endpoint naming conventions, following RESTful principles. Additionally, we have organized the endpoints into logical sections within our Swagger documentation for easier navigation and understanding.
2. MongoDB Collection Name Changes
We have updated the names of several MongoDB collections. Here are the collection name changes:
- nl_query_response ➡️ nl_query_responses
- nl_question ➡️ nl_questions
- database_connection ➡️ database_connections
- table_schema_detail ➡️ table_descriptions
3. Migration to db_connection_id for MongoDB Collections
Previously, we used a db_alias field to relate MongoDB collections. In this release, we have transitioned to using a new field called db_connection_id to establish relationships between collections.
4. Renamed Core Methods for Code Clarity
To improve the clarity of our codebase, we have renamed several core methods.
5. Migration Script from v0.0.1 to v0.0.2
We understand the importance of a smooth transition between versions. This script performs the following actions:
- Adds the db_connection_id relation for all MongoDB collections.
- Renames all MongoDB collection names to align with the new naming conventions.
- Deletes the Vector store data (Pinecone or Chroma) and utilizes the golden_records collection to upload the data seamlessly.
To execute the script, just run the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v001_to_v002