#190 enabled to generate a dynamic module for custom udf (#202)
* Added support to create dynamic modules
* Fixed handling of lowercase names for UDFs and database schemas in Lua

Co-authored-by: Mikhail Beck <[email protected]>
Co-authored-by: Torsten Kilias <[email protected]>
3 people authored Oct 25, 2024
1 parent e299f38 commit 489e238
Showing 17 changed files with 338 additions and 155 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/check-code-generation.yml
@@ -0,0 +1,35 @@
name: Check Code Generation

on:
  push:
    branches-ignore:
      - main

jobs:
  check_code_generation:
    name: Lua Amalgate and Example in User Guide
    strategy:
      fail-fast: false
      matrix:
        python-version: [ "3.10" ]
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Python & Poetry Environment
        uses: exasol/python-toolbox/.github/actions/[email protected]
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install Development Environment
        run: poetry run nox -s install_dev_env

      - name: Poetry install
        run: poetry run -- nox -s run_in_dev_env -- poetry install

      - name: Amalgate Lua Scripts
        run: poetry run nox -s amalgate_lua_scripts

      - name: Check if re-generated files differ from commit
        run: git diff --exit-code
45 changes: 0 additions & 45 deletions .github/workflows/check-packaging.yml

This file was deleted.

1 change: 1 addition & 0 deletions doc/changes/changes_0.1.0.md
@@ -52,6 +52,7 @@ Code name:
* #176: Updated usage of `exasol-bucketfs` to new API
* #185: Removed directory and script for building SLC AAF
* #191: Renamed UDF json element "parameters" to "parameter"
* #190: Added dynamic module generation and used it in the example UDF in the user guide
* #178: Fixed names of mock objects:
  * Renamed `testing.mock_query_handler_runner.MockQueryHandlerRunner` to `query_handler.python_query_handler_runner.PythonQueryHandlerRunner`
  * Renamed method `PythonQueryHandlerRunner.execute_query()` to `execute_queries()`
10 changes: 10 additions & 0 deletions doc/developer_guide/developer_guide.md
@@ -14,6 +14,16 @@ poetry run nox -s build_language_container

Installing the SLC is described in the [AAF User Guide](../user_guide/user_guide.md#script-language-container-slc).

## Update Generated Files

AAF contains the amalgamated Lua script [create_query_loop.sql](https://github.com/exasol/advanced-analytics-framework/blob/main/exasol_advanced_analytics_framework/resources/outputs/create_query_loop.sql), which is generated from the files in the directory [exasol_advanced_analytics_framework/lua/src](https://github.com/exasol/advanced-analytics-framework/blob/main/exasol_advanced_analytics_framework/lua/src/).

The following command updates the amalgamated script:

```shell
poetry run nox -s amalgate_lua_scripts
```
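The implementation of the `amalgate_lua_scripts` nox session is not part of this commit view. To illustrate what amalgamating Lua sources generally means, here is a minimal Python sketch of one common technique: each module's source is wrapped in a `package.preload` entry, so that `require("<name>")` in the main chunk resolves to the embedded code instead of a file on disk. The function names `amalgamate_lua` and `amalgamate_directory` are hypothetical and are not the AAF's tooling.

```python
from pathlib import Path


def amalgamate_lua(modules: dict[str, str], main_source: str) -> str:
    """Combine Lua module sources and a main chunk into a single script.

    Each module body becomes a function stored in package.preload[name],
    so a later require(name) in the main chunk runs the embedded code.
    Modules are emitted in sorted order to keep the output deterministic,
    which matters when CI compares regenerated files with the commit.
    """
    parts = []
    for name, source in sorted(modules.items()):
        parts.append(f'package.preload["{name}"] = function(...)\n{source}\nend\n')
    parts.append(main_source)
    return "\n".join(parts)


def amalgamate_directory(src_dir: Path, main_file: str) -> str:
    """Hypothetical helper: treat every top-level *.lua file except
    main_file as a module named after its file stem."""
    modules = {
        path.stem: path.read_text()
        for path in src_dir.glob("*.lua")
        if path.name != main_file
    }
    return amalgamate_lua(modules, (src_dir / main_file).read_text())
```

Sorting the modules keeps the generated script byte-for-byte stable across runs, which is what a `git diff --exit-code` check on regenerated files relies on.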

## Running Tests

AAF comes with different automated tests implemented in different programming languages and requiring different environments:
88 changes: 88 additions & 0 deletions doc/user_guide/example-udf-script/create.sql
@@ -0,0 +1,88 @@
--/
CREATE OR REPLACE PYTHON3_AAF SET SCRIPT "EXAMPLE_SCHEMA"."EXAMPLE_QUERY_HANDLER_UDF"(...)
EMITS (outputs VARCHAR(2000000)) AS

from typing import Union
from exasol_advanced_analytics_framework.udf_framework.udf_query_handler import UDFQueryHandler
from exasol_advanced_analytics_framework.udf_framework.dynamic_modules import create_module
from exasol_advanced_analytics_framework.query_handler.context.query_handler_context import QueryHandlerContext
from exasol_advanced_analytics_framework.query_result.query_result import QueryResult
from exasol_advanced_analytics_framework.query_handler.result import Result, Continue, Finish
from exasol_advanced_analytics_framework.query_handler.query.select_query import SelectQuery, SelectQueryWithColumnDefinition
from exasol_advanced_analytics_framework.query_handler.context.proxy.bucketfs_location_proxy import \
    BucketFSLocationProxy
from exasol_data_science_utils_python.schema.column import Column
from exasol_data_science_utils_python.schema.column_name import ColumnName
from exasol_data_science_utils_python.schema.column_type import ColumnType
from datetime import datetime
from exasol.bucketfs import as_string


# Dynamic module; the EXECUTE SCRIPT parameter refers to the factory
# class via this module name ("example_module").
example_module = create_module("example_module")

class ExampleQueryHandler(UDFQueryHandler):

    def __init__(self, parameter: str, query_handler_context: QueryHandlerContext):
        super().__init__(parameter, query_handler_context)
        self.parameter = parameter
        self.query_handler_context = query_handler_context
        self.bfs_proxy = None
        self.db_table_proxy = None

    def _bfs_file(self, proxy: BucketFSLocationProxy):
        return proxy.bucketfs_location() / "temp_file.txt"

    def start(self) -> Union[Continue, Finish[str]]:
        def sample_content(key: str) -> str:
            timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            return f"{timestamp} {key} {self.parameter}"

        def table_query_string(statement: str, **kwargs):
            # Fill in the fully qualified name of the temporary table.
            table_name = self.db_table_proxy._db_object_name.fully_qualified
            return statement.format(table_name=table_name, **kwargs)

        def table_query(statement: str, **kwargs):
            return SelectQuery(table_query_string(statement, **kwargs))

        # Write a sample file to a temporary BucketFS location.
        self.bfs_proxy = self.query_handler_context.get_temporary_bucketfs_location()
        self._bfs_file(self.bfs_proxy).write(sample_content("bucketfs"))
        # Create and fill a temporary table.
        self.db_table_proxy = self.query_handler_context.get_temporary_table_name()
        query_list = [
            table_query('CREATE TABLE {table_name} ("c1" VARCHAR(100), "c2" INTEGER)'),
            table_query("INSERT INTO {table_name} VALUES ('{value}', 4)",
                        value=sample_content("table-insert")),
        ]
        # Query whose result is passed to handle_query_result().
        query_handler_return_query = SelectQueryWithColumnDefinition(
            query_string=table_query_string('SELECT "c1", "c2" from {table_name}'),
            output_columns=[
                Column(ColumnName("c1"), ColumnType("VARCHAR(100)")),
                Column(ColumnName("c2"), ColumnType("INTEGER")),
            ])
        return Continue(
            query_list=query_list,
            input_query=query_handler_return_query)

    def handle_query_result(self, query_result: QueryResult) -> Union[Continue, Finish[str]]:
        c1 = query_result.c1
        c2 = query_result.c2
        bfs_content = as_string(self._bfs_file(self.bfs_proxy).read())
        return Finish(result=f"Final result: from query '{c1}', {c2} and bucketfs: '{bfs_content}'")


# Register the query handler class in the dynamic module.
example_module.add_to_module(ExampleQueryHandler)

class ExampleQueryHandlerFactory:
    def create(self, parameter: str, query_handler_context: QueryHandlerContext):
        return example_module.ExampleQueryHandler(parameter, query_handler_context)

# Register the factory, which is referenced by name in the EXECUTE SCRIPT parameter.
example_module.add_to_module(ExampleQueryHandlerFactory)

from exasol_advanced_analytics_framework.udf_framework.query_handler_runner_udf \
    import QueryHandlerRunnerUDF

udf = QueryHandlerRunnerUDF(exa)

def run(ctx):
    return udf.run(ctx)

/
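Classes defined in a UDF script body do not live in an importable module, yet the EXECUTE SCRIPT call refers to the factory by module name and class name. The AAF's `create_module` and `add_to_module` implementations are not shown in this commit; the following is a minimal sketch, using only the standard library (`types.ModuleType` and `sys.modules`), of how such a mechanism can work. The names `create_dynamic_module` and `add_to_dynamic_module` are hypothetical and are not the AAF API.

```python
import importlib
import sys
import types


def create_dynamic_module(name: str) -> types.ModuleType:
    """Create an empty module and register it in sys.modules under `name`.

    After registration, importlib.import_module(name) returns this module
    without searching for a file on disk.
    """
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


def add_to_dynamic_module(module: types.ModuleType, obj: type) -> None:
    """Attach a class to the module under its own name and point the
    class's __module__ at the dynamic module."""
    obj.__module__ = module.__name__
    setattr(module, obj.__name__, obj)


if __name__ == "__main__":
    # Define a class as in a UDF script body, register it, and look it
    # up again by module name and class name.
    demo_module = create_dynamic_module("demo_dynamic_module")

    class DemoFactory:
        def create(self, parameter: str) -> str:
            return f"handler for {parameter}"

    add_to_dynamic_module(demo_module, DemoFactory)
    factory_class = getattr(importlib.import_module("demo_dynamic_module"), "DemoFactory")
    print(factory_class().create("bla-bla"))  # prints: handler for bla-bla
```

Updating `__module__` keeps the module a class reports consistent with the module through which it can be looked up.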
20 changes: 20 additions & 0 deletions doc/user_guide/example-udf-script/execute.sql
@@ -0,0 +1,20 @@
EXECUTE SCRIPT "AAF_DB_SCHEMA"."AAF_RUN_QUERY_HANDLER"('{
    "query_handler": {
        "factory_class": {
            "module": "example_module",
            "name": "ExampleQueryHandlerFactory"
        },
        "parameter": "bla-bla",
        "udf": {
            "schema": "EXAMPLE_SCHEMA",
            "name": "EXAMPLE_QUERY_HANDLER_UDF"
        }
    },
    "temporary_output": {
        "bucketfs_location": {
            "connection_name": "EXAMPLE_BFS_CON",
            "directory": "temp"
        },
        "schema_name": "EXAMPLE_TEMP_SCHEMA"
    }
}')
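The `factory_class` element names a module and a class. As a hedged illustration of how a UDF could turn this into a query handler, here is a sketch using `importlib`. It assumes the factory has a no-argument constructor and a `create(parameter, context)` method, as `ExampleQueryHandlerFactory` does; the function `create_query_handler_from_json` is hypothetical and is not the AAF's actual code.

```python
import importlib
import json
import sys
import types
from typing import Any


def create_query_handler_from_json(json_text: str, query_handler_context: Any) -> Any:
    """Hypothetical resolver: look up the factory named in the JSON
    parameter and let it create the query handler.

    Assumes the factory class has a no-argument constructor and a
    create(parameter, context) method, like ExampleQueryHandlerFactory.
    """
    config = json.loads(json_text)["query_handler"]
    factory_spec = config["factory_class"]
    # Works for dynamic modules too, because import_module checks sys.modules first.
    module = importlib.import_module(factory_spec["module"])
    factory_class = getattr(module, factory_spec["name"])
    return factory_class().create(config["parameter"], query_handler_context)


if __name__ == "__main__":
    # Stand-in for the dynamic module created in the UDF script.
    demo = types.ModuleType("example_module")

    class ExampleQueryHandlerFactory:
        def create(self, parameter, context):
            return f"query handler with parameter {parameter!r}"

    demo.ExampleQueryHandlerFactory = ExampleQueryHandlerFactory
    sys.modules["example_module"] = demo

    parameter_json = """{"query_handler": {
        "factory_class": {"module": "example_module", "name": "ExampleQueryHandlerFactory"},
        "parameter": "bla-bla"}}"""
    print(create_query_handler_from_json(parameter_json, query_handler_context=None))
```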
13 changes: 13 additions & 0 deletions doc/user_guide/proxies.md
@@ -0,0 +1,13 @@
## AAF Proxies

The Advanced Analytics Framework (AAF) uses _Object Proxies_ to manage temporary objects.

An _Object Proxy_
* Encapsulates a temporary object
* Provides a reference for using the object, i.e. its name including the database schema, or its path in the BucketFS
* Ensures the object is removed when leaving the current scope, e.g. the Query Handler.

All Object Proxies are derived from class `exasol_advanced_analytics_framework.query_handler.context.proxy.object_proxy.ObjectProxy`:
* `BucketFSLocationProxy` encapsulates a location in the BucketFS
* `DBObjectNameProxy` encapsulates a database object, e.g. a table
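The AAF proxy classes themselves are not shown here. The following is a minimal, hypothetical sketch of the pattern described above: a scope collects proxies and removes their temporary objects when it is left, even if an exception occurs. To stay self-contained, it records `DROP TABLE` statements instead of executing them. All class names ending in `Sketch` are invented for illustration.

```python
from abc import ABC, abstractmethod


class ObjectProxySketch(ABC):
    """Hypothetical stand-in for an object proxy: wraps a temporary
    object and removes it at most once."""

    def __init__(self) -> None:
        self._released = False

    @abstractmethod
    def _cleanup(self) -> None:
        """Remove the underlying temporary object."""

    def release(self) -> None:
        if not self._released:
            self._cleanup()
            self._released = True


class TempTableProxySketch(ObjectProxySketch):
    """Proxy for a temporary table; the name includes the schema."""

    def __init__(self, schema: str, table: str, statements: list[str]) -> None:
        super().__init__()
        # Assumption: names contain no double quotes, so no escaping is done.
        self.fully_qualified = f'"{schema}"."{table}"'
        self._statements = statements  # collects SQL instead of executing it

    def _cleanup(self) -> None:
        self._statements.append(f"DROP TABLE IF EXISTS {self.fully_qualified}")


class ScopeSketch:
    """Context manager that releases all proxies registered in it on exit."""

    def __init__(self) -> None:
        self._proxies: list[ObjectProxySketch] = []

    def register(self, proxy: ObjectProxySketch) -> ObjectProxySketch:
        self._proxies.append(proxy)
        return proxy

    def __enter__(self) -> "ScopeSketch":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Release in reverse creation order; returning False lets any
        # exception from the scope propagate after cleanup.
        for proxy in reversed(self._proxies):
            proxy.release()
        return False
```

Tying cleanup to a context manager means temporary tables or BucketFS files do not outlive the scope that created them, which mirrors the behavior the proxies are described as guaranteeing.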
