#992: added design doc for script options #477

Merged on Jan 20, 2025 (26 commits)
Changes from 5 commits
Commits
d2c2f6e
#991: Added Design Doc and Requirements Doc for Script Options parser
tomuben Oct 24, 2024
9385004
Updated design document
tomuben Oct 24, 2024
6c4c049
Updated design document
tomuben Nov 14, 2024
b9a7904
Updated design document
tomuben Nov 15, 2024
2b1e472
Updated design document
tomuben Nov 15, 2024
9671883
Updated design document & requirements document
tomuben Nov 15, 2024
93dca60
Added a GH workflow to run
tomuben Nov 15, 2024
9e35421
Fixed oft.yaml
tomuben Nov 15, 2024
3911606
Fixed oft.yaml
tomuben Nov 15, 2024
d0fee61
Fixed oft.yaml
tomuben Nov 15, 2024
a2fcff1
Fixed oft.yaml
tomuben Nov 15, 2024
7910608
Fixed oft.yaml
tomuben Nov 15, 2024
24caddc
Fixed oft.yaml
tomuben Nov 15, 2024
14425fa
Removed `Needs: req` in script_options_design.md
tomuben Nov 15, 2024
6800239
Fixed findings from OFT
tomuben Nov 15, 2024
1f08f8d
Fixed findings from OFT
tomuben Nov 15, 2024
207e27f
Fixed findings from review
tomuben Nov 22, 2024
af0d8ee
Merge branch 'master' into doc/992_add_design_doc_for_script_options
tomuben Dec 4, 2024
e80f66f
Merge remote-tracking branch 'origin/master' into doc/992_add_design_…
tomuben Jan 8, 2025
10072ee
Apply suggestions from code review
tomuben Jan 8, 2025
72dab60
Renamed design decision `Java %scriptclass Option Handling in Design`…
tomuben Jan 8, 2025
73184e1
Merge remote-tracking branch 'origin/doc/992_add_design_doc_for_scrip…
tomuben Jan 8, 2025
c1061af
Findings from review
tomuben Jan 8, 2025
edfc01d
Fixes from review
tomuben Jan 20, 2025
3976379
Merge remote-tracking branch 'origin/master' into doc/992_add_design_…
tomuben Jan 20, 2025
0eabe59
Update exaudfclient/docs/script_options_design.md
tomuben Jan 20, 2025
57 changes: 37 additions & 20 deletions exaudfclient/docs/script_options_design.md
@@ -9,7 +9,6 @@ This document's section structure is derived from the "[arc42](https://arc42.org
## Constraints

- The parser implementation must be in C++.
- The chosen parser implementation is [ctpg](https://github.com/peter-winter/ctpg), which supports the definition of Lexer and Parser Rules in C++ code.
- The selected parser should allow easy encapsulation in a custom C++ namespace (UDF client linker namespace constraint)
- The selected parser should not depend on additional runtime dependencies
- The selected parser should have minimal compile time dependencies, i.e. no additional shared libraries or tools to generate C++ code
@@ -26,15 +25,15 @@ Please refer to the [System Requirement Specification](script_options_requirment

![Components](diagrams/OveralScriptOptionalsBuildingBlocks.drawio.png)

At the very high level there can be distinguished between the generic "Script Options Parser" module which parses a UDF script code and returns the found script options, and the "Script Options Parser Handler" which converts the Java UDF specific script options. In both modules there are specific implementation for the legacy parser and the new CTPG based parser.
At the very high level there can be distinguished between the generic "Script Options Parser" module which parses a UDF script code and returns the found script options, and the "Script Options Parser Handler" which converts the Java UDF specific script options. In both modules there are specific implementation for the legacy parser and the new CTPG based parser. We need to keep the legacy implementation alive, as the new approach causes some breaking changes, especially related to the new escape patterns: Existing UDF might not be working with the new parser implementation.

### Script Options Parser

The parser component can be used to parse any script code (Java, Python, R) for any script options. It provides simplistic interfaces, which are different between the two versions, which accept the script code as input and return the found script option(s).
The parser component can be used to parse any script code (Java, Python, R) for any script options. It provides simplistic interfaces, which are different between the two versions, which accept the script code as input and return the found script option(s). The interfaces need to be different because both parsers work inherently differently: While the legacy parser successively finds and removes script options by the given key, the new parser finds **all** script options at once, but does not remove them.

#### Legacy Parser

The legacy parser (V1) parser searches for one specific script option.
The legacy parser (V1) parser searches for one specific script option, removes the whole option from the script code, and returns the script option value.
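
To make the "find, remove, return" behaviour concrete, here is a minimal sketch. The function name `extractOptionLegacy` is hypothetical, and the sketch assumes that an option has the form `%key value;` and ends at the first `;`. It leaves out escaping and validation and only inspects the first occurrence of the key, so it is not the actual legacy implementation.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the legacy (V1) behaviour:
// find the first "%<key> <value>;", cut it out of `code`, return the value.
// Assumption: the option ends at the first ';' (no escape handling).
std::optional<std::string> extractOptionLegacy(std::string& code, const std::string& key) {
    const std::string token = "%" + key;
    const std::size_t start = code.find(token);
    if (start == std::string::npos) return std::nullopt;

    std::size_t valueBegin = start + token.size();
    // The key must be followed by at least one whitespace character.
    if (valueBegin >= code.size() ||
        !std::isspace(static_cast<unsigned char>(code[valueBegin]))) {
        return std::nullopt;
    }
    const std::size_t end = code.find(';', valueBegin);
    if (end == std::string::npos) return std::nullopt;

    // Skip the separating whitespace in front of the value.
    while (valueBegin < end &&
           std::isspace(static_cast<unsigned char>(code[valueBegin]))) {
        ++valueBegin;
    }
    std::string value = code.substr(valueBegin, end - valueBegin);
    code.erase(start, end - start + 1);  // remove the whole option, including ';'
    return value;
}
```

Calling the function repeatedly with the same key until it returns `std::nullopt` mirrors the successive find-and-remove usage described for the legacy parser.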

#### V2 Parser

@@ -43,7 +42,7 @@ It is important to use a parser generator implementation which allows the defini

As the parser needs to find script options in any given script code, the generated parser must accept any strings which are not script options and ignore those. In order to achieve this, the lexer rules need to be as simple as possible, in order to avoid collisions.

It is important to emphasize that in contrast to the legacy parser, the caller is responsible for removing the script options from the script code.
It is important to emphasize that in contrast to the legacy parser, the caller is responsible for removing the script options from the script code, as the parser is agnostic to the actual script options keys.
The interface provides a method which accepts the script code as input and returns a map with all found script options in the whole code. Each key in the map points to a list of all option values plus the start and end position of the option for this specific option key.
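
The shape of such a result map could look roughly like the following sketch. All names (`ScriptOption`, `OptionsMap`, `valuesFor`) are assumptions made for illustration; the types used in the UDF client may differ.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical result types; not taken from the actual implementation.
struct ScriptOption {
    std::string value;  // the option value
    std::size_t start;  // index of the first character of the option in the script
    std::size_t end;    // index one past the last character of the option
};

// One entry per option key; each key maps to every occurrence found in the script.
using OptionsMap = std::map<std::string, std::vector<ScriptOption>>;

// Caller-side helper: collect all values stored for one key.
std::vector<std::string> valuesFor(const OptionsMap& options, const std::string& key) {
    std::vector<std::string> result;
    const auto it = options.find(key);
    if (it == options.end()) return result;
    for (const ScriptOption& option : it->second) {
        result.push_back(option.value);
    }
    return result;
}
```

Because the positions travel with each value, the caller has what it needs to remove the options from the script code afterwards.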

### Parser Handler
@@ -70,7 +69,7 @@ The following sequence diagram shows how the Java VM implementation uses the Par

![LegacyParserHandler](diagrams/LegacyParserHandler.drawio.png)

The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
The `ScriptOptionsLinesParserLegacy` class uses the Parser to search for Java specific script options and forwards the found options to the class `ConverterLegacy`, which uses a common implementation for the conversion of the options.
Class `tLegacyExtractor` connects `ScriptOptionsLinesParserLegacy` to `ConverterLegacy` and then orchestrates the parsing sequence.

`ScriptOptionsLinesParserLegacy` also implements the import of foreign scripts. The import script algorithm iteratively replaces foreign scripts. The algorithm is described in the following pseudocode snippet:
@@ -90,18 +89,18 @@ while True:

![CTPGParserHandler](diagrams/CTPGParserHandler.drawio.png)

The `ScriptOptionsLinesParserCTPG` class uses the new CTPG basedParser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
The `ScriptOptionsLinesParserCTPG` class uses the new CTPG based Parser to search for **all** Java specific script options at once. Then it forwards the found options to class `ConverterV2`, which uses a common implementation for the conversion of the options. `ConverterV2` also implements the functions to convert Jvm otions and JAR options.
Class `tExtractorV2` connects `ScriptOptionsLinesParserCTPG` to `ConverterV2` and then orchestrates the parsing sequence.

##### CTPG based Script Import Algorithm
`ScriptOptionsLinesParserCTPG` uses an instance of `ScriptImporter` to import foreign scripts. Because the new parser collects all script options at once, but backwards compatibility with existing UDF scripts must be ensured, there is an additional level of complexity in the import script algorithm. The algorithm is described in the following pseudocode snippet:
```
function import(script_code, options_map)
import_option = options_map.find("import")
import_options = options_map.find("import")
if found:
sorted_import_option = sort(import_option) //import options according to their location in the script, increasing order
sorted_import_options = sort(import_options) //import options according to their location in the script, increasing order
collectedScripts = list() //list of (script_code, location, size)
for each import_option in sorted_import_option:
for each import_option in sorted_import_options:
import_script_code = resolve_foreign_script_somehow(import_option.value)
if not md5_hashset.has(import_script_code):
md5_hashset.add(import_script_code)
@@ -151,6 +150,24 @@ class OtherClassC {
}
```

The result must be:
```
class OtherClassB {
static void doSomething() {}
}
class OtherClassA {
static void doSomething() {}
}
class OtherClassC {
static void doSomething() {}
}
class JVMOPTION_TEST {
static void run(ExaMetadata exa, ExaIterator ctx) throws Exception {
ctx.emit(\"Success!\");
}
}
```

The following diagram shows how the scripts are collected in the recursive algorithm:

![V2ImportScriptFlow](diagrams/V2ImportScriptFlow.drawio.png)
@@ -161,7 +178,7 @@ The following diagram shows how the scripts are collected in the recursive algor

### Parser Implementation V1

The legacy parser (V1) parser searches for one specific script option. The parser starts from beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
The legacy parser (V1) parser searches for one specific script option. The parser starts from the beginning of the script code. If found, the parser immediately removes the script option from the script code and returns the option value. It validates the
Collaborator:

> It validates the

Incomplete sentence

Collaborator Author:

Removed


### Parser Implementation V2

@@ -189,7 +206,7 @@ Tags: V2
`dsn~lexer-parser-rules~1`

Lexer and Parser rules to recognize `%optionKey`, `optionValue`, with whitespace characters as separator. The Parser rules will define the grammar to correctly identify Script Options, manage multiple options with the same key, and handle duplicates.

The regular expression for the lexer term for finding newline escape sequences or the semicolon escape sequence is `\\;|\\n|\\r|\\\\`. The regular expression for the lexer term for finding white space escape sequences is `\\ |\\t|\\f|\\v`.
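
To see what these two patterns match, the following sketch runs them through `std::regex`. This is only a stand-in: the actual lexer terms are defined with ctpg, and the sketch assumes the patterns are read literally as regular expressions (a backslash followed by one character). The names `countMatches`, `kEscapePattern` and `kWhitespaceEscapePattern` are made up for the example.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// The two patterns from the design text, written as raw string literals
// so that no additional C++ escaping is needed.
const std::string kEscapePattern = R"(\\;|\\n|\\r|\\\\)";
const std::string kWhitespaceEscapePattern = R"(\\ |\\t|\\f|\\v)";

// Counts the non-overlapping matches of `pattern` in `text`.
// std::regex with its default ECMAScript grammar is an assumption of this sketch.
std::size_t countMatches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}
```

For example, in the text `a\;b\nc\\d` the first pattern matches `\;`, `\n` and `\\`, while the second pattern matches nothing.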

Covers:
- `req~general-script-options-parsing~1`
@@ -232,7 +249,7 @@ Tags: V2
### Ignore lines without script options
`dsn~ignore-lines-without-script-options~1`

In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character.
In order to avoid lower performance compared to the old implementation, the parser must run only on lines which contain a `%` character after only whitespaces.
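
A pre-filter for this rule can be very small. The sketch below uses a hypothetical function name, `mayContainScriptOption`, and assumes that "whitespace" means space, tab, form feed, vertical tab and carriage return; the actual character set may differ.

```cpp
#include <cstddef>
#include <string_view>

// Returns true if the first non-whitespace character of `line` is '%'.
// Only such lines would need to be handed to the (more expensive) parser.
// Assumption: the whitespace set is " \t\f\v\r".
bool mayContainScriptOption(std::string_view line) {
    const std::size_t pos = line.find_first_not_of(" \t\f\v\r");
    return pos != std::string_view::npos && line[pos] == '%';
}
```

A line such as `int x = a % b;` is skipped by this check, because the `%` is preceded by non-whitespace characters.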


Covers:
@@ -289,7 +306,7 @@ Define the Lexer rules to tokenize whitespace escape sequences:
- '\f' => <form feed> character
- '\v' => <vertical tab> character

Implement rules which replace those white space escape tokens only at the beginning of an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Implement rules which replace those white space escape tokens only at the beginning of an option value. Add parser rules to leave those tokens as is in anything else, which is not a script option. Those token are not expected to be part of an option key.
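
The effect of the "only at the beginning of an option value" rule can be illustrated with plain string handling. This is a sketch under assumptions, not the ctpg rule set: the function name `unescapeLeadingWhitespace` is hypothetical, and it assumes that an unbroken run of whitespace escape sequences at the start of the value is replaced, while later occurrences stay untouched.

```cpp
#include <cstddef>
#include <string>

// Replaces "\ ", "\t", "\f" and "\v" (backslash + character) with the real
// whitespace character, but only while they appear at the start of `value`.
std::string unescapeLeadingWhitespace(const std::string& value) {
    std::string result;
    std::size_t i = 0;
    while (i + 1 < value.size() && value[i] == '\\') {
        char replacement = '\0';
        switch (value[i + 1]) {
            case ' ': replacement = ' ';  break;
            case 't': replacement = '\t'; break;
            case 'f': replacement = '\f'; break;
            case 'v': replacement = '\v'; break;
            default:  break;  // not a whitespace escape sequence
        }
        if (replacement == '\0') break;
        result.push_back(replacement);
        i += 2;
    }
    // Everything after the leading run is copied unchanged.
    result.append(value, i, std::string::npos);
    return result;
}
```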


Covers:
@@ -302,8 +319,8 @@ Tags: V2
### Lexer and Parser Rules Escape Sequences
`dsn~lexer-parser-rules-escape-sequences~1`

Define the Lexer rules to tokenize '\n', '\r', '\; sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.
Implement rules which replace those token at any location in an option value. Add parser rules to ignore those tokens in anything else, which is not a script option. Those token are not expected to be part of an option key.
Define the Lexer rules to tokenize '\n', '\r', '\;' sequences, and parser rules to replace those sequences with <line feed>, <carriage return> or ';' characters.
Implement rules which replace those token at any location in an option value. Add parser rules to leave those tokens as is in anything else, which is not a script option. Those token are not expected to be part of an option key.
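
The replacement itself can be sketched with plain string handling. The function name `unescapeOptionValue` is hypothetical, and the sketch covers only the three sequences named in this design item; how an escaped backslash in front of one of these characters is treated is left out here.

```cpp
#include <cstddef>
#include <string>

// Replaces the two-character sequences "\n", "\r" and "\;" anywhere in an
// option value with <line feed>, <carriage return> and ';' respectively.
// Any other backslash sequence is copied unchanged.
std::string unescapeOptionValue(const std::string& value) {
    std::string result;
    result.reserve(value.size());
    for (std::size_t i = 0; i < value.size(); ++i) {
        if (value[i] == '\\' && i + 1 < value.size()) {
            const char next = value[i + 1];
            if (next == 'n') { result.push_back('\n'); ++i; continue; }
            if (next == 'r') { result.push_back('\r'); ++i; continue; }
            if (next == ';') { result.push_back(';');  ++i; continue; }
        }
        result.push_back(value[i]);
    }
    return result;
}
```

The `\;` case is the one that lets an option value contain a semicolon without ending the option.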


Covers:
@@ -316,7 +333,7 @@ Tags: V2
### Script Option Removal Mechanism
`dsn~script-option-removal~1`

Implement a method in class `ScriptOptionLinesParserCTPG` which removes *all* identified Script Options from the original script code at once. This method be executed after the import scripts are replaced. The algorithm must replace the script options in reverse order in order to maintain consistency of internal list of positions.
Implement a method in class `ScriptOptionLinesParserCTPG` which removes *all* identified Script Options from the original script code at once. This method should be executed after the import scripts are replaced. The algorithm must replace the script options in reverse order in order to maintain consistency of internal list of positions.
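
The reason for the reverse order is that erasing a range shifts every position behind it, while positions in front of it stay valid. A minimal sketch, with hypothetical names (`OptionRange`, `removeOptions`) and the assumption that the ranges do not overlap and lie inside the script code:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position record: where an option starts and how long it is.
struct OptionRange {
    std::size_t start;
    std::size_t size;
};

// Removes all given ranges from `code`, starting with the one furthest back.
// Erasing from the back keeps the start positions of the remaining ranges valid.
void removeOptions(std::string& code, std::vector<OptionRange> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const OptionRange& a, const OptionRange& b) { return a.start > b.start; });
    for (const OptionRange& range : ranges) {
        code.erase(range.start, range.size);
    }
}
```

Removing in increasing order instead would require adjusting every later position by the number of characters already removed.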


Covers:
@@ -335,10 +352,10 @@ Covers:

Tags: V2

### Java %scriptclass Option Handling in Design
### Java %scriptclass Option Handling
`dsn~java-scriptclass-option-handling~1`

Implement a function in the `Converter` class which adds the script class option to the JVM Options list.
Implement a function in the `ConverterV2` class which adds the script class option to the JVM Options list.


Covers:
@@ -443,7 +460,7 @@ Tags: V2
### General Parser Integration
`dsn~general-parser-integration~1`

Ensure that the new parser integrates seamlessly into the Exasol UDF Client environment. This includes embedding the parser within the custom C++ namespace, ensuring it meets all linker requirements, and does not introduce additional runtime dependencies.
Ensure that the new parser integrates seamlessly into the Exasol UDF Client environment and the Exasol DB. This includes embedding the parser within the custom C++ namespace, ensuring it meets all linker requirements, and does not introduce additional runtime dependencies.


Covers: