Showing 5 changed files with 273 additions and 10 deletions.
@@ -13,4 +13,5 @@ More details are available in the following sections:
udf/introduction
udf/function
udf/sql
udf/python
```
@@ -0,0 +1,181 @@
# Python user-defined functions

A Python user-defined function is a [user-defined function](/udf) that uses the
[Python programming language and statements](python-udf-lang) for the definition
of the function.

:::{warning}
Python user-defined functions are an experimental feature.
:::

## Python UDF declaration

Declare a Python UDF as an [inline](udf-inline) or [catalog UDF](udf-catalog)
with the following steps:

* Use the [](/udf/function) keyword to declare the UDF name and parameters.
* Add the `RETURNS` declaration to specify the data type of the result.
* Set the `LANGUAGE` to `PYTHON`.
* Declare the name of the Python function to call with the `handler` property in
  the `WITH` block.
* Use `$$` to enclose the Python code after the `AS` keyword.
* Define the Python function named in the `handler` property and ensure that it
  returns the declared data type.
* Expand your Python code section to implement the function using the available
  [Python language](python-udf-lang).

The following snippet shows pseudo-code:

```text
FUNCTION python_udf_name(input_parameter data_type)
RETURNS result_data_type
LANGUAGE PYTHON
WITH (handler = 'python_function')
AS $$
...
def python_function(input):
    return ...
...
$$
```

A minimal example declares the UDF `doubleup` that returns the input integer
value `x` multiplied by two. The example shows declaration as [](udf-inline) and
invocation with the value `21` to yield the result `42`.

Set the language to `PYTHON` to override the default `SQL` for [](/udf/sql).
The Python code is enclosed with `$$` and must use valid formatting.

```text
WITH
FUNCTION doubleup(x integer)
RETURNS integer
LANGUAGE PYTHON
WITH (handler = 'twice')
AS $$
def twice(a):
    return a * 2
$$
SELECT doubleup(21);
-- 42
```

The same UDF can also be declared as [](udf-catalog).

Refer to the [](/udf/python/examples) for more complex use cases and examples.

```{toctree}
:titlesonly: true
:hidden:
/udf/python/examples
```

(python-udf-lang)=
## Python language details

The Trino Python UDF integration uses Python 3.13.0 in a sandboxed environment.
Python code runs within a WebAssembly (WASM) runtime within the Java virtual
machine running Trino.
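
As a minimal sketch of inspecting the runtime described above, the following
plain Python function reports the interpreter version with the standard `sys`
module. The function name `python_version` is hypothetical, and the availability
of `sys` inside the sandboxed runtime is an assumption. In a Python UDF, such a
function would be referenced through the `handler` property with a `varchar`
return type.

```python
import sys


def python_version():
    # Format the major, minor, and micro components, for example "3.13.0".
    major, minor, micro = sys.version_info[:3]
    return f"{major}.{minor}.{micro}"
```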

Python language rules, including indentation, must be observed.

Because of the sandbox, Python UDFs only have access to the Python language and
the core libraries included in the sandboxed runtime. Access to external
resources with network or file system operations is not supported. Usage of
other Python libraries, command line tools, or package managers is not
supported.

The following libraries are explicitly removed from the runtime and therefore
not available within a Python UDF:

* `bdb`
* `concurrent`
* `curses`
* `ensurepip`
* `doctest`
* `idlelib`
* `multiprocessing`
* `pdb`
* `pydoc`
* `socketserver*`
* `sqlite3`
* `ssl`
* `subprocess*`
* `tkinter`
* `turtle*`
* `unittest`
* `venv`
* `webbrowser*`
* `wsgiref`
* `xmlrpc`
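
One possible way to guard code against a missing module is to probe for it
before importing. The sketch below uses `importlib.util.find_spec` from the
standard library; the helper name `module_available` is hypothetical, and it is
an assumption that the sandboxed runtime reports removed modules the same way a
regular Python installation reports modules that are not installed.

```python
import importlib.util


def module_available(name):
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None
```

In a regular Python installation, `module_available("json")` is true, while a
name that does not exist yields false.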

## Type mapping

The following table shows supported Trino types and their corresponding Python
types for input and output values of a Python UDF:

:::{list-table} Supported Trino and Python type mapping
:widths: 50, 50
:header-rows: 1

* - Trino type
  - Python type
* - row
  - tuple
* - array
  - list
* - map
  - dict
* - boolean
  - bool
* - tinyint
  - int
* - smallint
  - int
* - integer
  - int
* - bigint
  - int
* - real
  - float
* - double
  - float
* - decimal
  - decimal.Decimal
* - varchar
  - str
* - varbinary
  - bytes
* - date
  - datetime.date
* - time
  - datetime.time
* - time with time zone
  - datetime.time with datetime.tzinfo
* - timestamp
  - datetime.datetime
* - timestamp with time zone
  - datetime.datetime with datetime.tzinfo
* - interval year to month
  - int as the number of months
* - interval day to second
  - datetime.timedelta
* - json
  - str
* - uuid
  - uuid.UUID
* - ipaddress
  - ipaddress.IPv4Address or ipaddress.IPv6Address

:::
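
The mapping can be illustrated with two handler functions written in plain
Python. This is a sketch only: the names `total_price` and `days_until` are
hypothetical, and the comments state which Trino types the arguments are assumed
to come from according to the table.

```python
import datetime
import decimal


def total_price(prices):
    # prices is assumed to arrive as a list of decimal.Decimal values,
    # matching a Trino array of decimal.
    return sum(prices, decimal.Decimal("0"))


def days_until(start, end):
    # start and end are assumed to arrive as datetime.date values, matching
    # Trino date arguments. The int result fits an integer return type.
    return (end - start).days
```

Summing with a `decimal.Decimal` start value keeps the result a `Decimal` even
for an empty list, so the return value still matches a `decimal` return type.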

### Date and time

Python datetime objects only support microsecond precision. Trino argument
values with greater precision are rounded when converted to Python values, and
Python return values are rounded if the Trino return type has less than
microsecond precision.
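
The microsecond limit is visible in the Python standard library itself, as the
following sketch shows. The helper `nanos_to_micros` is hypothetical and only
illustrates what rounding a nanosecond value to microseconds can look like; the
round-half-up mode it uses is an assumption, not a statement about how Trino
rounds.

```python
import datetime

# The smallest representable difference between two datetime values.
MICROSECOND = datetime.datetime.resolution


def nanos_to_micros(nanos):
    # Round a non-negative nanosecond count half up to whole microseconds
    # (assumed rounding mode, for illustration only).
    return (nanos + 500) // 1000
```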

Only fixed offset time zones are supported. Timestamps with political
(region-based) time zones, such as `America/New_York`, have the zone converted
to the fixed offset in effect at the timestamp's instant.
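
As an illustration of fixed offsets, the sketch below builds two values like
those a handler might receive for timestamps in `America/New_York`, where
UTC-05:00 applies in January and UTC-04:00 in July. Representing the offset with
`datetime.timezone` is an assumption about the concrete `tzinfo` implementation,
and the helper `offset_hours` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical handler inputs: the same region yields different fixed offsets
# depending on the instant of the timestamp.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone(timedelta(hours=-4)))


def offset_hours(ts):
    # utcoffset() returns the fixed offset attached to the value as a timedelta.
    return ts.utcoffset().total_seconds() / 3600
```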

@@ -0,0 +1,67 @@
# Example Python UDFs

After you learn about [](/udf/python), use the following sections as examples
of valid Python UDFs.

## XOR

The following example implements an `xor` function for a logical exclusive OR
operation on two boolean input parameters and tests it with two invocations:

```text
WITH FUNCTION xor(a boolean, b boolean)
RETURNS boolean
LANGUAGE PYTHON
WITH (handler = 'bool_xor')
AS $$
import operator
def bool_xor(a, b):
    return operator.xor(a, b)
$$
SELECT xor(true, false), xor(false, true);
```

Result of the query:

```
true | true
```
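
The handler can also be exercised in a regular Python interpreter before it is
embedded in a UDF. The following sketch evaluates `bool_xor` for all four input
combinations; the `truth_table` name is introduced here for illustration only.

```python
import itertools
import operator


def bool_xor(a, b):
    return operator.xor(a, b)


# Evaluate the handler for every combination of two boolean inputs.
truth_table = {
    (a, b): bool_xor(a, b)
    for a, b in itertools.product([False, True], repeat=2)
}
```

Only the two mixed combinations map to `True`, which matches the query result
shown above.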

## reverse_words

The following example uses a more elaborate Python script to reverse the
characters in each word of the input string `s` of type `varchar` and tests the
function. Words that contain punctuation are left unchanged.

```text
WITH FUNCTION reverse_words(s varchar)
RETURNS varchar
LANGUAGE PYTHON
WITH (handler = 'reverse_words')
AS $$
import re
def reverse(s):
    result = ""
    for i in s:
        result = i + result
    return result
pattern = re.compile(r"\w+[.,'!?\"]\w*")
def process_word(word):
    # Reverse only words without non-letter signs
    return word if pattern.match(word) else reverse(word)
def reverse_words(payload):
    text_words = payload.split(' ')
    return ' '.join([process_word(w) for w in text_words])
$$
SELECT reverse_words('Civic, level, dna racecar era semordnilap');
```

Result of the query:

```
Civic, level, and racecar are palindromes
```
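
The character-by-character loop in `reverse` can alternatively be written with a
slice. The following sketch places both variants side by side in plain Python;
`reverse_sliced` is a hypothetical name, and the slice form is offered as an
alternative rather than as the approach used by the example.

```python
def reverse(s):
    # Character-by-character reversal, as in the example above.
    result = ""
    for ch in s:
        result = ch + result
    return result


def reverse_sliced(s):
    # A slice with step -1 walks the string from the last character to the first.
    return s[::-1]
```

Both functions return the same string for any input, so either can serve as the
helper called by `process_word`.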