Add docs for Python UDFs
mosabua committed Dec 16, 2024
1 parent bcb9f6f commit 893fc42
Showing 5 changed files with 273 additions and 10 deletions.
1 change: 1 addition & 0 deletions docs/src/main/sphinx/udf.md
Original file line number Diff line number Diff line change
@@ -13,4 +13,5 @@ More details are available in the following sections:
udf/introduction
udf/function
udf/sql
udf/python
```
29 changes: 21 additions & 8 deletions docs/src/main/sphinx/udf/function.md
@@ -11,7 +11,8 @@ FUNCTION name ( [ parameter_name data_type [, ...] ] )
[ CALLED ON NULL INPUT ]
[ SECURITY { DEFINER | INVOKER } ]
[ COMMENT description]
statements
[ WITH ( property_name = expression [, ...] ) ]
{ statements | AS definition }
```

## Description
@@ -31,7 +32,9 @@ The `type` value after the `RETURNS` keyword identifies the [data
type](/language/types) of the UDF output.

The optional `LANGUAGE` characteristic identifies the language used for the UDF
definition with `language`. Only `SQL` is supported.
definition with `language`. The `SQL` and `PYTHON` languages are supported by
default. Additional languages may be supported via a language engine plugin.
If not specified, the default language is `SQL`.

The optional `DETERMINISTIC` or `NOT DETERMINISTIC` characteristic declares that
the UDF is deterministic. This means that repeated UDF calls with identical
@@ -58,10 +61,18 @@ The `COMMENT` characteristic can be used to provide information about the
function to other users as `description`. The information is accessible with
[](/sql/show-functions).

The body of the UDF can either be a simple single `RETURN` statement with an
expression, or compound list of `statements` in a `BEGIN` block. UDF must
contain a `RETURN` statement at the end of the top-level block, even if it's
unreachable.
The optional `WITH` clause can be used to specify properties for the function.
The available properties vary based on the function language. For
[](/udf/python), the `handler` property specifies the name of the Python
function to invoke.

For SQL UDFs, the body of the UDF can either be a single `RETURN` statement
with an expression, or a compound list of `statements` in a `BEGIN` block. A
SQL UDF must contain a `RETURN` statement at the end of the top-level block,
even if it is unreachable.

For UDFs in other languages, the `definition` is enclosed in a `$$`-quoted
string.

## Examples

@@ -89,12 +100,14 @@ SELECT meaning_of_life();
```

Further examples of varying complexity that cover usage of the `FUNCTION`
statement in combination with other statements are available in the [SQL
UDF examples documentation](/udf/sql/examples).
statement in combination with other statements are available in the [SQL UDF
documentation](/udf/sql/examples) and the [Python UDF
documentation](/udf/python).

## See also

* [](/udf)
* [](/udf/sql)
* [](/udf/python)
* [](/sql/create-function)

5 changes: 3 additions & 2 deletions docs/src/main/sphinx/udf/introduction.md
@@ -4,8 +4,6 @@ A user-defined function (UDF) is a custom function authored by a user of Trino
in a client application. UDFs are scalar functions that return a single output
value, similar to [built-in functions](/functions).

UDFs are defined and written using the [SQL routine language](/udf/sql).

:::{note}
Custom functions can alternatively be written in Java and deployed as a
plugin. Details are available in the [developer guide](/develop/functions).
@@ -14,6 +12,9 @@ plugin. Details are available in the [developer guide](/develop/functions).
(udf-declaration)=
## UDF declaration

Declare the UDF with the SQL [](/udf/function) keyword and the supported
statements for [](/udf/sql) or [](/udf/python).

A UDF can be declared as an [inline UDF](udf-inline) to be used in the current
query, or declared as a [catalog UDF](udf-catalog) to be used in any future
query.
181 changes: 181 additions & 0 deletions docs/src/main/sphinx/udf/python.md
@@ -0,0 +1,181 @@
# Python user-defined functions

A Python user-defined function is a [user-defined function](/udf) that uses the
[Python programming language and statements](python-udf-lang) for the definition
of the function.

:::{warning}
Python user-defined functions are an experimental feature.
:::

## Python UDF declaration

Declare a Python UDF as [inline](udf-inline) or [catalog UDF](udf-catalog) with
the following steps:

* Use the [](/udf/function) keyword to declare the UDF name and parameters.
* Add the `RETURNS` declaration to specify the data type of the result.
* Set the `LANGUAGE` to `PYTHON`.
* Declare the name of the Python function to call with the `handler` property in
  the `WITH` block.
* Use `$$` to enclose the Python code after the `AS` keyword.
* Define the Python function named in the `handler` property and ensure it
  returns a value that matches the declared data type.
* Expand your Python code section to implement the function using the available
  [Python language](python-udf-lang).

The following snippet shows pseudo-code:

```text
FUNCTION python_udf_name(input_parameter data_type)
  RETURNS result_data_type
  LANGUAGE PYTHON
  WITH (handler = 'python_function')
  AS $$
  ...
  def python_function(input):
      return ...
  ...
  $$
```

A minimal example declares the UDF `doubleup` that returns the input integer
value `x` multiplied by two. The example shows declaration as [](udf-inline) and
invocation with the value `21` to yield the result `42`.

Set the language to `PYTHON` to override the default `SQL` for [](/udf/sql).
The Python code is enclosed with `$$` and must use valid formatting.

```text
WITH
  FUNCTION doubleup(x integer)
    RETURNS integer
    LANGUAGE PYTHON
    WITH (handler = 'twice')
    AS $$
    def twice(a):
        return a * 2
    $$
SELECT doubleup(21);
-- 42
```
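
Because the handler is ordinary Python, its logic can be checked in a standard
interpreter before it is embedded in a UDF. The following is a minimal sketch
that assumes a local CPython installation rather than the sandboxed Trino
runtime, so it only exercises the function body and not the Trino type
conversion:

```python
# Handler body copied from the doubleup UDF. Running it locally is an
# assumption for illustration; Trino invokes it inside the sandboxed runtime.
def twice(a):
    return a * 2


if __name__ == "__main__":
    # Mirrors SELECT doubleup(21)
    print(twice(21))
```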

The same UDF can also be declared as [](udf-catalog).

Refer to the [](/udf/python/examples) for more complex use cases and examples.

```{toctree}
:titlesonly: true
:hidden:
/udf/python/examples
```

(python-udf-lang)=
## Python language details

The Trino Python UDF integration uses Python 3.13.0 in a sandboxed environment.
Python code runs within a WebAssembly (WASM) runtime within the Java virtual
machine running Trino.

Python language rules, including indentation, must be observed.

Because of the sandbox, Python UDFs only have access to the Python language and
core libraries included in the sandboxed runtime. Access to external resources
with network or file system operations is not supported. Usage of other Python
libraries as well as command line tools or package managers is not supported.

The following libraries are explicitly removed from the runtime and therefore
not available within a Python UDF:

* `bdb`
* `concurrent`
* `curses`
* `ensurepip`
* `doctest`
* `idlelib`
* `multiprocessing`
* `pdb`
* `pydoc`
* `socketserver*`
* `sqlite3`
* `ssl`
* `subprocess*`
* `tkinter`
* `turtle*`
* `unittest`
* `venv`
* `webbrowser*`
* `wsgiref`
* `xmlrpc`
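
Code that is shared between a regular Python environment and a Python UDF can
check for a module before importing it. The following sketch uses the standard
`importlib.util.find_spec` function. The helper name `missing_modules` is
hypothetical, and whether this lookup behaves identically inside the Trino
sandbox is an assumption that is not confirmed by this documentation:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check: inside the sandbox, the removed libraries are expected
# to appear in the returned list. In a regular interpreter they usually do not.
print(missing_modules(["json", "sqlite3", "subprocess"]))
```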

## Type mapping

The following table shows supported Trino types and their corresponding Python
types for input and output values of a Python UDF:

:::{list-table} Trino to Python type mapping
:widths: 50, 50
:header-rows: 1

* - Trino type
  - Python type
* - row
  - tuple
* - array
  - list
* - map
  - dict
* - boolean
  - bool
* - tinyint
  - int
* - smallint
  - int
* - integer
  - int
* - bigint
  - int
* - real
  - float
* - double
  - float
* - decimal
  - decimal.Decimal
* - varchar
  - str
* - varbinary
  - bytes
* - date
  - datetime.date
* - time
  - datetime.time
* - time with time zone
  - datetime.time with datetime.tzinfo
* - timestamp
  - datetime.datetime
* - timestamp with time zone
  - datetime.datetime with datetime.tzinfo
* - interval year to month
  - int as the number of months
* - interval day to second
  - datetime.timedelta
* - json
  - str
* - uuid
  - uuid.UUID
* - ipaddress
  - ipaddress.IPv4Address or ipaddress.IPv6Address

:::
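
To illustrate the mapping, the following sketch shows a handler that works with
the Python types listed in the table. The function `order_total` and the UDF
signature in the comment are hypothetical and not part of the Trino
documentation. The sketch assumes an `array(decimal(10,2))` argument arrives as
a list of `decimal.Decimal` values and an `integer` argument as an `int`:

```python
from decimal import Decimal


# Hypothetical handler for a UDF declared as
#   FUNCTION order_total(prices array(decimal(10,2)), quantity integer)
#   RETURNS decimal(12,2)
# Per the table, the array maps to a list of decimal.Decimal values and the
# integer maps to an int. The return value is a decimal.Decimal.
def order_total(prices, quantity):
    # Start the sum from a Decimal so an empty list still yields a Decimal
    return sum(prices, Decimal("0")) * quantity


print(order_total([Decimal("1.50"), Decimal("2.25")], 2))
```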

### Date and time

Python datetime objects only support microsecond precision. Trino argument
values with greater precision are rounded when converted to Python values, and
Python return values are rounded if the Trino return type has less than
microsecond precision.

Only fixed offset time zones are supported. Timestamps with political time zones
have the zone converted to the zone's offset for the timestamp's instant.
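
The following sketch illustrates both points in plain Python. The helper names
are hypothetical. The half-up rounding in `nanos_to_micros` is an assumption
used for illustration, since the rounding mode is not stated here. The
`offset_minutes` helper shows what a handler can rely on for a value with a
time zone: a fixed UTC offset rather than a named zone.

```python
from datetime import datetime, timedelta, timezone


def nanos_to_micros(nanos):
    """Round a non-negative nanosecond count to microseconds.

    Half-up rounding is an assumption for illustration only.
    """
    return (nanos + 500) // 1000


def offset_minutes(ts):
    """Return the fixed UTC offset of a timezone-aware datetime in minutes."""
    return int(ts.utcoffset() / timedelta(minutes=1))


# A fixed-offset zone of +05:30, as a handler would receive it
ts = datetime(2024, 12, 16, 9, 30, tzinfo=timezone(timedelta(hours=5, minutes=30)))
print(offset_minutes(ts))            # 330
print(nanos_to_micros(123_456_789))  # 123457
```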

67 changes: 67 additions & 0 deletions docs/src/main/sphinx/udf/python/examples.md
@@ -0,0 +1,67 @@
# Example Python UDFs

After learning about [](/udf/python), review the following examples of valid
Python UDFs.

## XOR

The following example implements a `xor` function for a logical Exclusive OR
operation on two boolean input parameters and tests it with two invocations:

```text
WITH FUNCTION xor(a boolean, b boolean)
  RETURNS boolean
  LANGUAGE PYTHON
  WITH (handler = 'bool_xor')
  AS $$
  import operator
  def bool_xor(a, b):
      return operator.xor(a, b)
  $$
SELECT xor(true, false), xor(false, true);
```

Result of the query:

```
true | true
```
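
The query above covers two of the four input combinations. As a sketch, the
same handler can be run in a regular Python interpreter to build the full truth
table. Running it outside Trino is an assumption for illustration, and the
`table` variable is not part of the UDF:

```python
import operator
from itertools import product


# Handler body copied from the xor UDF
def bool_xor(a, b):
    return operator.xor(a, b)


# Evaluate every combination of two boolean inputs
table = {(a, b): bool_xor(a, b) for a, b in product([False, True], repeat=2)}

for inputs, result in table.items():
    print(inputs, result)
```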

## reverse_words

The following example uses a more elaborate Python script to reverse the
characters in each word of the input string `s` of type `varchar` and tests the
function.

```text
WITH FUNCTION reverse_words(s varchar)
  RETURNS varchar
  LANGUAGE PYTHON
  WITH (handler = 'reverse_words')
  AS $$
  import re

  def reverse(s):
      # Build the reversed string one character at a time
      result = ""
      for i in s:
          result = i + result
      return result

  pattern = re.compile(r"\w+[.,'!?\"]\w*")

  def process_word(word):
      # Reverse only words without non-letter signs
      return word if pattern.match(word) else reverse(word)

  def reverse_words(payload):
      text_words = payload.split(' ')
      return ' '.join([process_word(w) for w in text_words])
  $$
SELECT reverse_words('Civic, level, dna racecar era semordnilap');
```

Result of the query:

```
Civic, level, and racecar are palindromes
```
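
The handler logic can also be written more compactly. The following sketch is
an alternative to the example above, not the version used in the UDF. It
replaces the character loop with slice notation and can be run in a regular
Python interpreter to check the result:

```python
import re

# Same pattern as in the UDF: word characters followed by a punctuation mark
pattern = re.compile(r"\w+[.,'!?\"]\w*")


def process_word(word):
    # Keep words that match the pattern, reverse all others with a slice
    return word if pattern.match(word) else word[::-1]


def reverse_words(payload):
    return " ".join(process_word(w) for w in payload.split(" "))


print(reverse_words("Civic, level, dna racecar era semordnilap"))
```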
