Stored procedures & triggers in Khepri #16

dumbbell · 2021-11-29T18:45:30Z

dumbbell
Nov 29, 2021
Maintainer

Here is an example: in RabbitMQ, when a runtime parameter is deleted, the rabbit_event module is called to emit a notification. This can be executed inside a Mnesia transaction. Mnesia doesn't care about side effects, so this is perfectly allowed. However, the notification could be emitted several times if Mnesia has to run the transaction function again because a previous attempt didn't succeed.

Khepri doesn't allow side effects in its transactions, even sending a message. This is by design because we want transactions to be deterministic and always give the same result. Sending a message is not possible because it would mean that the transaction knows about a process PID and this PID might make no sense at the time a new Ra cluster member joins and applies commands again.

Therefore, having this "a RabbitMQ runtime parameter was deleted" notification being emitted as part of a Khepri transaction is denied. However, it should be possible to rely on the "aux effect" of Ra to perform non-deterministic actions after the command is applied.
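To make the distinction concrete, here is a minimal sketch of the general pattern. It is not Khepri's or Ra's actual code: the module name `effects_sketch`, the `{notify, ...}` effect shape and both function names are assumptions made for illustration. The state machine function stays pure and only describes side effects as data. A separate step performs them once the command is applied.

```erlang
-module(effects_sketch).
-export([apply_command/2, run_effects/2]).

%% Pure function: the same state and command always give the same result.
%% Side effects are only *described* here, never performed.
apply_command({put, Key, Value}, State) ->
    {maps:put(Key, Value, State), ok, []};
apply_command({delete, Key}, State) ->
    case maps:take(Key, State) of
        {Value, NewState} ->
            %% Hypothetical effect shape, for illustration only.
            Effects = [{notify, deleted, Key, Value}],
            {NewState, ok, Effects};
        error ->
            {State, {error, not_found}, []}
    end.

%% Impure step, run outside the state machine: here it simply forwards
%% each effect as a message to a process chosen by the caller.
run_effects(Effects, Pid) ->
    lists:foreach(fun(Effect) -> Pid ! Effect end, Effects).
```

Replaying `apply_command/2` on a new cluster member rebuilds the same state without sending anything. Only the member that decides to call `run_effects/2` produces the side effect.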

It would be nice to be able to run arbitrary code after an event happens.

Events

Here are some types of events which would be of interest:

A Khepri node (in the database tree) was added, modified or deleted.
A Ra cluster member joined or left the cluster.
A Ra cluster member became unreachable or reachable again.
A new Ra leader was elected.
A given Erlang process exited.

For Khepri nodes, the state machine could emit an event as part of an "aux effect". This event would contain the initial command (put or delete), the affected node(s), the return value and even the entire state.

Other types of events could be emitted by a dedicated process in Khepri that would monitor cluster membership and arbitrary Erlang nodes/PIDs.

Stored procedures

Like with transactions, if an event needs to trigger and execute some code, that code must be available on all cluster members.

It is possible to reuse the anonymous function extraction feature of the transactions to extract and isolate an anonymous function. The extracted function could be stored inside the database itself under any path the user sees fit (e.g. /stored-procs/when-node-is-deleted). This could be a new type of payload (in addition to "data", the only one supported today). It would allow the user to update or remove stored procedures using the regular Khepri API.

Link events to stored procedures

Now, we need the actual trigger. The link between events and stored procedures could be stored inside the state machine's state, like keep_while conditions.

The dedicated process I mentioned earlier could thus listen to events, look for triggers in the state machine's state and run the associated code, if any.

The state machine's state would come from the "aux effect" or the state machine could be queried for other types of events (membership/PIDs monitoring).

The stored procedure function would be able to do anything it wants. There would be no restrictions, unlike what we have for transactions. It would also be able to read the state of the database as it was at the time the event was emitted (regardless of the changes which happened in between).

If the stored procedure wants to modify the database, it would use the regular Khepri API as any caller.

Example

The caller stores an anonymous function:

khepri_machine:put(
  StoreId,
  [stored_procs, log_event],
  ?FUN_PAYLOAD(fun(Event) ->
                   logger:debug("Event = ~p", [Event])
               end)).

The caller registers a trigger:

khepri_machine:register_trigger(
  StoreId,
  my_trigger_id,             %% To be able to remove it.
  [stored_procs, log_event],
  EventFiltering,            %% To only call it for matching events.
  Priority).                 %% To sort triggers listening to the same event.

When a node is deleted, khepri_machine:apply/3 returns the following "aux effect":

{aux, {event, deleted_node, [path, to, deleted, node], ReturnValue, MachineState}}

The monitoring/trigger dedicated process:
- looks up the trigger in the state machine's state
- applies the event filtering
- looks up the stored procedure in the state machine's state
- runs the stored procedure

In the case of the "a process has terminated" event, the dedicated process should probably remove the trigger automatically.
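The four steps listed above can be sketched as follows. This is only an illustration under stated assumptions, not the proposed implementation: the module name `trigger_dispatch_sketch`, the shape of the state (a map with `triggers` and `sprocs` keys) and the representation of event filters as predicate funs are all hypothetical.

```erlang
-module(trigger_dispatch_sketch).
-export([handle_event/2]).

%% Assumed state shape (hypothetical):
%%   triggers => [#{id, sproc_path, filter}]   where filter is fun(Event) -> boolean()
%%   sprocs   => #{Path => fun((Event) -> term())}
handle_event(Event, #{triggers := Triggers, sprocs := Sprocs}) ->
    %% Steps 1 and 2: look up the triggers and apply the event filtering.
    Matching = [T || #{filter := Filter} = T <- Triggers, Filter(Event)],
    %% Steps 3 and 4: look up each stored procedure and run it.
    [run(T, Event, Sprocs) || T <- Matching].

run(#{id := Id, sproc_path := Path}, Event, Sprocs) ->
    case maps:find(Path, Sprocs) of
        {ok, Fun} ->
            %% A crashing procedure is reported, not retried.
            try
                {Id, {ok, Fun(Event)}}
            catch
                Class:Reason -> {Id, {error, {Class, Reason}}}
            end;
        error ->
            {Id, {error, sproc_not_found}}
    end.
```

Returning one result per trigger keeps the sketch easy to inspect. A real handler would more likely log failures than return them.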

Subscribe to events

As an added bonus, we can implement event subscriptions on top of that: an arbitrary process could subscribe to the same events and be notified by the monitoring/trigger dedicated process when something happens.

This time, the subscriptions would probably not be stored inside the database, but inside that dedicated process' state. This is a simpler, more lightweight feature. For a stored subscription, the caller can just use the regular trigger feature described above.

Feedback?

What do you think? Does it make sense? Will it break the guarantees of Ra(ft)?

mkuratczyk · 2021-11-30T09:04:15Z

mkuratczyk
Nov 30, 2021
Maintainer

This sounds pretty cool. Some questions:

Is there a way for the stored procedure to fail and perhaps be re-triggered or at least logged as not executed successfully?
Can I access logger from within the procedure (or log something in some other way)? Given the potential use-cases, an important piece of logic could be executed in the stored procedure and therefore, there could be a need to log the effects.
Is there a requirement for the priorities to be unique values to explicitly define the order of execution?
Databases usually allow "before" and "after" triggers. Could "before" triggers be implemented one day as well? This could potentially allow enforcing some policies that would be hard to express through configuration. For example, "user X cannot declare a queue of type Y with a replication factor of 1".

1 reply

dumbbell Nov 30, 2021
Maintainer Author

Is there a way for the stored procedure to fail and perhaps be re-triggered or at least logged as not executed successfully?

Executing a failed procedure again can be dangerous: we would need to handle poisonous triggers ;-) I didn't think of the implementation details yet, but I didn't plan to have clever behavior around failures, except logging the stacktrace for instance (a stacktrace which could make little sense because it would show the initial location of the anonymous function).

Can I access logger from within the procedure (or log something in some other way)? Given the potential use-cases, an important piece of logic could be executed in the stored procedure and therefore, there could be a need to log the effects.

Yes, I don't see why logging would not work. It would be like any anonymous functions executed by a permanent or temporary process.

Is there a requirement for the priorities to be unique values to explicitly define the order of execution?

Probably no. For the same priority, the order could be based on the order they were registered.
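As a rough sketch of that ordering rule, assuming triggers are kept as a list of `{Priority, TriggerId}` tuples in registration order and that a higher number runs first (both are assumptions, as is the module name `trigger_order_sketch`):

```erlang
-module(trigger_order_sketch).
-export([order/1]).

%% Triggers is a list of {Priority, TriggerId} in registration order.
%% Assumption: a higher priority number means the trigger runs earlier.
order(Triggers) ->
    %% lists:keysort/2 is stable, so triggers with equal priority keep
    %% their registration order. Negating the key sorts in descending order.
    Keyed = [{-Priority, Id} || {Priority, Id} <- Triggers],
    [Id || {_, Id} <- lists:keysort(1, Keyed)].
```

With this rule, priorities do not have to be unique: ties are resolved by who registered first.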

Databases usually allow "before" and "after" triggers. Could "before" triggers be implemented one day as well? This could potentially allow enforcing some policies that would be hard to express through configuration. For example, "user X cannot declare a queue of type Y with a replication factor of 1".

I think that becomes too much responsibility for triggers and stored procedures. It would be possible to implement that before calling Khepri or as part of the normal transaction function.

"After" triggers are useful mostly when a Khepri database node is removed as a consequence of another change.

dumbbell · 2021-11-30T09:28:58Z

dumbbell
Nov 30, 2021
Maintainer Author

After some discussion with @kjnilsson, the feature doesn't need to use the "aux effect". A simple "mod call effect" would achieve the same.

He also reminded me of an important fact: there would be no "execute once" guarantee at all. Here is an example he gave me:

A leader could execute the "mod call effect" triggering the execution of a stored procedure.
A new leader is elected but it didn't apply all the commands yet, compared to the old leader.
The new leader applies the commands and emits the same "mod call effect" a second time.

This must be documented because it could take users by surprise.

That said, this is the same with Mnesia: a transaction function can be restarted many times if needed to resolve conflicts. Therefore Khepri and Mnesia would behave the same w.r.t. side effects.

0 replies

lhoguin · 2021-11-30T10:05:34Z

lhoguin
Nov 30, 2021
Maintainer

I do not understand what is meant by (membership/PIDs monitoring) and you also mention this earlier:

Sending a message is not possible because it would mean that the transaction knows about a process PID and this PID might make no sense at the time a new Ra cluster member joins and applies commands again.

You don't have to know a pid to send a message. I would say that messages that should be sent inside or as a result of a transaction should only be sent to processes that can be looked up via a registry (register, pg, or facilities inside RabbitMQ). There should not be "live data" involved.

1 reply

dumbbell Nov 30, 2021
Maintainer Author

I do not understand what is meant by "(membership/PIDs monitoring)"

The possibility to trigger a stored procedure after a process exited or a member was added to/removed from the Ra cluster for instance.

You don't have to know a pid to send a message.

That's true, but we want to avoid side effects in the transaction itself to keep the semantics of Ra state machines. Those side effects would be "delayed" until after the transaction is finished. Also, the transaction function is executed on all Ra cluster members, which would not be the case with those stored procedures.

dumbbell · 2021-12-08T17:54:29Z

dumbbell
Dec 8, 2021
Maintainer Author

I have a prototype for the stored procedure part:

khepri:start(),
%% -> {ok,khepri}

khepri:insert([a], fun() -> io:format("Youpi~n") end),
%% -> ok

khepri:run_sproc([a], []),
%% -> ok
%% Displays "Youpi" on stdout

The extracted anonymous function (i.e. the binary of the generated & compiled module) is stored in the database:

1> khepri:info(khepri).

== CLUSTER MEMBERS ==

nonode@nohost

== TREE ==

●
╰── a
      Data: {standalone_fun,'ktx__erl_eval__-expr/5-fun-3-__54305672',
                            <<70,79,82,49,0,0,1,160,66,69,65,77,65,116,85,56,0,0,0,
                              58,0,0,0,4,39,107,116,120,95,95,101,114,108,95,101,118,
                              97,108,95,95,45,101,120,112,114,47,53,45,102,117,110,
                              45,51,45,95,95,53,52,51,48,53,54,55,50,3,114,117,110,2,
                              105,111,6,102,111,114,109,97,116,0,0,67,111,100,101,0,
                              0,0,36,0,0,0,16,0,0,0,0,0,0,0,169,0,0,0,3,0,0,0,1,1,16,
                              2,18,34,0,1,32,64,71,0,3,78,16,0,3,83,116,114,84,0,0,0,
                              0,73,109,112,84,0,0,0,16,0,0,0,1,0,0,0,3,0,0,0,4,0,0,0,
                              1,69,120,112,84,0,0,0,16,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,
                              2,76,105,116,84,0,0,0,31,0,0,0,19,120,156,99,96,96,96,
                              100,96,96,224,110,206,102,96,143,204,47,45,200,172,203,
                              3,0,22,101,4,4,0,76,111,99,84,0,0,0,4,0,0,0,0,65,116,
                              116,114,0,0,0,40,131,108,0,0,0,1,104,2,100,0,3,118,115,
                              110,108,0,0,0,1,110,16,0,19,176,100,151,59,56,246,210,
                              238,75,130,233,80,230,122,37,106,106,67,73,110,102,0,0,
                              0,27,131,108,0,0,0,1,104,2,100,0,7,118,101,114,115,105,
                              111,110,107,0,5,56,46,48,46,51,106,0,68,98,103,105,0,0,
                              0,70,131,104,3,100,0,13,100,101,98,117,103,95,105,110,
                              102,111,95,118,49,100,0,17,101,114,108,95,97,98,115,
                              116,114,97,99,116,95,99,111,100,101,104,2,100,0,4,110,
                              111,110,101,108,0,0,0,1,100,0,13,100,101,116,101,114,
                              109,105,110,105,115,116,105,99,106,0,0,76,105,110,101,
                              0,0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>,
                            0,[]}

UPDATE: I will push the change to the following branch:
https://github.com/rabbitmq/khepri/tree/stored-procedures

0 replies

dumbbell · 2022-02-04T16:39:33Z

dumbbell
Feb 4, 2022
Maintainer Author

The stored-procedures branch now has a working prototype of triggers. Here is an example from the test suite:

Store a procedure in the database:

%% ModifiedPath is the tree node path which was created, updated or removed.
%% OnAction is the action performed on that path: created | updated | removed.
my_procedure(#{path := _ModifiedPath, on_action := _OnAction} = _Props) ->
    %% ... Do something with the modified path and the action.
    ok.

khepri_machine:put(
  StoreId,
  [my_procedure], % Where to store the procedure.
  #kpayload_sproc{sproc = fun my_procedure/1}).

Register a trigger:

%% We only want to trigger the stored procedure after any modification
%% to the path `[foo]'.
EventFilter = #kevf_tree{path = [foo]},

khepri_machine:register_trigger(
  StoreId,
  TriggerId,
  EventFilter,
  [my_procedure]).

Create the [foo] node:

khepri_machine:put(
  StoreId,
  [foo],
  #kpayload_data{data = value}).

The stored procedure is called like this:

my_procedure(#{path => [foo], on_action => created}).

The trigger itself is also stored in the state machine's state and thus will be replicated and survive a restart of the Ra node.

The procedure runs on the Ra leader node, in the context of a new khepri_event_handler registered process. There is no guarantee it will run exactly once. It could run multiple times if the leader changes and the command which triggered the event is applied again, or it might not run at all if the node stops after the command was applied but before khepri_event_handler has a chance to execute it.

0 replies

dumbbell · 2022-02-08T14:57:35Z

dumbbell
Feb 8, 2022
Maintainer Author

The latest commits to the branch hopefully bring at-least-once execution guarantees. It means that when a stored procedure is triggered by an event, that fact is recorded in the state machine's state in addition to the message sent to the event handler process. When the event handler process is done with the execution, it acks the execution to the state machine.

When a new leader is elected, it sends unacked triggered stored procedures to the event handler again. This means that the triggered stored procedures might be executed multiple times, and thus should be idempotent. However we are sure that we don't skip one because the event handler didn't have a chance to run them because the node went down.
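The bookkeeping described above can be pictured with the following minimal sketch. It is not the code from the branch: the module name `ack_sketch`, the function names and the use of a plain map keyed by a unique execution id are assumptions made for illustration.

```erlang
-module(ack_sketch).
-export([new/0, trigger/3, ack/2, pending/1]).

%% The map tracks executions which were triggered but not yet acked,
%% keyed by a unique execution id (hypothetical representation).
new() -> #{}.

%% Recorded when the event fires, alongside the message sent to the
%% event handler process.
trigger(Id, SprocPath, Unacked) ->
    Unacked#{Id => SprocPath}.

%% Recorded when the event handler reports that the execution is done.
ack(Id, Unacked) ->
    maps:remove(Id, Unacked).

%% What a newly elected leader would send to the event handler again.
pending(Unacked) ->
    lists:sort(maps:to_list(Unacked)).
```

If the old leader ran a procedure but went down before the ack was recorded, the entry is still in `pending/1` on the new leader and the procedure runs a second time. This is why stored procedures should be idempotent.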

Thank you @kjnilsson for the help on designing this!

I'm happy with the state of this new feature. I now need to extend the currently limited testing.

0 replies

dumbbell · 2022-02-08T17:16:50Z

dumbbell
Feb 8, 2022
Maintainer Author

I opened a draft pull request (#47).

0 replies

dumbbell · 2022-02-10T15:10:27Z

dumbbell
Feb 10, 2022
Maintainer Author

The pull request #47 was merged into main, closing this feature request!

0 replies

dumbbell · 2022-02-18T16:13:19Z

dumbbell
Feb 18, 2022
Maintainer Author

Here is some feedback from the Erlang forum that I find interesting:

Have you considered maybe supporting callbacks and/or notifications as well as Funs? e.g.
{callback, Mod :: atom(), Fun :: atom()}
{notify, Method :: call | cast | info, Pid :: pid()}

I filed #57 to track this.

Another one on the Erlang mailing-list:

Triggers in an RDBMS database are a nightmare. I never use trigger in MySQL / PostgreSQL / etc RDBMS.
I worried the trigger feature in Khepri will also become a nightmare. Please reconsider adding a trigger feature.

I started the conversation with this person to explain my intent and try to better understand his concerns. This should become an improvement to the code or to the documentation.

Update: The person feels reassured by the explanation. I filed #58 to make sure I improve the documentation to prevent confusion with triggers in RDBMS.

0 replies
