Skip to content

Temporal Databases

Simon Fowler edited this page Feb 4, 2022 · 1 revision

Temporal Language-Integrated Query

v0.9.6 adds support for period-stamped valid- and transaction-time temporal tables to Links.

Temporal databases?

Here's a whistle-stop tour of temporal database concepts to contextualise the patch. For further information, see http://www2.cs.arizona.edu/~rts/tdbbook.pdf .

Key concepts

Temporal databases model how data changes over time. There are two main dimensions:

  • Transaction time, which concerns the state of the database (e.g., "what was the contents of the 'employees' table last week?")

  • Valid time, which concerns the data being modelled (e.g., "when did Simon stop being a PhD student?")

    The goal is to add support for both of these dimensions of time to Links' language-integrated query functionality.

Period stamping

There are a few ways of keeping track of this information, but the one we choose is period-stamping. Here, each temporal table is extended with "from" and "to" fields:

name position valid_from valid_to
alice teacher 2013-03-20 00:00:00 2022-01-01 00:00:00
bob teacher 2022-01-17 00:00:00 infinity

(Note that the intervals are closed-open: they include the valid_from field, and exclude the valid_to field).

Imagine this is a valid time table. We define the table as follows:

var employees =
  table "employees" with (name: String, position: String)
  using valid_time(valid_from, valid_to) from db;

Each row is then endowed with period-stamping information, so:

query nested {
    for (x <-v- employees)
    [ x ]
}

has type ValidTime((name: String, position: String)).

We can project the data out using the vtData function, and the from and to times using vtFrom and vtTo respectively.

We can get the current state of the database using the ttCurrent and vtCurrent functions (these are defined in the Prelude). Example:

query nested {
    for (x <- vtCurrent(employees))
    [ x ]
}

has type (name: String, position: String).

(We can also timeslice at a particular point in time using ttAt and vtAt).

Insert, update, deletion operators will then modify these periods. As an example, if we had a valid-time table and wanted to fire Bob, and it was 10AM on the 17th of January, then we'd write

delete (x <-v- employees)
  where (x.name ==  "bob")

and this will end up with the database state:

name position from to
alice teacher 2013-03-20 00:00:00 2022-01-01 00:00:00
bob teacher 2022-01-17 00:00:00 2022-01-17 10:00:00

Similar things happen with insertions and updates.

Current, Sequenced, Nonsequenced

The above refers to "current" updates, which mean "alter the current state of the database". There are two other dimensions of DB inserts/updates/deletions:

  • Sequenced: an update over a time period. For example, seconding someone to a different post between January and March (sequenced update) or having them take unpaid leave (sequenced deletion)

  • Nonsequenced: where the time period is explicitly altered (for example, extending someone's contract, or correcting their start date).

We support all of these.

Feature Overview

This patch adds support for:

  • Transaction-time and valid-time tables using a period-stamped representation (i.e., with explicit "from" and "to" fields)

  • Insert, update, and delete operations on temporal tables

  • Querying temporal tables

  • Homogeneous temporal joins (e.g., joining two transaction time tables)

Implementation

The core concepts are as follows:

  • Tables / update operations now optionally carry a "temporality" argument (either Current, Transaction, or Update). Tables also carry the names of their fields.

  • The query/temporalQuery.ml file contains all of the rewrites on inserts / updates / deletions

  • Internally, the ValidTime and TransactionTime types are represented as records of the form (!data: a, !from: DateTime, !to: DateTime). The accessors are implemented explicitly rather than via a desugaring pass to ensure that the translation is type-preserving.

  • query/query.ml augments query results with the from- and to- times for temporal tables.

  • Temporal joins are achieved as a post-processing pass after query normalisation (the translation is actually quite easy after normalisation)

Naturally all of the usual things (Sugared AST, traversals, typechecker, IR, IR traversals, value representation) need to be changed as well.

You can see all of the features in action in the tests: tests/shredding/{validtime, transactiontime, vt_join}.links.

Breaking Changes

  • New keywords: vt_insert, tt_insert, valid, to, TemporalTable
  • The TableHandle type is now TemporalTable which contains a temporality (either Current, Transaction, or Valid). TableHandle(r, w, n) is now an alias for TemporalTable(Current, r, w, n).

Limitations / Future Work

  • At the moment, the temporal functionality only works with PostgreSQL. This is because Postgres is the only DBMS we support that allows an infinity timestamp.

  • We do not currently support bitemporal tables (i.e., tables which have both valid time and transaction time functionality). This is the next big feature we need to design and think about.

  • Valid time sequenced inserts are limited -- they currently do not attempt to do any clever "gap filling", nor do they do any in-SQL checking that they respect primary keys. Therefore, an invalid VT sequenced insert (for example, one where the period of validity overlaps an existing row) will fail at runtime.

  • Temporal joins can only currently be flat (i.e., result types cannot contain any further collections). Shredded temporal joins are entirely nontrivial and a research project of their own.

  • We only currently support a period-stamped representation. It would be interesting to also support other temporal partitioning strategies (preferably in a uniform way).