-
Notifications
You must be signed in to change notification settings - Fork 43
Temporal Databases
v0.9.6 adds support for period-stamped valid- and transaction-time temporal tables to Links.
Here's a whistle-stop tour of temporal database concepts to contextualise the patch. For further information, see http://www2.cs.arizona.edu/~rts/tdbbook.pdf .
Temporal databases model how data changes over time. There are two main dimensions:
-
Transaction time, which concerns the state of the database (e.g., "what was the contents of the 'employees' table last week?")
-
Valid time, which concerns the data being modelled (e.g., "when did Simon stop being a PhD student?")
The goal is to add support for both of these dimensions of time to Links' language-integrated query functionality.
There are a few ways of keeping track of this information, but the one we choose is period-stamping. Here, each temporal table is extended with "from" and "to" fields:
name | position | valid_from | valid_to |
---|---|---|---|
alice | teacher | 2013-03-20 00:00:00 | 2022-01-01 00:00:00 |
bob | teacher | 2022-01-17 00:00:00 | infinity |
(Note that the intervals are closed-open: they include the valid_from
field, and exclude the valid_to
field).
Imagine this is a valid time table. We define the table as follows:
var employees =
table "employees" with (name: String, position: String)
using valid_time(valid_from, valid_to) from db;
Each row is then endowed with period-stamping information, so:
query nested {
for (x <-v- employees)
[ x ]
}
has type ValidTime((name: String, position: String))
.
We can project the data out using the vtData
function, and the from
and to
times using vtFrom
and vtTo
respectively.
We can get the current state of the database using the ttCurrent
and vtCurrent
functions (these are defined in the Prelude). Example:
query nested {
for (x <- vtCurrent(employees))
[ x ]
}
has type (name: String, position: String)
.
(We can also timeslice at a particular point in time using ttAt
and vtAt
).
Insert, update, deletion operators will then modify these periods. As an example, if we had a valid-time table and wanted to fire Bob, and it was 10AM on the 17th of January, then we'd write
delete (x <-v- employees)
where (x.name == "bob")
and this will end up with the database state:
name | position | from | to |
---|---|---|---|
alice | teacher | 2013-03-20 00:00:00 | 2022-01-01 00:00:00 |
bob | teacher | 2022-01-17 00:00:00 | 2022-01-17 10:00:00 |
Similar things happen with insertions and updates.
The above refers to "current" updates, which mean "alter the current state of the database". There are two other dimensions of DB inserts/updates/deletions:
-
Sequenced: an update over a time period. For example, seconding someone to a different post between January and March (sequenced update) or having them take unpaid leave (sequenced deletion)
-
Nonsequenced: where the time period is explicitly altered (for example, extending someone's contract, or correcting their start date).
We support all of these.
This patch adds support for:
-
Transaction-time and valid-time tables using a period-stamped representation (i.e., with explicit "from" and "to" fields)
-
Insert, update, and delete operations on temporal tables
-
Querying temporal tables
-
Homogeneous temporal joins (e.g., joining two transaction time tables)
The core concepts are as follows:
-
Tables / update operations now optionally carry a "temporality" argument (either Current, Transaction, or Update). Tables also carry the names of their fields.
-
The
query/temporalQuery.ml
file contains all of the rewrites on inserts / updates / deletions -
Internally, the
ValidTime
andTransactionTime
types are represented as records of the form(!data: a, !from: DateTime, !to: DateTime)
. The accessors are implemented explicitly rather than via a desugaring pass to ensure that the translation is type-preserving. -
query/query.ml
augments query results with the from- and to- times for temporal tables. -
Temporal joins are achieved as a post-processing pass after query normalisation (the translation is actually quite easy after normalisation)
Naturally all of the usual things (Sugared AST, traversals, typechecker, IR, IR traversals, value representation) need to be changed as well.
You can see all of the features in action in the tests: tests/shredding/{validtime, transactiontime, vt_join}.links
.
- New keywords:
vt_insert
,tt_insert
,valid
,to
,TemporalTable
- The
TableHandle
type is nowTemporalTable
which contains a temporality (eitherCurrent
,Transaction
, orValid
).TableHandle(r, w, n)
is now an alias forTemporalTable(Current, r, w, n)
.
-
At the moment, the temporal functionality only works with PostgreSQL. This is because Postgres is the only DBMS we support that allows an
infinity
timestamp. -
We do not currently support bitemporal tables (i.e., tables which have both valid time and transaction time functionality). This is the next big feature we need to design and think about.
-
Valid time sequenced inserts are limited -- they currently do not attempt to do any clever "gap filling", nor do they do any in-SQL checking that they respect primary keys. Therefore, an invalid VT sequenced insert (for example, one where the period of validity overlaps an existing row) will fail at runtime.
-
Temporal joins can only currently be flat (i.e., result types cannot contain any further collections). Shredded temporal joins are entirely nontrivial and a research project of their own.
-
We only currently support a period-stamped representation. It would be interesting to also support other temporal partitioning strategies (preferably in a uniform way).