Address Pitfalls of Numerical Datatypes in RDF #82
Comments
To make this issue more actionable, here are a few more details, some thoughts about requirements, and a solution sketch.

Problem
For a more detailed description of the problem, refer to The Problem with XSD Binary Floating Point Datatypes in RDF (talk recording).

Requirements
A couple of requirements follow from these problems:

Solution Draft
As a basis for discussion, I would like to propose the following (challenging, maybe unrealistic) list of changes to address the problem:

Compatibility Considerations
Old implementations with new data:
New implementations with old data:
Old implementations interacting with new/upgraded implementations:

This would of course not be the easiest change to the RDF standards, especially as it also touches the XML standards. But I think it is important to address this to make RDF a reliable framework for the representation of numeric data. What do you think about it? (e.g. @afs, @VladimirAlexiev, @gkellogg, @namedgraph)
@danbri might have an opinion :)
There are a couple of issues with numerical datatypes that make the accurate use of RDF for numerical data error-prone.
The use of `xsd:float` and `xsd:double` entails a risk

- […] `"0.1"^^xsd:float` is typically mapped to the value `0.1000000014901161`), and
- […]

In most cases, `xsd:decimal` would be a better choice:

- […] `xsd:float` and `xsd:double` are the appropriate datatypes for measurements. In my view, this only holds for measurements that originate from binary floating point sources (e.g. numeric calculations or outputs of analog-to-digital converters). Other measurements typically have a value and the measurement uncertainty of the measurement device used, resulting in a representation by two precise values, which should both be represented with `xsd:decimal`.
- […] `Infinite` is required, which is only provided by `xsd:float` and `xsd:double`.

The use of `xsd:decimal` for value representation does not considerably impede the use of floating point arithmetic for calculations (e.g. for performance reasons), as the conversion is trivial. In contrast, if rounding of the lexical representation must be avoided, the other direction would require custom lexical mappings that are not standard-conformant and (depending on the framework) probably cumbersome to implement, and it is not always possible (e.g. inside SPARQL queries).

However, I don't see awareness of these issues in general, and especially not in teaching material.
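
To illustrate the asymmetry, here is a minimal Python sketch. It assumes that Python's `decimal.Decimal` is an acceptable stand-in for the value space of `xsd:decimal` and that a round trip through `struct` is an acceptable stand-in for the 32-bit lexical mapping of `xsd:float`; the helper name `as_float32` is made up for this example and not taken from any RDF framework.

```python
import struct
from decimal import Decimal


def as_float32(value: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


lexical = "0.1"

# Decimal keeps exactly the value that was written down.
exact = Decimal(lexical)

# Binary floating point has to round, because 0.1 has no finite
# binary expansion. Decimal(float) shows the stored value exactly.
as_single = Decimal(as_float32(float(lexical)))
as_double = Decimal(float(lexical))

print(exact)      # 0.1
print(as_single)  # 0.100000001490116119384765625
print(as_double)  # starts with 0.1000000000000000055

# Going from the decimal value to floating point for a calculation
# is a single call, so nothing is lost by storing the decimal.
approximation = float(exact)
```

The opposite direction cannot recover the `0.1` that was originally meant from `as_single` without guessing how many digits were intended.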
Further, RDF unnecessarily inherits limitations from XSD: exponential notation is only supported for `xsd:float` and `xsd:double`, but not for `xsd:decimal` (and derived datatypes). It was not included in `xsd:decimal` because the requirement was already met by the precisionDecimal datatype, which, however, did not become a built-in datatype in RDF. This tempts users to use `xsd:double` even when it is not appropriate. The shorthand syntax in Turtle, TriG and SPARQL amplifies this further, as `xsd:double` might be used even if not intended.

(A more detailed discussion of the issues can be found in arXiv:2011.08077 and some reviewer comments on it.)
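
To make the shorthand pitfall concrete, the following Python sketch classifies a numeric token the way I understand the numeric literal productions of the Turtle grammar (INTEGER, DECIMAL, DOUBLE). The function name `shorthand_datatype` is hypothetical, and the regular expressions are my transcription of those productions, not code from any parser.

```python
import re
from typing import Optional

# Transcribed from the Turtle productions INTEGER, DECIMAL and DOUBLE
# (an assumption of this sketch, not taken from a real parser).
EXPONENT = r"[eE][+-]?[0-9]+"
DOUBLE = re.compile(
    rf"[+-]?(?:[0-9]+\.[0-9]*{EXPONENT}|\.[0-9]+{EXPONENT}|[0-9]+{EXPONENT})"
)
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
INTEGER = re.compile(r"[+-]?[0-9]+")


def shorthand_datatype(token: str) -> Optional[str]:
    """Return the datatype a bare numeric token is shorthand for, or None."""
    if DOUBLE.fullmatch(token):
        return "xsd:double"
    if DECIMAL.fullmatch(token):
        return "xsd:decimal"
    if INTEGER.fullmatch(token):
        return "xsd:integer"
    return None


for token in ["1", "0.1", "1e-1", "6.022e23"]:
    print(token, shorthand_datatype(token))
```

Under these rules `0.1` is an `xsd:decimal`, while `1e-1` silently becomes an `xsd:double`, although both were probably meant to denote the same number.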
Possible Actions
I think the following actions would help to ease the accurate representation of numbers in RDF:
1. […] `xsd:decimal` (and derived datatypes) in RDF.
2. […] `xsd:float`/`xsd:double` resulting in rounded values after the lexical mapping.
3. […] `xsd:decimal` in favor of `xsd:float` and `xsd:double` and to warn users if a lexical `xsd:float` or `xsd:double` value was entered which would require rounding during the lexical mapping.
4. […] `xsd:decimal` instead of `xsd:double`.

One to three would not cause any backward compatibility problems. Four, however, would obviously cause backward compatibility problems in software, but might at the same time increase the accuracy of value representations in existing RDF documents without change.
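
The warning mentioned in the third action could be implemented roughly as follows. This is only a sketch under my own assumptions: `requires_rounding` and `SPECIAL_VALUES` are hypothetical names, Python's `Decimal` is used as the exact reference value, and lexical forms that Python cannot parse are not handled.

```python
import struct
from decimal import Decimal

# Special lexical forms that denote non-numeric values; nothing is rounded.
SPECIAL_VALUES = {"INF", "+INF", "-INF", "NaN"}


def requires_rounding(lexical: str, datatype: str) -> bool:
    """Report whether the lexical form cannot be stored exactly.

    datatype is "xsd:float" (binary32) or "xsd:double" (binary64).
    """
    if lexical in SPECIAL_VALUES:
        return False
    exact = Decimal(lexical)
    mapped = float(lexical)  # nearest binary64 value
    if datatype == "xsd:float":
        try:
            # Round trip through 4 bytes to get the binary32 value.
            mapped = struct.unpack("<f", struct.pack("<f", mapped))[0]
        except OverflowError:
            # Finite, but too large for binary32.
            return True
    # Decimal(float) is exact, so any difference means rounding happened.
    return Decimal(mapped) != exact


print(requires_rounding("0.5", "xsd:float"))   # False
print(requires_rounding("0.1", "xsd:float"))   # True
print(requires_rounding("0.1", "xsd:double"))  # True
```

A tool could call such a check when a literal is entered and suggest `xsd:decimal` whenever it returns `True`.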
Further, one could think about adding mandatory support for `precisionDecimal` (to have an arbitrary precision datatype with a representation of `Infinite`), but that is a new feature and goes beyond making RDF easier.