Keyword for functions that can produce multiple bindings #6
A very common use case for this is splitting a string on a delimiter. In Jena this can be done with the apf:strSplit property function:
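The original query was lost in extraction; a sketch of typical apf:strSplit usage, in which the subject variable receives one binding per fragment (the ex: data and the exact prefix URI are illustrative):

```sparql
PREFIX apf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX ex:  <http://example.org/>

# For each ?doc, bind ?tag once per comma-separated fragment of ?tagString
SELECT ?doc ?tag WHERE {
  ?doc ex:tags ?tagString .
  ?tag apf:strSplit (?tagString ",") .
}
```

Each input row fans out into as many rows as there are fragments, which is exactly the multiple-bindings behaviour this issue is about.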
This works, but the property function syntax is pretty obscure and is really an abuse of triple pattern syntax. So let us consider other syntax options. Extending the
The idea is to extend
+1000 This is one of my big bugbears with the current syntax and the abuse of it by magic properties. You definitely want to list out all the resulting variables so that it is clear what the output columns would be, although you probably also want some way to have implicit outputs. A lot of existing "magic" properties are used to join extra info or to filter the existing result set, so the input variables are implicitly included in the output variables in many cases.
I am aware that some magic properties are supposed to work "in all directions", i.e. regardless of which combination of left/right sides is present. I don't think this (complex) contract is often needed in practice, so I'd suggest we focus on a clear separation between input and output.
Would allowing
In current SPARQL, the argument to
I think you would definitely need a different keyword to distinguish the existing row-in row-out semantics of
Why? If the semantics of I'm more worried about other uses of expressions like
i agree with the reasoning about reusing. wrt. tuple assignments: how about something like
would that not be just a matter of the respective function signatures?
allegro provides one example as to how to approach this "signature" aspect of this issue. it permits a list of terms as both subject and object of a functional predicate term. they call them "magic" properties.
I am losing confidence in my proposal to extend it. Then how about extending a different existing keyword?
So the result is a solution sequence—just like with the current
how about
?
@dydra Did you read the issue? That is exactly what everyone is doing at the moment, and the point of this issue is that we want to get away from that abuse of triple pattern syntax.
i did and i do not find it to be abuse.
@dydra Property functions a.k.a. magic properties require selectively ignoring the SPARQL spec. The syntax is inconvenient as it doesn't allow expressions as arguments. There are conceptual problems with property functions too. Are the inputs on the left side and the outputs on the right, or vice versa? Which arguments must be fixed and which may be left variable? How does a SPARQL engine know whether to execute other parts of the graph pattern first in order to produce bound values for some of the variables?
Would someone care to create and populate a page on the wiki that records what existing approaches there are? Virtual graphs and use of SERVICE for accessing custom functionality fall under the multiple bindings concept. From an implementer's POV, it would be very useful to know which are the input arguments (must be bound) and which are the output arguments (must be bound by the call).
if i understand that argument, it suggests that, to change the semantics of function invocation to permit multiple value return and/or a single invocation to yield a solution sequence is without problems, while to define the semantics for a form such as
or, in other words,
would somehow conflict with 18.7. while multiple values are very useful and would serve the language well, were they to be supported, the changes which would be required to Expression to permit solution sequences as results - and thereby as arguments - appear at first glance to be much more complex than a definition for bgp-based invocation which conforms to 18.7.
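The code samples referred to above were lost in extraction; a hedged sketch of what such a BGP-based invocation with a list in subject position might look like (the ex:truncate function and the pattern are assumptions, echoing the Lisp multiple-values idiom linked later in this thread):

```sparql
PREFIX ex: <http://example.org/>

# a list of terms in subject position receives the multiple return values
SELECT ?quotient ?remainder WHERE {
  VALUES ?x { 7 }
  (?quotient ?remainder) ex:truncate (?x 3) .
}
```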
@dydra I didn't say that property functions conflict with section 18.7, and if they do, I wouldn't know! I said that adding property functions to SPARQL would require changes to the handling of BGPs; 18.7 is one way of doing that. I am not advocating a change to SPARQL functions as used in expressions. I did so previously, but as I said further up in this thread, I no longer think it's a good idea. I am (at least today) advocating this form:
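The advocated form itself was stripped in extraction. Judging from the ITERATE placeholder in the issue description (quoted at the end of this thread), its general shape would be along these lines; treating this as the form meant here is an assumption:

```sparql
PREFIX ex: <http://example.org/>

SELECT * WHERE {
  ?doc ex:tags ?tagString .
  ITERATE (ex:strSplit(?tagString, ",") AS ?tag) .
}
```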
ok. i remain concerned. yield?
The “function call” of a That is also the only place where an SSP is allowed, so
ok, no problem.
@jindrichmynarz So that would be:
I added the
This is a high-level outline to test how to proceed. There is a connection to issue #10 (Generalize SERVICE for non-SPARQL endpoints) and others. There could be a single mechanism for extensions named by URI that takes arguments and returns a result table. There are important details to work out, such as whether and how the callout declares its "in" and "out" parameters and whether the "in" parameters must be bound. There may be specific syntax for important cases to make the mechanism easier to use for those cases and to aid engines providing only certain common cases.
in this illustration
the specific example is without context. |
@dydra I imagine it would be like
Thinking out loud: another useful form might be
Referencing #14, could the syntax be something like:
values (?company ?industry) of apf:strSplitParallel ((?companies,",") (?industries,","))
@VladimirAlexiev Why the nesting in the arguments? It is complicated, and at least for this use case not needed. I think SPARQL's existing function call syntax is sufficient:
Must have an even number of arguments. Arguments 1, 3, etc. are the strings to split. Arguments 2, 4, etc. are the separators for the preceding argument.
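The call example itself was stripped; a hedged reconstruction applying that flat convention, reusing the issue's ITERATE placeholder and the hypothetical apf:strSplitParallel name from the previous comment:

```sparql
# strings at positions 1, 3, ...; separators at positions 2, 4, ...
ITERATE (apf:strSplitParallel(?companies, ",", ?industries, ",") AS ?company, ?industry)
```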
Implementation experience - knowing the signature of the function helps optimization. For this feature, undeclared output variables would make things like putting filters in the optimal place in the execution harder - that's just one example and there are other possible optimizations. There are 3 cases from my implementation experience:
rdfs:seeAlso #44.
Sometimes multi-functions take a lot of arguments, some of which are optional. One trick is to make the argument list variable length, but that does not completely work because it forces the order of the input variables, and that might be confusing. Also, to set a later one, all the earlier ones have to be defined to get the position right. One way is to allow named arguments, which are more readable:
The parsing details need further work and depend on how complicated to make the parsing technology required. |
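A sketch of what such named arguments might look like (the syntax and the ex:tokenize function are invented for illustration, not proposed verbatim in the thread):

```sparql
# positional form: all earlier optional arguments must be supplied
ITERATE (ex:tokenize(?text, ",", true) AS ?token)

# named form: only the arguments that matter are spelled out
ITERATE (ex:tokenize(string = ?text, trim = true) AS ?token)
```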
@afs I was thinking that the list of variable names would be established at query build time. So, a mechanism for defining multi-functions (e.g., a Java interface) would be needed. Such an interface for implementing multi-functions could also require implementations to announce whether each variable is always bound or might be unbound, if that is useful information for the query optimiser. That would be an implementation issue. So, the following would not be allowed, because the list of variables (in the first row of the CSV file) depends on the evaluated argument, and hence is not yet known at query build time:
But the following would be allowed; the file name is known at query build time, and the function implementation can read the first line at that time to return the variable list:
The following would be allowed, because it's not
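The queries themselves were stripped; hedged reconstructions of the two CSV cases, assuming a hypothetical ex:readCSV multi-function whose output variables are taken from the CSV header row:

```sparql
PREFIX ex: <http://example.org/>

# NOT allowed: the file name depends on ?x, so the header row - and hence
# the output variable list - cannot be known at query build time
SELECT * WHERE {
  ?dataset ex:fileTag ?x .
  ITERATE (ex:readCSV(CONCAT("/data/", ?x, ".csv")))
}

# Allowed: the file name is constant, so the implementation can read the
# header line at build time and report the variable list
SELECT * WHERE {
  ITERATE (ex:readCSV("/data/companies.csv"))
}
```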
Adding named arguments in addition to positional arguments is worth considering, but if a syntax is introduced for them, then I don't see why it shouldn't also be allowed on normal functions and aggregates. As they are orthogonal to the kind of function, they should get their own issue.
So is this understanding right? There is a phase during SPARQL execution where multi-bind functions declare their result variables so the SPARQL engine has access to a function signature for each multi-bind function instance. The information provided to the dynamic signature request includes the input arguments as syntactically declared. That would need some extra conditions:
but at first pass looks workable. Incidentally, that would relate to
@afs That's correct. The spec would have to say that the output variables of
FWIW for TopBraid we are adding a declarative mechanism for defining such functions: http://datashapes.org/multifunctions.html As shown, it can be mapped to (Jena) property functions, yet some official syntax would be much cleaner in my opinion.
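A declaration in that mechanism might look roughly like the following Turtle; dash:MultiFunction and the SHACL parameter constraints are the ones mentioned later in this thread, but the exact shape is an assumption; consult the linked page for the real vocabulary:

```turtle
@prefix dash: <http://datashapes.org/dash#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

ex:splitString
    a dash:MultiFunction ;
    sh:parameter [
        sh:path ex:string ;        # input: the string to split
        sh:datatype xsd:string ;
    ] ;
    sh:parameter [
        sh:path ex:separator ;     # optional input
        sh:datatype xsd:string ;
        sh:optional true ;
    ] .
```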
given the invoking form
why is the declaration necessary? is it not sufficient to just establish that syntax for multiple value returns?
I don't understand your question. How could a SPARQL processor know what it needs to do without a declaration mechanism first? Hard-code ex:namedSuperClasses? All that a normal SPARQL engine will see is that ex:namedSuperClasses is a predicate, so it will look up matches in the graph and know nothing about the special meaning.
it knows how many return values to accommodate because it has the invocation form. it needs a means with which to determine whether the term in the predicate position is intended to be unified with the graph or invoked as a function. the generation of the return values is a matter for the function definition. see http://clhs.lisp.se/Body/m_multip.htm the only matter for sparql-1.2 is to define that, if it is known that a term in the predicate position is a function, both a single object term and a list of object terms are valid, and in the latter case the processor must prepare for multiple return values. given the example, how is it intended that a processor interpret the following?
and
should they not be valid expressions?
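The two expressions in question were lost in extraction; given the single-object vs. object-list distinction just drawn, and the ex:namedSuperClasses example from the previous comment, they were presumably along these lines (the exact patterns are assumptions):

```sparql
?class ex:namedSuperClasses ?super .
```

and

```sparql
?class ex:namedSuperClasses (?super ?label) .
```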
Apache Jena property functions are multi-value returns, including zero. They are supposed to be like triple matching but also overload There is a consequence of this. The implementation must cope with variables in subject or object position, and on return all variables must be bound (in order to be "triple match like"). A proper multiple-returns facility needs to distinguish "in" and "out" variables. Distinguishing "required" and "optional" would also be helpful, as would named arguments to make stored procedures with many arguments easier to work with. It could be by prior declaration, or the syntax at the call point could provide this. Fully dynamic (no declaration either way, possible execution errors) is possible but would limit error checking and query optimization involving reordering near the stored procedure; I don't see advantages to fully dynamic.
of the expressed considerations, this one would concern me. what information could be declared which would not be covered by that which is available in the lexical context of the invocation?
an alternative is to unify: whatever values are bound in the context are passed.
To be clear, the proposed design here is not to formalize magic properties. From my perspective, magic properties are a convenient hack that didn't break the SPARQL 1.0 syntax, and there are some arguments in favor of mapping them to triple matches, because when there is only one value on the left and right they COULD be implemented by normal triples (e.g. as inferences). But I don't think the syntax is good, nor is the expectation that such functions can be executed in both directions always realistic. Thus, the proposal for Multi-Functions is meant as input to a future SPARQL syntax that doesn't use magic properties but a dedicated keyword. We use magic properties for the time being because there is no alternative that works with SPARQL 1.1 syntax. And yes, I agree with Andy that these shouldn't be fully dynamic. In our framework, the multi-functions are installed ahead of time from dedicated graphs that the SPARQL engine knows about. This means that syntax checking (including sh:optional, sh:class, sh:datatype etc.) could be done at edit time and parse time. It also shouldn't be required for the data graph to include the dash:MultiFunction declarations before they can be used.
To add a use case: I'm adding some trickery in an RDF4J query optimizer pipeline to support stuff like:
to yield output:
Ideally also the following would work:
I guess this illustrates the multi-output / flat-map type of behaviour I was after. I guess this is a different use case from yielding multiple outputs from a function to then be captured into multiple variables. This seems to have more similarity with tuple unpacking / destructuring in e.g. Python, C# and JavaScript. For that use case a new syntax would be good.
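The queries and result tables above were stripped; a hedged sketch of the flat-map behaviour described, using a hypothetical ex:split magic property in current SPARQL 1.1 syntax:

```sparql
PREFIX ex: <http://example.org/>

# one input row with "a,b,c" fans out into three output rows for ?item
SELECT ?item WHERE {
  VALUES ?csv { "a,b,c" }
  ?item ex:split (?csv ",") .
}
```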
Currently, BIND assignments can produce at most one binding for a single variable.
Many SPARQL implementations have a notion of "special" or "magic" predicates that can be used to bind multiple variables at once, and to produce multiple rows at once. Examples include Jena's built-in magic properties/property functions (e.g. https://jena.apache.org/documentation/query/extension.html#property-functions) and various free-text search implementations that take a string as input and produce all subjects that have matching values. Further, some implementations even use the SERVICE keyword as a hack, e.g. SERVICE wikibase:label. From our own modest experience there are tons of use cases for such multi-binding functions.
Even if SPARQL 1.2 doesn't include any specific built-in multi-binding functions, it would be great to have a proper syntax and an extension point so that these special functions can be consistently identified, supported by syntax-directed editors etc.
I struggle to find a proper keyword, so as a placeholder I am using ITERATE in this proposal:
SELECT * WHERE { ... ITERATE (ex:myFunction(?inputA, ?inputB, ...) AS ?outputA, ?outputB, ...) . }
The above would produce an iterator of bindings evaluating whatever is implemented by ex:myFunction. The right hand side would have unbound variables that get bound in each iteration. It's similar to BIND otherwise.
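As a concrete instance of the placeholder syntax (ex:splitString and the data pattern are invented for illustration):

```sparql
PREFIX ex: <http://example.org/>

SELECT ?doc ?tag WHERE {
  ?doc ex:tags ?tagString .
  ITERATE (ex:splitString(?tagString, ",") AS ?tag) .
}
```

A solution with ?tagString = "rdf,sparql" would fan out into two solutions, ?tag = "rdf" and ?tag = "sparql".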
As related work, SHACL-AF includes a mechanism to declare new SPARQL functions that wrap ASK or SELECT queries. Obviously this could be extended to turn SELECT queries into reusable "stored procedures" that can then be reused in other queries using the new keyword.