Subselect clause under optional clause fails in RDFlib #2957

Open
floresbakker opened this issue Oct 28, 2024 · 9 comments

Comments

@floresbakker

floresbakker commented Oct 28, 2024

A subselect clause under an optional clause fails in RDFlib. Let us consider the following example:

prefix ex:  <https://www.example.org/>

ex:document ex:subject "Nice cars" .

ex:someCar   ex:type "Car" .

I want to query this graph and list all cars alongside a document about cars, if there are any cars. To do so I use a subselect clause under an optional clause. The real reason for the subselect is that I want to count the number of cars (possibly zero), but that is not relevant here. The example query is simple and does not involve an aggregate, yet it still fails to yield the correct results.

prefix ex: <https://www.example.org/>

select ?subject ?car
where {
    $this ex:subject ?subject .

    # optional clause
    optional {
        # an offending subselect clause
        select ?car
        where {
            ?car ex:type "Car" .
        }
    }
}

I would expect the following results:

subject: "Nice cars"
car: https://www.example.org/someCar

Instead I get:

subject: None
car: https://www.example.org/someCar

So the variable ?subject is left unbound in the result set, which is incorrect.

If I query with the optional clause and an ordinary triple pattern instead of a subselect clause, I get the results I want:

prefix ex: <https://www.example.org/>

select ?subject ?car
where {
    $this ex:subject ?subject .

    # optional clause
    optional {
        # legit triple pattern without subselect
        ?car ex:type "Car" .
    }
}

Expected and actual results:

subject: "Nice cars"
car: https://www.example.org/someCar

I have tested the offending subselect query with the following engines: RDFlib, Jena, Speedy and Virtuoso. Only RDFlib returns None for ?subject; the other three engines return the expected results.

@floresbakker
Author

Is there anything I can do to get this issue fixed? I hope my example is clear, but if not, let me know and I'll try to explain it better.

@WhiteGobo
Contributor

For your example query, a workaround would be to rearrange the order of the patterns so that the optional clause comes first:

prefix ex: <https://www.example.org/>

select ?subject ?car
where {
    # optional clause
    optional {
        # an offending subselect clause
        select ?car
        where {
            ?car ex:type "Car" .
        }
    }
    $this ex:subject ?subject .
}

I searched a bit for the error and found that the processor drops the solution for ?subject around evalProject. I think evalProject is what processes the subselect.

def evalProject(ctx: QueryContext, project: CompValue):
    res = evalPart(ctx, project.p)
    return (row.project(project.PV) for row in res)

I won't get any further today. I hope this helps a little. I don't know when I'll continue with this.

@floresbakker
Author

Thanks WhiteGobo! In my opinion this is a high impact issue that lowers the reliability of the SPARQL engine considerably. I often write queries with subselect queries, hence this issue worries me. Thank you for your analysis!

@floresbakker
Author

Any progress on this? Is there something I can do to help? I would love to create a release for my open source project OntoReSpec but then a fix would be needed for this bug. Let me know what I can do.

@WhiteGobo
Contributor

Oh sure. I think I should be able to find the exact problem and a solution today. I'm having trouble motivating myself because of the need to create a PR, so if you could handle that part for me, that would be great.

@floresbakker
Author

That would be awesome. I am still learning how development is done in a public repository, so creating a PR makes my knees buckle a bit, I have to admit ;-) I read the RDFLib developers guide just now. Before I make a stupid move, I wanted to ask you whether my understanding of the PR creation is correct:

  1. Create a fork of RDFlib: https://github.com/floresbakker/rdflib
     (already done, as this does not hurt anyone)
  2. Create PR
     • Write stuff that explains the issue to be solved
     • Connect the PR with the issue

@ajnelson-nist
Contributor

@floresbakker, you have the right idea. If it helps you plan your writing, this file is the starter template. Its contents will pre-populate the pull request's free-text submission box on GitHub.

@floresbakker
Author

Thanks ajnelson-nist! I hope I did well, here it is #3077.

@WhiteGobo
Contributor

WhiteGobo commented Feb 24, 2025

The return value of evalLeftJoin is the real culprit.
Changing line 184 in rdflib/rdflib/plugins/sparql/evaluate.py should do the trick:

                yield b

->

                yield b.merge(a)

I think the merge is simply missing, although I couldn't quite get my head around it. The function in question:

def evalLeftJoin(
    ctx: QueryContext, join: CompValue
) -> Generator[FrozenBindings, None, None]:
    # import pdb; pdb.set_trace()
    for a in evalPart(ctx, join.p1):
        ok = False
        c = ctx.thaw(a)
        for b in evalPart(c, join.p2):
            if _ebv(join.expr, b.forget(ctx)):
                ok = True
                yield b
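
To make the effect of the missing merge easier to see, here is a toy model in plain Python. It is only a sketch under simplifying assumptions: bindings are ordinary dicts rather than rdflib's FrozenBindings, and the helper names (project, left_join, subselect) are hypothetical and not part of rdflib. It mimics a left join whose right-hand side is a projecting subselect, once yielding only the right-hand row and once yielding the right-hand row merged with the left-hand row.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

Binding = Dict[str, str]


def project(rows: Iterable[Binding], keep: Set[str]) -> Iterator[Binding]:
    """Keep only the projected variables, as a SELECT clause does."""
    for row in rows:
        yield {k: v for k, v in row.items() if k in keep}


def left_join(
    left: Iterable[Binding],
    right_for: Callable[[Binding], Iterable[Binding]],
    merge: bool,
) -> Iterator[Binding]:
    """Toy left join: evaluate the right side once per left-hand row."""
    for a in left:
        matched = False
        for b in right_for(a):
            matched = True
            # merge=False mirrors "yield b"; merge=True mirrors "yield b.merge(a)".
            yield {**a, **b} if merge else b
        if not matched:
            yield a


def subselect(outer: Binding) -> Iterator[Binding]:
    """Hypothetical subselect: sees the outer bindings, then projects to ?car."""
    inner = [{**outer, "car": "ex:someCar"}]
    return project(inner, {"car"})


left = [{"this": "ex:document", "subject": "Nice cars"}]

without_merge = list(left_join(left, subselect, merge=False))
with_merge = list(left_join(left, subselect, merge=True))

print(without_merge)  # the projection has dropped ?subject
print(with_merge)     # the left-hand bindings are restored
```

In this model the projection inside the subselect discards every variable except ?car, so the outer ?subject binding survives only if the left join merges the left-hand row back in.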
