Standardize free text search of RDF data #40

Open
dbooth-boston opened this issue Mar 4, 2019 · 14 comments
Labels
query Extends the Query spec

Comments

@dbooth-boston
Collaborator

Several RDF stores support free text search, but there's no standard way to do it.
Proposed by Kjetil Kjernsmo in W3C Graph workshop lightning talk: https://www.w3.org/Data/events/data-ws-2019/assets/lightning/KjetilKjernsmo.pptx

@afs
Collaborator

afs commented Mar 8, 2019

That was considered for SPARQL 1.1. The problem is "standard". The WG felt that the expectation would be a single free text search language in the same way regex is the F&O regex language.

Regex impls are sufficiently converged that the differences across programming languages are small, and they are part of the runtime of many languages.
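
For contrast, the regex case is already portable across SPARQL 1.1 engines. A minimal sketch, assuming the data labels resources with rdfs:label (the property and the search term are illustrative assumptions, not taken from this thread):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Case-insensitive match using the standard regex() function.
# str() strips any language tag so the match applies to the lexical form.
SELECT ?s ?label
WHERE {
  ?s rdfs:label ?label .
  FILTER regex(str(?label), "search", "i")
}
```

This is pattern matching over every candidate literal, not indexed free text search with ranking, which is the gap the issue is about.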

Free text was not then in the same state.
Only some SPARQL engines would be able to afford to implement free text themselves, so a library would be needed, yet the libraries do not implement the "same" search language.

If the WG had to come up with that free text search language, that work item would have squeezed out many of the other things in SPARQL 1.1, there being finite time and people to do the work.

A looser idea of "standard", where there is a common access point to text search facilities without an exact definition of the search language, is one possibility.

@dbooth-boston changed the title from "Standardize free text search of RDF data" to "SPARQL: Standardize free text search of RDF data" on Mar 10, 2019
@VladimirAlexiev
Contributor

https://www.w3.org/2009/sparql/wiki/Feature:FullText (shared by Axel Polleres)

@dbooth-boston transferred this issue from w3c/EasierRDF on Apr 3, 2019
@seralf

seralf commented Apr 4, 2019

Hi, one approach to handling this feature could be a standardized way to use external indexing services, by means of a specific usage of the SERVICE statement (#10).

For example for blazegraph:

This document seems to have a nice overview of the various implementations:
https://www.ida.liu.se/research/semanticweb/events/Material/ISWC2017-SemDataMgmtTutorial-Part4-Searching.pdf
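
A rough sketch of what a SERVICE-based access point might look like. The endpoint IRI and the ex: vocabulary are hypothetical, invented only for illustration; they are not the syntax of Blazegraph or of any other existing store:

```sparql
PREFIX ex:   <http://example.org/fts#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Hypothetical: the text index is exposed as a SERVICE that returns
# matching resources and a relevance score for a query string.
SELECT ?s ?label ?score
WHERE {
  SERVICE <http://example.org/fts/search> {
    ?match ex:query  "search terms" ;
           ex:entity ?s ;
           ex:score  ?score .
  }
  ?s rdfs:label ?label .
}
ORDER BY DESC(?score)
LIMIT 10
```

The query string itself stays opaque to SPARQL here, so each backend could keep its own search language.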

@JervenBolleman added the "query" (Extends the Query spec) label on Apr 4, 2019
@afs changed the title from "SPARQL: Standardize free text search of RDF data" to "Standardize free text search of RDF data" on Apr 5, 2019
@afs
Collaborator

afs commented Apr 5, 2019

Removing "SPARQL: " on transferred issue.

@kjetilk

kjetilk commented Apr 7, 2019

Indeed, this was hard to do in SPARQL 1.1, since it would require some survey work to understand the specific use cases.

However, I think the key is that nearly every web site and app has a search bar. I've seen a lot of architectural complexity added just to support that on backends. Some basic support seems to me to be very important, but the scope should be limited.

@seralf

seralf commented Apr 9, 2019

Another approach to handling FTS is used in the Halyard project:

SELECT ?subj ?pred
WHERE {
    ?subj ?pred "(search~1 algorithm~1) AND (grant ingersoll)"^^halyard:search
}

https://merck.github.io/Halyard/usage.html#cooperation-with-elasticsearch

In this case the actual FTS is done on literals externally (on an external Elasticsearch instance), similarly to Blazegraph. The syntax is rather different from the others cited before.

@jindrichmynarz

Regarding Halyard's full-text search syntax, with @asotona we're actually considering changing it to the more common and uniform SERVICE approach.

@afs
Collaborator

afs commented Apr 15, 2019

Full Text Search is important. Let's find what to consolidate around as a new feature and what to leave flexible. Perfect functional alignment isn't possible if systems use external libraries, but is there one "text search" feature that has some common syntax across systems?

We have two issues for better calling of external functionality: "multiple returns from a function #6" and "named argument #64".

Are they enough as a mechanism?

Or maybe FTS is so important that it deserves special syntax, with a URI for a specific FTS implementation (even if that syntax calls the same mechanisms)? Or would different systems give different URIs to their capabilities, so that the user has to see this?

Rough example: TEXT (?literal, ?score) OF ( "search terms"; "limit"=10 ; score="50")

@jpcs

jpcs commented May 28, 2019

The XQuery Full Text extension is at its core not specific to XML, and could easily be lifted wholesale into SPARQL. It took a lot of time and effort to develop, and it would be good not to duplicate this effort.

https://www.w3.org/TR/xpath-full-text-10/

@namedgraph

My attempt to list triplestores with full-text search functionality:
https://github.com/AtomGraph/LinkedDataHub/wiki/Full-text-search-support-in-triplestores

@ktk

ktk commented Dec 14, 2023

See comment from @hartig in this related issue: #193 (comment)

In the context of a tutorial that I gave a few years ago, I collected information about the full-text search features provided by several triple store vendors (BlazeGraph, Virtuoso, AllegroGraph, Stardog, GraphDB). The latest version of my slides with this information can be found at the following address, where slides 24 to 41 are the relevant ones.

@afs
Collaborator

afs commented Dec 14, 2023

Apache Jena text search is based on Apache Lucene.

https://jena.apache.org/documentation/query/text-query.html
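
A minimal sketch of a Jena text query, assuming the dataset has been configured with a text index over rdfs:label (the indexed property, the search terms, and the limit of 10 are illustrative assumptions):

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# text:query is a property function: the object list gives the indexed
# property, the Lucene query string, and a limit on index hits.
# The subject list binds the matching resource and its score.
SELECT ?s ?label ?score
WHERE {
  (?s ?score) text:query (rdfs:label "search terms" 10) .
  ?s rdfs:label ?label .
}
ORDER BY DESC(?score)
```

The query string is passed to Lucene, so its syntax is Lucene's rather than anything defined by SPARQL.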
