The Qanary Framework is dedicated to creating Question Answering systems. Question Answering (QA) draws on several research fields, so building a QA system involves expensive and time-consuming engineering work that can block research. Typical problems and use cases that occur while developing a Question Answering system are:
- an algorithm requires analyzing textual questions and annotating the found entities, relations, classes, etc.
- it is time-consuming as many services/algorithms/tools need to be compared
- your QA process needs to be improved
- following traditional development approaches requires additional effort for testing and debugging code to uncover possible flaws
- the quality of components dedicated to a particular task needs to be analyzed
- it is expensive to integrate all of the particular components due to a missing generalized interface
This repository stores the components of the Qanary framework. All components are implemented in Java and provide a Docker container for lightweight maintenance.
To show the Qanary methodology and its functionality, a tiny template-based Question Answering system was designed. It is capable of answering questions about the real name of a superhero, like "What is the real name of Captain America?". For this purpose, just two components were used:
a) Qanary DBpedia Spotlight component: The component is capable of finding superhero names and linking them to the DBpedia knowledge base (such a process is called Named Entity Recognition and Disambiguation).
b) Qanary Query Builder for Superhero Names: The component is capable of creating SPARQL SELECT queries to be executed on DBpedia (such a component is typically called Query Builder) if the given question follows the template `What is the real name of <superheroname>`.
Hence, given a question following the described pattern, the result will be a SPARQL query that can be executed so that the real name of a superhero is retrieved from DBpedia.
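The template-matching idea behind such a query builder can be illustrated with a minimal Python sketch. This is not the code of the actual component (which is implemented in Java): the function names are hypothetical, the entity URI is assumed to come from the preceding entity-linking step, and `REAL_NAME_PROPERTY` is a placeholder because this README does not state which DBpedia property the component uses.

```python
import re
from typing import Optional

# Template from the text; the trailing question mark is treated as optional.
TEMPLATE = re.compile(r"^What is the real name of (?P<name>.+?)\??$", re.IGNORECASE)

# Placeholder (assumption): the property used by the real component is not
# given in this README, so a dummy URI stands in for it.
REAL_NAME_PROPERTY = "http://example.org/realName"


def extract_superhero_name(question: str) -> Optional[str]:
    """Return the superhero name if the question follows the template, else None."""
    match = TEMPLATE.match(question.strip())
    return match.group("name") if match else None


def build_query(question: str, entity_uri: str) -> Optional[str]:
    """Build a SPARQL SELECT query for a question that follows the template.

    entity_uri is assumed to be the DBpedia resource found by the
    entity-linking component for the superhero name in the question.
    """
    if extract_superhero_name(question) is None:
        return None  # the question does not follow the template
    return (
        "SELECT ?realName WHERE { "
        f"<{entity_uri}> <{REAL_NAME_PROPERTY}> ?realName . "
        "}"
    )


if __name__ == "__main__":
    print(build_query(
        "What is the real name of Captain America?",
        "http://dbpedia.org/resource/Captain_America",
    ))
```

A question that does not follow the template yields `None`, which mirrors the statement above that a query is only created for questions matching the pattern.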
- Install the Qanary core components
- Clone the current repository:
git clone https://github.com/WDAqua/Qanary-question-answering-components.git
- Switch to the folder `Qanary-question-answering-components`:
cd Qanary-question-answering-components
- Build the minimal set of components using the Maven profile "tinytutorial" (here we skip creating the corresponding Docker images by adding the parameter `-Ddockerfile.skip=true` to the Maven command):
mvn clean package -Ddockerfile.skip=true -P tinytutorial
* The output should look like the following, indicating that the components `qa.NED-DBpedia-Spotlight` and `qanary_component-QB-SimpleRealNameOfSuperHero` were created:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] qa.NED-DBpedia-Spotlight 2.1.0 ..................... SUCCESS [ 3.717 s]
[INFO] qanary_component-QB-SimpleRealNameOfSuperHero 2.0.0 SUCCESS [ 1.083 s]
[INFO] mvn.reactor 0.1.1-SNAPSHOT ......................... SUCCESS [ 0.073 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
- Now, both components can be started using the JAR files:
java -jar qanary_component-NED-DBpedia-Spotlight/target/qa.NED-DBpedia-Spotlight-X.Y.Z.jar
java -jar qanary_component-QB-SimpleRealNameOfSuperHero/target/qanary_component-QB-SimpleRealNameOfSuperHero-X.Y.Z.jar
- With the Qanary components and the Qanary pipeline installed using the standard configuration, you can access a trivial Question Answering frontend at http://localhost:8080/startquestionansweringwithtextquestion
- Use the question "What is the real name of Captain America?".
- The question can be answered using the given two components.
- Thereafter, the triplestore will hold a SPARQL query that was created by the QueryBuilder component `SimpleRealNameOfSuperHero` (for DBpedia). It can be used to retrieve the actual answer from DBpedia. The UI shows the ID of the graph where the computed information was stored.
- Retrieve the SPARQL query from your Qanary triplestore using:
PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
PREFIX qa: <http://www.wdaqua.eu/qa#>
SELECT *
FROM <ADD-YOUR-GRAPH-ID-HERE>
WHERE {
?s a qa:AnnotationOfAnswerSPARQL .
?s oa:hasBody ?sparqlQueryOnDBpedia .
?s oa:annotatedBy ?annotatingService .
}
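As a rough illustration, the graph ID can be filled in and the query sent programmatically. The following Python sketch makes several assumptions: the helper names are hypothetical, the triplestore is assumed to expose a standard SPARQL 1.1 Protocol endpoint, and the endpoint URL has to be supplied by you because it depends on your triplestore setup.

```python
import json
import urllib.parse
import urllib.request

# The retrieval query from above, with a slot for the graph ID.
QUERY_TEMPLATE = """\
PREFIX oa: <http://www.w3.org/ns/openannotation/core/>
PREFIX qa: <http://www.wdaqua.eu/qa#>
SELECT *
FROM <{graph_id}>
WHERE {{
  ?s a qa:AnnotationOfAnswerSPARQL .
  ?s oa:hasBody ?sparqlQueryOnDBpedia .
  ?s oa:annotatedBy ?annotatingService .
}}
"""


def build_retrieval_query(graph_id: str) -> str:
    """Insert the graph ID shown in the UI into the retrieval query."""
    if not graph_id or any(ch in graph_id for ch in '<>" \n'):
        raise ValueError("graph_id must be a non-empty IRI without spaces or brackets")
    return QUERY_TEMPLATE.format(graph_id=graph_id)


def run_query(endpoint_url: str, query: str) -> dict:
    """POST the query to a SPARQL endpoint and return the parsed JSON results.

    endpoint_url is an assumption: use the SPARQL endpoint of your own
    Qanary triplestore.
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint_url,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # "urn:graph:example" is a made-up graph ID for demonstration only.
    print(build_retrieval_query("urn:graph:example"))
```

Only `build_retrieval_query` runs without a triplestore; `run_query` needs a reachable endpoint.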
- Qanary provides the methodology for a knowledge-driven, vocabulary-based approach. Our long-term agenda is to create a knowledge-driven ecosystem for the field of Question Answering. It is part of the WDAqua project where Question Answering systems are researched and developed.
- Qanary Framework provides the core framework for creating Question Answering systems following the Qanary methodology. You might consider the Qanary Framework a reference implementation of the Qanary methodology as a microservice-based component architecture.
- Qanary components covers the QA components compatible with the Qanary framework.
- Frankenstein is a supporting framework to establish a toolset for rapid orchestration and benchmarking of Qanary components. For example, it provides the tools to create 380 QA systems from 29 components.
If you have questions, ideas, or any feedback related to Qanary, please do not hesitate to contact the core developers. If you would like to see a QA system originally built using the Qanary framework, one of our core developers has built a complete end-to-end QA system that allows you to query several RDF data stores: http://wdaqua.eu/qa.
Please go to the GitHub Wiki page of the Qanary repository to get more insights on how to use this framework, how to add new components, etc.
Kuldeep Singh, Andreas Both, Dennis Diefenbach, Saeedeh Shekarpour: Towards a Message-Driven Vocabulary for Promoting the Interoperability of Question Answering Systems. ICSC 2016: 386-389 DOI 10.1109/ICSC.2016.59
Andreas Both, Dennis Diefenbach, Kuldeep Singh, Saeedeh Shekarpour, Didier Cherix, Christoph Lange: Qanary - A Methodology for Vocabulary-Driven Open Question Answering Systems. ESWC 2016: 625-641 DOI 10.1007/978-3-319-34129-3_38
Dennis Diefenbach, Kuldeep Singh, Andreas Both, Didier Cherix, Christoph Lange, Sören Auer: The Qanary Ecosystem: Getting New Insights by Composing Question Answering Pipelines. ICWE 2017: 171-189 DOI 10.1007/978-3-319-60131-1_10
For further publications please see the following wiki page.
The following components are contained in the
It uses a rule-based grammar to extract entities in a text.
Stanford named entity recognizer is an open-source tool that uses Gibbs sampling for information extraction to spot entities in a text.
is a multilingual, graph-based approach that uses random walks and the densest subgraph algorithm to identify and disambiguate entities present in a text.
It is a graph-based disambiguation tool that couples the HITS algorithm with label expansion strategies and string similarity measures to disambiguate entities in a given text.
It is a web service that uses a vector-space representation of entities and cosine similarity to recognize and disambiguate entities.
It matches terms in a given text with Wikipedia, i.e., it links text to recognize named entities. Furthermore, it uses the in-link graph and the page dataset to disambiguate recognized entities to their Wikipedia URIs.
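The vector-space disambiguation mentioned above can be illustrated with a small Python sketch. It shows only the general idea of ranking candidate entities by cosine similarity and is not the implementation of any of the listed services; the vectors, function names, and candidate URIs are made up for illustration.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def disambiguate(context: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate URI whose vector is most similar to the context vector."""
    return max(candidates, key=lambda uri: cosine_similarity(context, candidates[uri]))


if __name__ == "__main__":
    # Toy vectors (assumption): real systems derive them from text corpora.
    context = [1.0, 0.0, 1.0]
    candidates = {
        "http://example.org/entity/A": [1.0, 0.0, 0.9],
        "http://example.org/entity/B": [0.0, 1.0, 0.0],
    }
    print(disambiguate(context, candidates))
```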
- TextRazor (homepage) is a startup providing software that helps developers rapidly build text analytics into their applications.
- Dandelion (homepage) is a startup specialized in Semantics & Big Data.
- Ontotext (homepage) provides a complete set of Semantic Technologies enabling better content management, knowledge discovery, and semantic search.
- Ambiverse (homepage) is a spin-off from the Max Planck Institute for Informatics, which develops technologies to automatically understand, analyze, and manage Big Text collections.
- Meaningcloud (homepage) is a company based in New York City, that specializes in software for semantic analysis.
- It maps natural language relations to knowledge graph properties by using dependency parsing characteristics with adjustment rules. It then carries out a match against knowledge base properties, enhanced with the word lexicon WordNet via a set of similarity measures. It is an open-source tool.
- Qanary ReMatch for RL
- It devises a semantic-index-based representation of PATTY (Nakashole et al., EMNLP 2012), a knowledge corpus of linguistic patterns and their associated properties in DBpedia, and a search mechanism over this index with the purpose of enhancing the relation linking task.
- Qanary RelationLinker2 for RL
- The disambiguation module (DM) of OKBQA framework provides disambiguation of entities, classes, and relations present in a natural language question.
- Qanary DiambiguationProperty for RL
- The Natural Language Interfaces for the Web of Data (NLIWOD) community group (https://www.w3.org/community/nli/) provides reusable components for enhancing the performance of QA systems. We utilise one of its components to build a similar relation linking component.
- Qanary RelNliodRel for RL
- This component is the combination of RNLIWOD and OKBQA disambiguation modules for relation-linking tasks.
- Qanary AnnotationofSpotProperty for RL
- NLIWOD Class Identifier is one of several tools provided by the NLIWOD community for reuse. The code for the class identifier is available on GitHub.
- Qanary ClsNliodCls for CL
- This component is part of the OKBQA disambiguation module.
- Qanary AnnotationofSpotClass for CL
- Template-based query builders are widely used in the QA community for SPARQL query construction. This component is similar to the existing template-based components.
- Qanary QueryBuilder for QB
- SINA is a keyword and natural language query search engine that is based on Hidden Markov Models for choosing the correct dataset to query. We decoupled the original implementation to get a query builder.
- Qanary SINA for QB