Add functionality to guess the intended input file format #1058
Comments
For some other formats (after playing around with loading edge cases):
Also, for TTL, this is a full, valid file:
So we would need an edge case for that which looks for …

As I'm listing out the edge cases, I'm worried that we will miss something and this could cause a breaking change. It's a good idea to report a clean error message, but maybe we should only do it if all parsers fail? It seems inefficient, but I don't know how we can be 100% sure we get all the edge cases. And then suddenly somebody can't load their ontology (I guess they could override with …).
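The "only if all parsers fail" idea could look roughly like the sketch below. It is written in Python purely for illustration (ROBOT itself is Java on top of OWLAPI), and the `parsers` mapping, the format labels, and the names `parse_with_fallback` and `AllParsersFailed` are hypothetical stand-ins, not OWLAPI or ROBOT API. The guessed format is taken as a plain parameter, however it was obtained. The success path is the same as trying every parser; the guess is only consulted after everything has failed, to pick which error to report.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for an OWLAPI parser: takes the file text and
# returns a parsed ontology, or raises if the text is not in its format.
Parser = Callable[[str], object]


class AllParsersFailed(Exception):
    """Raised when no parser accepts the input; carries the chosen error."""

    def __init__(self, guessed: Optional[str], cause: Exception):
        self.guessed = guessed
        self.cause = cause
        label = guessed if guessed is not None else "unknown format"
        super().__init__(f"could not parse input (best guess: {label}): {cause}")


def parse_with_fallback(
    text: str, parsers: Dict[str, Parser], guessed: Optional[str]
) -> Tuple[str, object]:
    """Try every parser in order; use the guess only to pick the error to report."""
    if not parsers:
        raise ValueError("no parsers configured")
    errors: Dict[str, Exception] = {}
    for name, parse in parsers.items():
        try:
            # Success path is unchanged: the first parser that works wins,
            # whatever the guess was.
            return name, parse(text)
        except Exception as exc:  # remember why this parser rejected the input
            errors[name] = exc
    # Every parser failed. Prefer the error from the guessed format, since
    # that is most likely the syntax the author intended.
    if guessed in errors:
        raise AllParsersFailed(guessed, errors[guessed])
    # No usable guess: fall back to the last error seen.
    raise AllParsersFailed(None, errors[list(errors)[-1]])
```

Because a wrong or missing guess can only change which error message is shown, a missed edge case in the guessing code cannot stop an ontology from loading.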
You would 100% not want to change the default behaviour (no guessing); just add a new option.
If you're setting the flag anyway, wouldn't you know your format? So why add auto-detect when you could just specify the format the same way? Can you see a use case where you don't know the format in advance but want clear error messages if it fails? Maybe dashboard stuff...?
I agree with @beckyjackson. A possible use case is to say: we will do a better job of guessing than OWLAPI does by cycling through all possible parsers. But yeah, I still agree with you.
Building on #1038.

PR #1056 is already very useful, but it made me wonder: why don't we have a utility that would determine the file format, or at least make a good guess at what was intended? It could be added to #1056 as a `--input-format detect` or `--input-format auto` option, and maybe used whenever parsing fails.

The root problem is that the `.owl` extension is used for several different formats supported by OWLAPI. For most of the OBO use cases we expect RDF/XML, but it could be OWL/XML, Manchester, OWL Functional, or Turtle. The OWLAPI will try a dozen different parsers until one of them works, and if loading succeeds we can ask which format was used with `OWLOntologyManager.getOntologyFormat()`. The interesting case is #1038, where the ontology fails to load but we should still be able to guess the intended format and then report the most useful parsing error.

When I'm not sure about the format, I just look at the first few lines of the file. It shouldn't be hard to write code for some crude heuristics, and this would be useful even if it misses some weird edge cases:

- RDF/XML: `<rdf:RDF>` (skip XML DTD stuff)
- Turtle: `@prefix`
- OWL Functional: `Prefix(`
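A crude version of these heuristics might look like the following Python sketch. The language, the function names `guess_format` and `guess_file_format`, and the returned labels such as `"rdfxml"` are illustrative assumptions, not ROBOT's API. It skips an XML declaration, XML comments, and a DOCTYPE with its DTD before looking at the root element, then checks the first non-blank, non-comment line for the Turtle and OWL Functional prefixes. The OWL/XML (`<Ontology`), Manchester (`Prefix:`/`Ontology:`), SPARQL-style Turtle (`PREFIX`/`BASE`), and full-IRI Turtle checks go beyond the list above and are extra assumptions.

```python
import re
from typing import Optional

# Things that may precede the root element of an XML document: the XML
# declaration, comments, and a DOCTYPE (possibly with an internal subset
# of <!ENTITY ...> declarations in square brackets).
_XML_PROLOG = re.compile(
    r"\s*(?:<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>)",
    re.DOTALL,
)
# An XML start tag such as "<rdf:RDF" or "<Ontology"; the lookahead keeps a
# Turtle IRI like "<http://example.org/a>" from being mistaken for a tag.
_XML_TAG = re.compile(r"<(?:([A-Za-z_][\w.\-]*):)?([A-Za-z_][\w.\-]*)(?=[\s/>])")


def guess_format(head: str) -> Optional[str]:
    """Crudely guess a serialization from the first few KB of a file."""
    text = head.lstrip("\ufeff")  # ignore a byte-order mark
    pos = 0
    while (m := _XML_PROLOG.match(text, pos)) is not None:
        pos = m.end()
    rest = text[pos:].lstrip()

    tag = _XML_TAG.match(rest)
    if tag:
        local = tag.group(2)
        if local == "RDF":
            return "rdfxml"
        if local == "Ontology":
            return "owlxml"  # assumption: OWL/XML root element
        return None  # some other XML root; no guess

    for line in rest.splitlines():
        s = line.strip()
        if not s or s.startswith("#"):  # blank line or comment
            continue
        if re.match(r"(?:Prefix|Ontology)\s*\(", s):
            return "functional"
        if re.match(r"(?:Prefix|Ontology):", s):
            return "manchester"
        if re.match(r"@(?:prefix|base)\b", s) or re.match(r"(?i)(?:prefix|base)\s", s):
            return "turtle"
        if re.match(r"<[^>\s]*>\s", s):
            return "turtle"  # a triple written with full IRIs
        return None  # first meaningful line is not recognized
    return None


def guess_file_format(path: str, max_chars: int = 4096) -> Optional[str]:
    """Read only the start of the file; heuristics never need the whole thing."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return guess_format(f.read(max_chars))
```

Anything unrecognized returns `None`, so a caller can fall back to the existing try-every-parser behaviour instead of failing.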
We could also have useful error messages for common failure modes: