I am implementing a dataset-level metadata extractor. I think I need to be able to report multiple, individual metadata records. In principle, one should be able to build these records in a way that they can be reported in a nested fashion (thereby reporting just a single object). However, in my case I have no control over the nature of these documents, and they might be linked (or not) in different ways.
What is a desirable approach here?
an arbitrary top-level key that maps onto an array?
a JSON-LD style @graph top-level key (as a realization of the above)?
something else?
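To make the first two options concrete, here is a minimal sketch of bundling standalone records under a JSON-LD style `@graph` top-level key. It uses plain dicts only; the helper name `wrap_as_graph` and the example records are hypothetical and not part of any extractor API.

```python
import json


def wrap_as_graph(records, context=None):
    """Bundle standalone metadata records under a top-level "@graph" key.

    Hypothetical helper for illustration; records are kept as-is,
    whether or not they reference each other.
    """
    doc = {"@graph": list(records)}
    if context is not None:
        doc["@context"] = context
    return doc


# Made-up example records: some linked via "@id" references, some not.
records = [
    {"@id": "rec-1", "name": "first record"},
    {"@id": "rec-2", "name": "second record", "isPartOf": {"@id": "rec-1"}},
    {"@id": "rec-3", "name": "unlinked record"},
]

doc = wrap_as_graph(records)
print(json.dumps(doc, indent=2))
```

The flat array leaves it to consumers to resolve links via `@id`, so the extractor does not need to know how (or whether) the documents relate to each other.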
Related: We might be talking about a lot of stuff to return. If I see things correctly, I need to load many standalone records into memory and report them via immediate_data as a single dict, so that they can be written out as JSON again. I have yet to understand why meta-extract turns a single return value of type ExtractorResult into a result record, rather than dealing with result records directly. Dealing with them directly would make the standard machinery of seamlessly switching between return values and generator yields applicable to metadata extractors too.
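The memory concern can be illustrated with a generic sketch that contrasts yielding records one at a time with collecting them into a single dict. This is plain Python, not the meta-extract machinery; `iter_records` and `collect_records` are hypothetical names, and the records are parsed from in-memory JSON strings for the sake of a self-contained example.

```python
import json
from typing import Iterable, Iterator


def iter_records(raw_documents: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record at a time (hypothetical helper)."""
    for raw in raw_documents:
        yield json.loads(raw)


def collect_records(raw_documents: Iterable[str]) -> dict:
    """Materialize all records into one dict, as a single return value requires."""
    return {"@graph": list(iter_records(raw_documents))}


raw = ['{"@id": "a"}', '{"@id": "b"}']

# Generator style: only one record needs to be held at any time.
streamed = [rec["@id"] for rec in iter_records(raw)]

# Single-return style: every record is in memory before anything is reported.
collected = collect_records(raw)
```

With the generator, a consumer could write each record out as it arrives; with the single dict, the full set has to exist in memory first.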
@mih I didn't think about that yesterday afternoon, but another option would be to return a list, which contains the individual results, in immediate_data.