This issue was moved to a discussion.
You can continue the conversation there.
OcrdPage: propagate inherited attributes and TextStyle #698
Comments
Trying to find a way to implement this in a clean way, I have difficulty with our
--- a/ocrd_models/ocrd_models/ocrd_page_generateds.py
+++ b/ocrd_models/ocrd_models/ocrd_page_generateds.py
@@ -14387,7 +14387,7 @@ class TextRegionType(RegionType):
             already_processed.add('secondaryLanguage')
             self.secondaryLanguage = value
             self.validate_LanguageSimpleType(self.secondaryLanguage)    # validate type LanguageSimpleType
-        value = find_attr_value_('primaryScript', node)
+        value = find_attr_value_('primaryScript', node) or self.parent_object_.primaryScript
         if value is not None and 'primaryScript' not in already_processed:
             already_processed.add('primaryScript')
             self.primaryScript = value
So instead of copying the existing |
If you can boil this down to a single |
It can be a single The problem with copying the whole generated |
Gotcha, yes, a
I tend to check the |
PAGE-XML features an implicit inheritance relation between various elements of the hierarchy:
Page/TextStyle → TextRegion*/TextStyle → TextLine/TextStyle → Word/TextStyle → Glyph/TextStyle
TextRegion*/@production → TextLine/@production → Word/@production → Glyph/@production
Page/@primaryScript → TextRegion*/@primaryScript → TextLine/@primaryScript → Word/@primaryScript → Glyph/@script
Page/@secondaryScript → TextRegion*/@secondaryScript → TextLine/@secondaryScript → Word/@secondaryScript → Glyph/@script
Page/@primaryLanguage → TextRegion*/@primaryLanguage → TextLine/@primaryLanguage → Word/@language
Page/@secondaryLanguage → TextRegion*/@secondaryLanguage → TextLine/@secondaryLanguage → Word/@language
Page/@readingDirection → TextRegion*/@readingDirection → TextLine/@readingDirection → Word/@readingDirection
Page/@textLineOrder → TextRegion*/@textLineOrder
These relations exist only in the documentation, so a generated DOM cannot implement them automatically. But their semantics are important, and writing processors would be much easier if they were implemented.
For example, if I want to know if the current segment belongs to a certain script, I'd currently have to:
check the segment's own @script (or @primaryScript/@secondaryScript)
if that is unset, check the parent's @primaryScript etc., and so on up the hierarchy
This is very hard to achieve with XPath (because disjunction/unions are only possible on nodesets, not on predicates). And with the DOM it requires a lot of code each time.
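To make the cost of that manual lookup concrete, here is a minimal sketch of the parent walk. It does not use the generated OcrdPage classes: `Segment`, `resolve` and `belongs_to_script` are hypothetical names, elements are modelled as plain objects with an `attrs` dict and a `parent` link, and the script values are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a PAGE element (not the generated DOM classes)."""
    attrs: dict = field(default_factory=dict)
    parent: Optional["Segment"] = None


def resolve(segment, name):
    """Return the nearest non-None value of attribute `name`, walking up the parents."""
    node = segment
    while node is not None:
        value = node.attrs.get(name)
        if value is not None:
            return value
        node = node.parent
    return None


def belongs_to_script(segment, script):
    """Check a script against the segment's own or inherited annotation."""
    # Glyph carries a single @script; if present it overrides anything inherited.
    own = segment.attrs.get("script")
    if own is not None:
        return own == script
    # Otherwise primary and secondary script are inherited independently.
    return script in (resolve(segment, "primaryScript"),
                      resolve(segment, "secondaryScript"))


page = Segment(attrs={"primaryScript": "Latn - Latin"})
region = Segment(attrs={"secondaryScript": "Grek - Greek"}, parent=page)
line = Segment(parent=region)
word = Segment(parent=line)
glyph = Segment(parent=word)
greek_glyph = Segment(attrs={"script": "Grek - Greek"}, parent=word)
```

Every processor that cares about script, language or reading direction would need some variant of this loop, which is the repetition that propagation at build time would remove.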
But we could facilitate this by simply propagating all inherited features during .build() – in a patched ocrd_page_generateds. We already have the user methods mechanism for patching, and we could simply use buildChildren to propagate all of the above attributes (as a bottom-up post-hook), because attributes of parents are built before those of children.
But for TextStyle, it's more complicated: on all hierarchy levels except the Page level, TextStyle sorts after the logical children and thus is only built after they are built. Also, one would need to unify style attributes between levels (we usually have True, False and None; so true/false from parents replaces none in children).
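One way to picture the propagation, including the TextStyle unification, is the sketch below. It makes several assumptions: it runs as a separate top-down pass over an already built tree rather than inside buildChildren (which sidesteps the build-order problem for TextStyle), it uses a hypothetical `Node` class instead of the generated classes, and it only copies same-named attributes, so the renames at the lower levels (Word/@language, Glyph/@script) are not handled.

```python
from dataclasses import dataclass, field

# Attributes copied downwards when a child leaves them unset (same name only).
INHERITED_ATTRS = ("production", "primaryScript", "secondaryScript",
                   "primaryLanguage", "secondaryLanguage", "readingDirection")


@dataclass
class Node:
    """Hypothetical stand-in for a built PAGE element."""
    attrs: dict = field(default_factory=dict)
    text_style: dict = field(default_factory=dict)  # e.g. {"bold": True, "italic": None}
    children: list = field(default_factory=list)


def merge_style(parent_style, child_style):
    """Child values win; None (or a missing key) falls back to the parent's value."""
    merged = dict(parent_style)
    for key, value in child_style.items():
        if value is not None:
            merged[key] = value
        else:
            merged.setdefault(key, None)
    return merged


def propagate(node, parent=None):
    """Fill unset attributes and TextStyle flags from the parent, top-down."""
    if parent is not None:
        for name in INHERITED_ATTRS:
            if node.attrs.get(name) is None and parent.attrs.get(name) is not None:
                node.attrs[name] = parent.attrs[name]
        node.text_style = merge_style(parent.text_style, node.text_style)
    # The parent is complete at this point, so children see all ancestors' values.
    for child in node.children:
        propagate(child, node)


line = Node(text_style={"bold": None, "italic": True})
region = Node(attrs={"readingDirection": "left-to-right"}, children=[line])
page = Node(attrs={"primaryScript": "Latn - Latin"},
            text_style={"bold": True, "italic": False},
            children=[region])
propagate(page)
```

After the call, `line` carries the page's script, the region's reading direction, and a style in which its own `italic` is kept while `bold` comes from the page. Note that writing the inherited values into the same objects means they would also be serialized on output, which may or may not be desirable.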