Simplify grammar for implementing it in lutaml-model #56

Open
andrew2net opened this issue Jul 22, 2024 · 15 comments

@andrew2net

  1. An issue with Inheritance can be solved with a workaround, but if some element inherits from a definition that inherits from another definition, it becomes a problem. For example, FormattedString inherits from LocalizedStringOrXsAny, and relation/description inherits from FormattedString.
  2. Shale can't handle choice/group cases.
  3. ...
@opoudjis added the enhancement label (New feature or request) on Jul 22, 2024
@opoudjis moved this to 🏗 In progress in Metanorma on Jul 22, 2024
@opoudjis
Contributor

We're going to get rid of FormattedString anyway, since we're using Basicdoc for all formatted text.

@opoudjis
Contributor

And we will get rid of variant.

So

LocalizedString1 =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  text

LocalizedString =
  LocalizedString1 |
  element variant { LocalizedString1 }+

LocalizedMarkedUpString1 =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  TextElement+

LocalizedMarkedUpString =
  LocalizedMarkedUpString1 |
  element variant { LocalizedMarkedUpString1 }+
  
# Unlike UML, change type to format: type is overloaded
# Would be needed if plain were the default value and the attribute could be omitted
# Added LocalizedStringOrXsAny
FormattedString = 
  # attribute format { ( "plain" | "html" | "docbook" | "tei" | "asciidoc" | "markdown" ) }?,
  attribute format { ( "text/plain" | "text/html" | "application/docbook+xml" |
    "application/tei+xml" | "text/x-asciidoc" | "text/markdown" | "application/x-metanorma+xml" | text ) }?,
  LocalizedStringOrXsAny

LocalizedStringOrXsAny1 =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  ( text | AnyElement )+

LocalizedStringOrXsAny =
  LocalizedStringOrXsAny1 |
  element variant { LocalizedStringOrXsAny1 }+

becomes

LocalizedString =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  text

LocalizedMarkedUpString =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  TextElement+

LocalizedStringOrXsAny =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  ( text | AnyElement )+

@opoudjis
Contributor

choice/group:

... ugh, unhappy about simplifying these, but:

FullNameType =
    name_abbreviation?,
    (( prefix*, forename*, formatted-initials?, surname, addition* ) | completeName ),
    biblionote*, variantname*

becomes

FullNameType =
    name_abbreviation?,
    prefix*, forename*, formatted-initials?, surname?, addition*, completeName?,
    biblionote*, variantname*

and

address =
  element address {
    # iso191606 TODO
    (street*, city, state?, country, postcode? ) | formattedAddress
}

becomes

address =
  element address {
    # iso191606 TODO
    street*, city?, state?, country?, postcode?, formattedAddress?
}
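
A note on what the flattening costs: the flattened address also accepts documents the original choice rejected, such as an address with both city and formattedAddress, or with neither. If the constraint still matters, it has to move into code. Below is a minimal sketch of such a check in Python, assuming plain xml.etree parsing; the function name `check_address` and the wording of the messages are made up for illustration and are not part of Relaton or Shale.

```python
import xml.etree.ElementTree as ET

# Child element names taken from the address grammar quoted above.
STRUCTURED = {"street", "city", "state", "country", "postcode"}
# city and country were mandatory in the structured branch of the original choice.
REQUIRED_STRUCTURED = {"city", "country"}


def check_address(address: ET.Element) -> list:
    """Return a list of problems; an empty list means the original choice holds."""
    names = {child.tag for child in address}
    has_formatted = "formattedAddress" in names
    has_structured = bool(names & STRUCTURED)
    problems = []
    if has_formatted and has_structured:
        # The original grammar allowed one branch or the other, never both.
        problems.append("formattedAddress mixed with structured fields")
    elif not has_formatted:
        # Structured branch: enforce the fields the flattening made optional.
        for name in sorted(REQUIRED_STRUCTURED - names):
            problems.append("missing " + name)
    return problems


sample = ET.fromstring("<address><city>Richmond</city><country>US</country></address>")
print(check_address(sample))
```

The same pattern would apply to FullNameType, where surname and completeName became independently optional.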

opoudjis added a commit that referenced this issue Jul 23, 2024
@opoudjis
Contributor

The complexities are:

  • choice
  • more than one level of inheritance
  • recursion

@opoudjis
Contributor

Also getting rid of Name Variants:

orgname = element name { LocalizedString | NameWithVariants }

NameWithVariants =
  element primary { LocalizedString },
  element variant { NameWithVariants1 }*

NameWithVariants1 =
  attribute type { text },
  element primary { LocalizedString },
  element variant { NameWithVariants1 }*

This is recursive and complicates things unnecessarily, given that we already allow multiple name strings. It hasn't been used in Relaton anyway. We will just have an optional type attribute on names.

@opoudjis
Contributor

Changing bplace from choice/group

bplace = element place {
  text | ( bibliocity, biblioregion*, bibliocountry*)
}

to follow other elements in having an optional formattedPlace element, which concatenates all the child elements into a single string:

bplace = element place {
  bibliocity?, biblioregion*, bibliocountry*, formattedPlace?
}

cf. formattedAddress

This is a breaking change: <place>Richmond, VA</place> becomes <place><formattedPlace>Richmond, VA</formattedPlace></place>, as an alternative to the structured form <place><city>Richmond</city><region>VA</region></place>.
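
Existing documents with a text-only place would need migrating. A minimal sketch of that step in Python, assuming plain xml.etree handling; `migrate_place` is a hypothetical helper name, not an existing Relaton function.

```python
import xml.etree.ElementTree as ET


def migrate_place(place: ET.Element) -> ET.Element:
    """Wrap the bare text of a <place> in <formattedPlace>, modifying it in place."""
    text = (place.text or "").strip()
    # Only text-only places need rewriting; structured ones are already valid.
    if len(place) == 0 and text:
        place.text = None
        formatted = ET.SubElement(place, "formattedPlace")
        formatted.text = text
    return place


old = ET.fromstring("<place>Richmond, VA</place>")
print(ET.tostring(migrate_place(old), encoding="unicode"))
# prints <place><formattedPlace>Richmond, VA</formattedPlace></place>
```

A place that already has city or region children passes through unchanged.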

@opoudjis
Contributor

Also flattening the choice/group in keyword, from

bkeyword = element keyword {
    LocalizedString |
    (
       element vocab { LocalizedString },
       vocabid+
    ) |
    (
       element taxon { LocalizedString }+,
       vocabid+
    )
}

to

bkeyword = element keyword {
   element vocab { LocalizedString }?,
   element taxon { LocalizedString }*,
   vocabid*
}

Another breaking change: if the keyword is just text (LocalizedString), it will now be encoded as a vocab element inside keyword.
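
A minimal sketch of the corresponding migration in Python. It assumes that the language, locale and script attributes of a text-only keyword should move onto the new vocab element, since that is where LocalizedString now sits; that reading, and the helper name `migrate_keyword`, are assumptions rather than anything decided in this thread.

```python
import xml.etree.ElementTree as ET

# Attribute names from LocalizedStringAttrs in the grammar.
LOCALIZED_ATTRS = ("language", "locale", "script")


def migrate_keyword(keyword: ET.Element) -> ET.Element:
    """Move the bare text of a <keyword> into a <vocab> child, modifying it in place."""
    text = (keyword.text or "").strip()
    if len(keyword) == 0 and text:
        keyword.text = None
        vocab = ET.SubElement(keyword, "vocab")
        vocab.text = text
        # Assumption: localisation attributes follow the string they describe.
        for name in LOCALIZED_ATTRS:
            if name in keyword.attrib:
                vocab.set(name, keyword.attrib.pop(name))
    return keyword


old = ET.fromstring('<keyword language="en">metadata</keyword>')
print(ET.tostring(migrate_keyword(old), encoding="unicode"))
```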

@opoudjis
Contributor

There is an instance of recursion in the grammar that I have introduced somewhat recently, and that I really do not want to get rid of:

organization = element organization { OrganizationType}

OrganizationType =
    orgname+, subdivision*, abbreviation?, uri*, org-identifier*, contact*, logo?

orgname = element name {
   attribute type { text }?,
   LocalizedString
}

subdivision = element subdivision { 
   attribute type { text }?,
   OrganizationType
}

That is, model organisation subdivisions as full organisations themselves; this proved necessary for JIS, as subdivisions had internal structure and could not be restricted to just name strings.

@opoudjis
Contributor

I'm flattening LocalizedStringAttrs into other type definitions, because this:

LocalizedStringAttrs =
  # multiple languages and scripts possible: comma delimit them if so
  attribute language { text }?,
  attribute locale { text }?,
  attribute script { text }?

LocalizedMarkedUpString =
  # multiple languages and scripts possible: comma delimit them if so
  LocalizedStringAttrs,
  TextElement+

roledescription =
  element description { LocalizedMarkedUpString }

is supposedly beyond the ability of Shale to deal with, as two levels of indirection.
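
Flattening here just means inlining each named pattern where it is referenced. A toy sketch of that expansion in Python: the dictionary below is a made-up stand-in for the grammar (strings starting with @ stand for attributes), not a real RELAX NG parser, and `inline` is a hypothetical helper.

```python
# Toy model of the three definitions above: each definition is a list of
# items, and an item that names another definition is a reference to it.
definitions = {
    "LocalizedStringAttrs": ["@language?", "@locale?", "@script?"],
    "LocalizedMarkedUpString": ["LocalizedStringAttrs", "TextElement+"],
    "roledescription": ["LocalizedMarkedUpString"],
}


def inline(name: str, defs: dict) -> list:
    """Expand every reference to another definition, however many levels deep."""
    out = []
    for item in defs[name]:
        if item in defs:
            # A reference: splice in the referenced definition's items.
            out.extend(inline(item, defs))
        else:
            out.append(item)
    return out


print(inline("roledescription", definitions))
# prints ['@language?', '@locale?', '@script?', 'TextElement+']
```

This only terminates for non-recursive definitions; a self-referencing pattern such as OrganizationType would recurse forever, which is why recursion is a separate problem from indirection.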

That Shale apparently cannot deal with

A = attribute b { text }

C =
  A,
  text

D = element d { C }

is reprehensible, and I deserve an apology for it, @ronaldtse . I'm sure I won't get one.

@opoudjis
Contributor

I've simplified the Relaton grammar as far as I'm prepared to. Over to you, @andrew2net

@opoudjis moved this from 🏗 In progress to 👀 In review in Metanorma on Jul 23, 2024
@opoudjis
Contributor

In preparation, getting rid of variant as used within Metanorma

opoudjis added a commit to metanorma/metanorma-jis that referenced this issue Jul 24, 2024
@ronaldtse
Contributor

> There is an instance of recursion in the grammar that I have introduced somewhat recently, and that I really do not want to get rid of:
>
> organization = element organization { OrganizationType}
>
> OrganizationType =
>     orgname+, subdivision*, abbreviation?, uri*, org-identifier*, contact*, logo?
>
> orgname = element name {
>    attribute type { text }?,
>    LocalizedString
> }
>
> subdivision = element subdivision {
>    attribute type { text }?,
>    OrganizationType
> }
>
> That is, model organisation subdivisions as full organisations themselves; this proved necessary for JIS, as subdivisions had internal structure and could not be restricted to just name strings.

This kind of recursion can always be modeled through relationships, i.e. keep a flat structure for organizations, then link them together using attributes or elements.
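
A minimal sketch of what that could look like, in Python. The `id` and `parent_id` fields are hypothetical link attributes invented for illustration; nothing in the current grammar defines them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Org:
    id: str                          # hypothetical identifier attribute
    name: str
    parent_id: Optional[str] = None  # hypothetical link to the parent organization


def nest(orgs: list) -> list:
    """Rebuild the nested subdivision view from a flat list of Org records."""
    nodes = {o.id: {"name": o.name, "subdivisions": []} for o in orgs}
    roots = []
    for o in orgs:
        if o.parent_id is None:
            roots.append(nodes[o.id])
        else:
            nodes[o.parent_id]["subdivisions"].append(nodes[o.id])
    return roots


flat = [Org("org2", "Org 2"), Org("org1", "Org 1", parent_id="org2"), Org("org3", "Org 3")]
print(nest(flat))
```

The grammar stays non-recursive, at the cost of a resolution step like `nest` whenever the hierarchy is needed.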

@opoudjis
Contributor

opoudjis commented Jul 24, 2024

So: Org 1, Org 2, Org 3, plus a statement that Org 1 is a child of Org 2?

... Ugly. I'll wait on what @andrew2net has to say.

@opoudjis
Contributor

Make PureTextElement recursive, so that its instances of em and strong do not invoke elements outside of PureTextElement; and use PureTextElement more widely in the Relaton grammar, wherever full TextElement support is not needed. @andrew2net will treat the full prose elements such as Abstract, which do need full TextElement support, as Nokogiri XML rather than try to handle them in Shale.

@opoudjis
Contributor

In biblio.rnc, TextElement* is now restricted to abstract and formattedref. Everything else, including titles, is constrained to PureTextElement*.

@ronaldtse changed the title from "Simplify grammar for implementing it in Shale" to "Simplify grammar for implementing it in lutaml-model" on Oct 29, 2024