Support Japanese numerals #228

Intelligent2013 · 2024-10-20T17:54:40Z

Source issue: #226

Support Japanese numerals in:

- clause numbers
- ordered list items
- edition number. Currently, there are two elements in the Presentation XML:

  <edition language="">1</edition>
  <edition language="ja">第1版</edition>

- publication date. Example: 令和元年七月二十二日
  Current Presentation XML: <date type="published">令和元年7月22日</date>
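
The publication date case can be illustrated with a minimal sketch. It assumes the date string already carries the era name and only the Arabic digit runs for month and day (numbers below 100) need converting. The helper names `small_kanji` and `kanji_date` are hypothetical and not part of Metanorma.

```ruby
# Hypothetical helpers for illustration only; not the Metanorma implementation.
KANJI_DIGITS = %w[〇 一 二 三 四 五 六 七 八 九].freeze

# Spell out an integer in 1..99 with kanji numerals (e.g. 22 -> 二十二).
def small_kanji(number)
  raise ArgumentError, "supports 1..99 only" unless (1..99).cover?(number)

  tens, ones = number.divmod(10)
  tens_part = case tens
              when 0 then ""
              when 1 then "十" # 十, not 一十
              else KANJI_DIGITS[tens] + "十"
              end
  tens_part + (ones.zero? ? "" : KANJI_DIGITS[ones])
end

# Replace every run of ASCII digits in a date string with kanji numerals.
def kanji_date(date)
  date.gsub(/\d+/) { |run| small_kanji(run.to_i) }
end

puts kanji_date("令和元年7月22日") # prints 令和元年七月二十二日
```

A four-digit Western year would raise an error in this sketch, since it only covers the month and day ranges shown in the example.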

If this task is complicated, I'll work out how to do it via XSLT extensions in Java.

@ronaldtse do we need to support two number formats: Arabic (1, 2, 3, ...) for usual documents and Japanese (一, ...) for vertical layout documents? Or only Japanese numbers?

Note: I don't know the reason, but the notes numbers should be Arabic:

UPDATE after the comment

> @Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.

notes, examples numbers


ReesePlews · 2024-10-21T00:26:49Z

very interesting to see the vertical layout. thanks for all the work on this @Intelligent2013 ! i don't work with vertical layout much but the third image above looks more correct than the second image. the layout of the kanji numbers in the first image appears correct for the main clause numbers, but with the sub-clause numbering, the vertical style of '三・一' etc seems different to me... i guess, in theory, that is the correct style but it seems a bit difficult on the eyes; again i don't have enough experience with vertical layout. i suspect that vertical layout is widely used by such agencies as the justice ministry (法務省) and in the writing of japanese laws/regulations. i know there is a large legal website that has japanese laws with english translations, but offhand i don't remember the link. they may have samples of printed works online that could be helpful in these cases.

ronaldtse · 2024-10-21T01:19:58Z

Thank you @ReesePlews ! Yes you are right that the Japanese "e-Gov" website has all the Japanese laws.

For example, this is the Constitution of Japan:

https://laws.e-gov.go.jp/law/321CONSTITUTION

For vertical layout, they have 3 options: 1 column, 2 columns and 4 columns

This is the law that establishes JIS:

https://laws.e-gov.go.jp/law/417M60000F00006

To save space, this is a screenshot of the 4-column layout (so it's not too tall to show here).

It uses the list style:

1, 2...
一, 二, ...
イ, ロ, ...
(1), (2)...
(i), (ii)...

The list style only uses a single full width space indentation to separate list levels.

UPDATE: It seems that when Paragraphs are labeled, in the e-Gov website the paragraph label for the first paragraph is omitted, and subsequent paragraph labels exist. Not sure why the list item "1" is missing though. This doesn't seem to be an East Asian tradition.

Intelligent2013 · 2024-10-21T15:59:42Z

The 1st post updated - added 'edition number'.

opoudjis · 2024-10-22T11:20:47Z

There's two elements to this.

The first is to support Japanese numerals, and I can do that, sure: that's merely 2.localize(:ja).spellout, using twitter_cldr.
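
For readers without twitter_cldr at hand, the conversion itself can be sketched in plain Ruby. This is a stand-in under stated assumptions, not the twitter_cldr implementation: the helper name `to_kanji` is hypothetical, it covers 0 to 9999 only, and its output may differ from the library's spellout rules in edge cases.

```ruby
# Hypothetical stand-in for a spellout call; not the twitter_cldr API.
KANJI_DIGITS = %w[〇 一 二 三 四 五 六 七 八 九].freeze
KANJI_UNITS = ["", "十", "百", "千"].freeze

# Spell out an integer in 0..9999 with kanji numerals.
def to_kanji(number)
  raise ArgumentError, "supports 0..9999 only" unless (0..9999).cover?(number)
  return KANJI_DIGITS[0] if number.zero?

  # Integer#digits yields the least significant digit first.
  parts = number.digits.each_with_index.map do |digit, place|
    next "" if digit.zero?

    # A leading 一 is dropped before 十, 百 and 千 (十 rather than 一十).
    prefix = digit == 1 && place.positive? ? "" : KANJI_DIGITS[digit]
    prefix + KANJI_UNITS[place]
  end
  parts.reverse.join
end

puts to_kanji(3)  # prints 三
puts to_kanji(22) # prints 二十二
```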

The second is to work out where to use Japanese numerals instead of Arabic numerals. This should not be done on an ad hoc basis, and it should not be done independently in HTML and PDF: there needs to be a rule as to where it happens, and it needs to be done in Presentation XML.

I have the bad feeling that this is going to end up as a document attribute.

ronaldtse · 2024-10-22T11:30:59Z

> I have the bad feeling that this is going to end up as a document attribute.

You mean the specification of list bullet styles per level being configurable? I'd (everyone would) love that.

opoudjis · 2024-10-22T11:34:31Z

I don't even know if I can do that in HTML. Not without a lot of pain.

And you need to say a lot more about where Japanese numbers are meant to show up. Numbering is done in code; I can make the xref counter output Japanese instead of Arabic numerals, but that means initialising each counter instance in isodoc, one for every block type and clause (figures, tables, requirements, etc etc etc).

Without a coherent statement, you are not getting anything.


ronaldtse · 2024-10-22T11:49:00Z

> Note: I don't know the reason, but the notes numbers should be Arabic:

@Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.

opoudjis · 2024-10-22T11:59:23Z

> You mean the specification of list bullet styles per level being configurable? I'd (everyone would) love that.

PER LEVEL?! No you are not getting random list level specification PER LEVEL. ISO HTML CSS has 30 lines of custom code just to insert ")" after list numbers.
metanorma/isodoc#247 has been unactioned for the past four years because of how horrible Word HTML is about custom list numbering.

No, what you're going to get is:

A document attribute specifying whether Japanese or Arabic auto-numbering is to be used in the document. I am not going to be supporting vague notions of new flavours or document types: I am yet to see evidence that there is a coherent mapping of Japanese numbering to document type or organisation at all, and I'm not going to wait for one.
Restriction of Japanese number styling to clauses, ordered lists, and edition numbers. Each and every numbering counter is a separate variable, and if any one of them outputs Arabic, they need to be set individually. I am not at this time going to assume that Japanese numbering is used for all autonumbering in the document, for the simple reason that the sample document does not, and it is not our place to dictate to people what numbers they use universally.

Ordered lists will rely on the Presentation XML feature of //ol/li/@label to tell the consumer what to put in the list. This will only work out of the box for PDF, and there is code from other flavours that can make it work for DOC; HTML would need CSS overriding to make it work.

I am considering this nothing more than a proof of concept.

opoudjis · 2024-10-22T12:00:35Z

I'm going to realise this with the document attribute

:presentation-metadata-japanese-numbering: true

opoudjis · 2024-10-22T12:15:02Z

@ronaldtse wants to generalise this to Arabic, Chinese, and Amharic.

I have little inclination to do so, and this does not address the very real problem of what types of block are going to be Arabic and what local.

But:

:presentation-metadata-autonumbering-style: japanese

The nightmare scenario is:

:presentation-metadata-notes-autonumbering-style: arabic
:presentation-metadata-clause-autonumbering-style: japanese
:presentation-metadata-subclause-autonumbering-style: arabic

I will not be implementing that.

opoudjis · 2024-10-22T13:07:07Z

To make counters more configurable, I'm going to eventually set up configuration of all counters—starting value and style. But for now, I'm only going to expose that for clauses and lists.


opoudjis · 2024-10-22T14:22:00Z

I've got a problem: I want to assign config to counter classes based on config in the xref class (which knows about numbering styles from the Presentation XML metadata), but I don't want to redefine all the classes invoking them.

So to exploit inheritance, I'm going to have to define these counter classes with methods invoked from the xref class.
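
The pattern described here can be sketched as follows. All class and method names (`Counter`, `Xref`, `JisXref`, `clause_counter`, `note_counter`) are hypothetical and do not reflect the actual isodoc code. The sketch only shows the idea: callers obtain counters through methods on the xref object, so a flavour subclass can change counter configuration in one place.

```ruby
# All names are hypothetical; this is not the isodoc API.
class Counter
  # Index 0 is included so that the counter value can be used directly.
  JAPANESE = %w[〇 一 二 三 四 五 六 七 八 九 十].freeze

  def initialize(style: :arabic)
    @style = style
    @value = 0
  end

  def increment
    @value += 1
    self
  end

  # Japanese style only covers 0..10 here, enough for the illustration.
  def label
    @style == :japanese ? JAPANESE.fetch(@value) : @value.to_s
  end
end

class Xref
  def initialize(numbering_style: :arabic)
    @numbering_style = numbering_style
  end

  # Callers ask the xref object for counters instead of calling Counter.new.
  def clause_counter
    Counter.new
  end

  def note_counter
    Counter.new
  end
end

class JisXref < Xref
  # Only clause counters follow the document-wide style; notes stay Arabic.
  def clause_counter
    Counter.new(style: @numbering_style)
  end
end

xref = JisXref.new(numbering_style: :japanese)
puts xref.clause_counter.increment.increment.label # prints 二
puts xref.note_counter.increment.label             # prints 1
```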

opoudjis · 2024-10-22T14:56:12Z

Not working yet...

Intelligent2013 · 2024-10-22T16:28:35Z

Also we need to support Japanese numerals in the publication date. I've updated the initial post.

opoudjis · 2024-10-23T00:49:48Z

I am providing Japanese numbering in the Presentation XML, but there is a nightmare scenario where you provide Japanese numbering for page numbers. If you do need them, and if XSL:FO is not clever enough to do that automatically, I'll need to dump the numbers 1–1,000 in the localization strings. Let's not action that yet though... I'd be surprised if XSL:FO doesn't provide that natively somewhere.

Intelligent2013 · 2024-10-23T09:12:50Z

> I am providing Japanese numbering in the Presentation XML, but there is a nightmare scenario where you provide Japanese numbering for page numbers. If you do need them, and if XSL:FO is not clever enough to do that automatically, I'll need to dump the numbers 1–1,000 in the localization strings. Let's not action that yet though... I'd be surprised if XSL:FO doesn't provide that natively somewhere.

@opoudjis Apache FOP has the extension fox:number-conversion-features (https://xmlgraphics.apache.org/fop/2.0/complexscripts.html#source), but it looks like it's not working at all; maybe I'm doing something wrong... In any case, let's dump the numbers 1–1,000 in the localization strings when you have time. The page number change should be applied in IF (Intermediate Format) after XSL-FO generation.

opoudjis · 2024-10-23T10:52:34Z

We need to localise the clause number delimiter, from half-width to full-width full stop, if Japanese numbering is used.

And I'm going to use this as the opportunity to implement a fix to CJK punctuation called on in relaton/relaton-render#52, which I have not implemented to date because of @ronaldtse ’s indefensible notion that

Johnson、 A。、 Peters、 B。 1976。 The origins of sound 【series】。 London〯Blackwells

is desirable punctuation.

It is not, I reject with utmost vehemence any claim that it is (and so has Reese) and I am pressing ahead with the correct solution.

Regardless of the document's main language, punctuation localisation will convert punctuation from half-width to full-width only if the characters on both sides are CJK.

So:

All clause numbers will now be subject to punctuation localisation.
Regardless of the language of the document, a clause number like "2.1" will ABSOLUTELY NOT be converted to "2。1", because that is insane, and makes me look incompetent.
The clause number "二.一" will however be converted to "二。一", because the dot is surrounded by CJK characters.
Annex number "A.一" will not be converted to "A。一"
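
The rule in the list above can be sketched with a regular expression that only matches a half-width full stop sitting between two CJK characters. The helper name `localize_delimiter` is hypothetical, and treating Han, Hiragana and Katakana as "CJK" is an assumption of this sketch, not the isodoc implementation. The replacement character is a parameter, since the thread also discusses a middle dot as the clause delimiter.

```ruby
# Hypothetical helper for illustration; not the isodoc implementation.
# Assumption: "CJK" means Han, Hiragana or Katakana characters.
CJK = '[\p{Han}\p{Hiragana}\p{Katakana}]'

# Lookbehind and lookahead leave the neighbours in place, so chained
# numbers such as 二.二.三 have every dot converted.
DOT_BETWEEN_CJK = /(?<=#{CJK})\.(?=#{CJK})/

def localize_delimiter(text, delimiter = "。")
  text.gsub(DOT_BETWEEN_CJK, delimiter)
end

puts localize_delimiter("二.一")       # prints 二。一
puts localize_delimiter("2.1")         # prints 2.1
puts localize_delimiter("A.一")        # prints A.一
puts localize_delimiter("二.二", "・") # prints 二・二
```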

I am also going to bite the bullet and move Japanese number rendering to isodoc for xref counters; they already support Roman at top level.


opoudjis · 2024-10-24T01:52:05Z

@Intelligent2013 The edition numbering works in testing, so I will need to investigate that. The list numbering will also be complicated.

opoudjis · 2024-10-24T01:57:18Z

Reese, the point of what I have written is the following:

Automated text generation in Metanorma uses Latin punctuation
Latin punctuation in CJK text needs to be switched to full-width punctuation, if it is automated text
But not if the Latin punctuation is adjacent to Latin text
If users actually want CJK punctuation inside Latin text (which Ronald seems to think they do), then it needs to be set as such at the outset: CJK punctuation will not be converted back to Latin
My use of "Code" is a random example. Try, more to the point:

二.二 => 二。二 ( although it looks like I will need to override this with middle-dot anyway)
A.2 => A.2 (unchanged; previously it would have attempted A。2)

ronaldtse · 2024-10-24T02:01:09Z

@opoudjis the Japanese "middle dot" delimiter is not the "full stop", they are different symbols.

ronaldtse · 2024-10-24T02:02:15Z

> If users actually want CJK punctuation inside Latin text (which Ronald seems to think they do), then it needs to be set as such at the outset: CJK punctuation will not be converted back to Latin

No, that's not what I asked for. The default for bibliographic entries is to be rendered in a suitable style, i.e. English in English, Japanese in Japanese. We could have Japanese in English or English in Japanese but that should not be the default.

opoudjis · 2024-10-24T02:09:56Z

Bibliographic entries will routinely be mixed-language, with things like Japanese authors and English titles. The notion of a bibliographic entry being "just Japanese" or "just English" is naive and inflexible. It is also a nuisance on top of trying to work out what the language of a bibliographic entry is to begin with. (You think users are going to be marking it up as [lang=ja]? And then mark up titles individually as exceptions? When we can work out the script automatically through Regex?)

That's why working out whether to apply CJK punctuation contextually, rather than based solely on a language tag, has ALWAYS been the right way to proceed, and I am proceeding with it.

Rereading, the default is indeed going to be CJK, but it will be overridden when the immediate context shows that full-width punctuation makes no sense (the surrounding characters are Latin). And I simply cannot trust users to exhaustively mark up references (let alone individual bits of references) to indicate language explicitly.

opoudjis · 2024-10-24T02:10:23Z

> @opoudjis the Japanese "middle dot" delimiter is not the "full stop", they are different symbols.

As I have just acknowledged, which is why I am doing the refactoring.

opoudjis · 2024-10-24T09:54:52Z

From a.presentation.xml: 1第1版

You're looking at the wrong file: I am generating

<edition language="">1</edition><edition language="ja">第一版</edition>

in the Japanese numbering version. You'll have a refresh soon.


opoudjis · 2024-10-24T10:27:39Z

ordered list items

This is an update to JIS. JIS has Alphabetic numbering on its first level of ordered lists, and Arabic numbering on subsequent levels. I don't know what the provenance of the PDF sample is, and I do not care: I am not overriding JIS list numbering for some unasked-for proof of concept. I am implementing Japanese numbering to replace Arabic numbering in ordered lists ONLY where JIS sanctions that.

opoudjis · 2024-10-24T11:40:06Z

As warned: HTML right now has no idea what to do with custom list labels.

@Intelligent2013 The following should now have everything you need for this proof of concept.

Archive.zip

Intelligent2013 · 2024-10-24T11:40:57Z

> You're looking at the wrong file: I am generating
> <edition language="">1</edition><edition language="ja">第一版</edition>
> in the Japanese numbering version. You'll have a refresh soon.

OK. Please note that I need just 一, without 第 and 版 around it. We also need to keep the value 第1版 for the current (non-vertical) layout.
I.e. like this: <edition language="">1</edition><edition language="ja">第1版</edition><edition language="ja" numberonly="true">一</edition>.

opoudjis · 2024-10-24T12:57:03Z

Yuck, that's really adhoc. OK...

opoudjis · 2024-10-24T13:06:10Z

@Intelligent2013 Here you go.

Archive 2.zip

Intelligent2013 · 2024-10-24T13:25:10Z

Ordered lists look ok:

Thanks!

Now, testing edition number....

Intelligent2013 · 2024-10-24T13:39:21Z

@opoudjis the edition number is ok also. Thanks!

I've updated the initial post for notes, examples numbers:

> Note: I don't know the reason, but the notes numbers should be Arabic:

> @Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.

opoudjis · 2024-10-25T00:58:52Z

> @opoudjis the edition number is ok also. Thanks!

> I've updated the initial post for notes, examples numbers:

> Note: I don't know the reason, but the notes numbers should be Arabic:

> @Intelligent2013 I just noticed this since @opoudjis raised it. They are meant to be in Japanese numerals too.

I will not be actioning this at this time, because I need evidence that clients actually want this behaviour, and I am reasonably sure they won't be consistent about it.

opoudjis · 2024-10-25T13:38:59Z

So, rather than get into a protracted discussion:

I am closing this ticket as complete.

The additional requirement stated for custom numbering of notes, examples, requirements, formulas, term notes, term examples, annexes, admonitions, ordered lists (as distinct from list items), definition lists, figures, subfigures, tables, could be satisfied in one of two ways:

Blanket Japanese numbering of all of them. That is not what the PDF document is doing, and without a written statement from a Japanese client saying that is what they want, I will not implement it, and the assumption that we can ignore what agencies have actually done in their editorial practice is unacceptable.
Customisation of all fourteen classes of counter, because we never know which preference any particular agency is going to go with, and we have no reason to think there is any consistency between them.

The second approach is the only respectful way to engage with customers. It is also 200-300 lines of code for what is, at this stage, a proof of concept that nobody external has actually asked for, and that no external agency is exercising QA over.

It is therefore not going to be a priority for me to work on until some agency actually does ask for it, and can articulate authoritatively whether they want each of their notes, examples, requirements, formulas, term notes, term examples, annexes, admonitions, ordered lists (as distinct from list items), definition lists, figures, subfigures, tables to be numbered Japanese or Arabic.

I will create a ticket for this, and I will demote it to medium priority.

Intelligent2013 added the enhancement New feature or request label Oct 20, 2024

Intelligent2013 assigned opoudjis Oct 20, 2024

Intelligent2013 added this to Metanorma Oct 20, 2024

github-project-automation bot moved this to 🆕 New in Metanorma Oct 20, 2024

Intelligent2013 mentioned this issue Oct 20, 2024

PDF: Add new vertical layout #226

Open

24 tasks

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 22, 2024

refactor xref counter to allow overriding of number style: metanorma/…

e3fbe68

…metanorma-jis#228

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 22, 2024

subclass ClauseCounter, to allow it to be overridden in flavours: met…

c1cdd9c

…anorma/metanorma-jis#228

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 22, 2024

Clause counter class: metanorma/metanorma-jis#228

a9d0398

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 22, 2024

clause counter method: metanorma/metanorma-jis#228

c28d58a

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 22, 2024

clause counter method: metanorma/metanorma-jis#228

c33d81e

opoudjis added a commit that referenced this issue Oct 22, 2024

Japanese autonumbering: #228

89315d2

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 23, 2024

Japanese numbering in xrefs; l10n of all xref labels: metanorma/metan…

35bd101

…orma-jis#228

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 23, 2024

Japanese numbering in xrefs; l10n of all xref labels: metanorma/metan…

870afdf

…orma-jis#228

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 24, 2024

configure separator on xref counter: metanorma/metanorma-jis#228

a009bbc

opoudjis added a commit that referenced this issue Oct 24, 2024

customise xref clause number separator for Japanese numbering; Japane…

18312e9

…se numbering in Japanese dates: #228

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 24, 2024

list counter type: metanorma/metanorma-jis#228

a48f6e9

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 24, 2024

list counter type: metanorma/metanorma-jis#228

03fd5a2

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 24, 2024

list counter type: metanorma/metanorma-jis#228

335fcb3

opoudjis added a commit to metanorma/isodoc that referenced this issue Oct 24, 2024

list counter type: metanorma/metanorma-jis#228

c247444

opoudjis added a commit that referenced this issue Oct 24, 2024

list numbering in Japanese numbers: #228

27a47c2

opoudjis added a commit that referenced this issue Oct 24, 2024

number-only edition i18n: #228

c293a32

opoudjis moved this from 🏗 In progress to 👀 In review in Metanorma Oct 25, 2024

opoudjis closed this as completed Oct 25, 2024

github-project-automation bot moved this from 👀 In review to ✅ Done in Metanorma Oct 25, 2024

opoudjis mentioned this issue Oct 25, 2024

Support custom numbering style for different assets #234

Open

opoudjis added a commit to metanorma/metanorma.org that referenced this issue Oct 25, 2024

Japanese auto-numbering: metanorma/metanorma-jis#228

b905b87

Intelligent2013 mentioned this issue Nov 30, 2024

Presentation XML caption refactoring metanorma/mn-native-pdf#770

Closed
