Add Syriac to Reader #6

anlinguist · 2022-10-10T22:14:03Z

This PR adds Syriac Peshitta to the Reader Generator. Unfortunately, the Syriac TF data doesn't contain the gloss attribute, so there's not yet a way to add a vocabulary.

Also, I've modified the Dockerfile a bit to make rebuilding the container faster by separating the setup stuff from the file data so that when file data is changed, the container can use cached layers for setup and only load in the changed files.

anlinguist · 2022-10-10T22:20:51Z

anlinguist · 2022-10-10T22:22:52Z

I would have liked to make each text configurable for Hebrew vs Syriac, to allow having texts of both languages in the same PDF, maybe a future PR.

camilstaps

Thanks so much! 😊 This is something I never thought about, but I think it will be useful for many people.

I have mostly some requests to keep the code as I would expect it to be, so that it is easier to maintain for me. There are 23 comments, but many are very small and the same. When this is done, I will refactor the tex templates. They contain some duplicate code now, which can be reduced by setting the language as a macro.

Unless you prefer I don't, I can add your name as a contributor to the readme and index.html.

If you ever want to add the option to add passages in different languages, we should first refactor the UI. Please sync with me first then, as I already have some ideas for this.

camilstaps · 2022-10-12T07:51:02Z

.gitignore

@@ -1,7 +1,6 @@
 *.aux
 *.log
 *.pdf
-*.tex


This should be ignored, because the program generates these, and files like pre.tex should be excluded with a ! line immediately below.

camilstaps · 2022-10-12T07:56:16Z

Dockerfile

 	apt-get upgrade -qq &&\
-	apt-get install -qq $INSTALL_PACKAGES $PACKAGES &&\
+	apt-get install -qq build-essential curl fontconfig perl python3-dev python3-setuptools subversion libffi-dev libfontconfig1 python3 tar &&\


The reason all this is done is that the final image is smaller. The apt-get install and apt-get remove && rm -rf /var/lib/apt/lists/* should be in the same RUN command, so that the packages that are only needed temporarily are not stored in the Docker cache.

You are right that this makes development a bit slow; I usually change the Dockerfile locally but don't commit the changes. Nowadays it is also possible to create a multi-stage build, so you could have a look if you can get that working. If you don't want to spend time on it, please just undo the changes here.

camilstaps · 2022-10-12T08:08:52Z

Dockerfile

+COPY NotoSansSyriac-Regular.ttf /usr/local/share/fonts/syriac_2.ttf
+COPY NotoSerif-Regular.ttf /usr/local/share/fonts/greek.ttf


A couple of things:

If possible I would prefer to download the fonts (as is done with the Hebrew one); this makes sure that we always have the latest version.

Please keep original font names (i.e. NotoSansSyriac-Regular.ttf instead of syriac_2.ttf).

Does Noto Serif Regular have all the required glyphs? If I enter the LXX of Gen 1:1 on https://fonts.google.com/noto/specimen/Noto+Serif/tester I see empty boxes for the first letter of the first and third word. Why not use SBL Greek, as we are using SBL Hebrew already?

At the bottom of index.html there is a note about the fonts that are being used; please add these.

camilstaps · 2022-10-12T08:09:42Z

collectcontexts.py

 from minitf import gather_context

 VERSE_NODES = dict()

-def gather_chapter(api, book, chap):
+def gather_chapter(api, book, chap, lang):


Please make lang the second argument, so that the order matches the one in the VERSE_NODES hierarchy.

camilstaps · 2022-10-12T08:09:59Z

collectcontexts.py

@@ -32,40 +32,70 @@ def gather_chapter(api, book, chap):
        node = next_verse[0]
    return nodes

-def gather_book(api, book):
+def gather_book(api, book, lang):


Please swap book and lang.

camilstaps · 2022-10-12T08:25:19Z

hebrewreader.py

-        tex.write('\n\n' + templates['pretext'])
+        if lang == "hebrew":
+            tex.write('\n\n' + templates['pretext'])
+        elif lang == "syriac":
+            tex.write('\n\n' + templates['pretext_syr'])
+        elif lang == "greek":
+            tex.write('\n\n' + templates['pretext_grk'])
        tex.write('\n'.join(text))
-        tex.write('\n' + templates['posttext'])
+        if lang == "hebrew":
+            tex.write('\n' + templates['posttext'])
+        elif lang == "syriac":
+            tex.write('\n' + templates['posttext_syr'])
+        elif lang == "greek":
+            tex.write('\n' + templates['posttext_grk'])


If you rename the files to pretext_LANGUAGE.tex, you can simply use templates['pretext_' + lang] here.

camilstaps · 2022-10-12T08:25:44Z

hebrewreader.py

-        try:
-            api = load_data(passage)
-            text, words = get_passage_and_words(passage, api)
-        except:
-            raise ValueError('Could not find reference "{}"'.format(passage_text))
+        api = load_data(passage, lang)
+        text, words = get_passage_and_words(passage, api, lang)


Why this change?

camilstaps · 2022-10-12T08:26:27Z

hebrewreader.py

@@ -95,17 +93,17 @@ def fix_gloss(gloss):
        return 'I'
    return re.sub(r'<(.*)>', r'\\textit{\1}', gloss)

-def get_passage_and_words(passage, api, separate_chapters=True, verse_nos=True):
+def get_passage_and_words(passage, api, lang, separate_chapters=True, verse_nos=True):


Could you make the order lang, api, passage? (I see now that the old order is incorrect, too.)

camilstaps · 2022-10-12T08:30:07Z

hebrewreaderserver.py

                generate(passages,
                        include_voca is not None and include_voca,
                        combine_voca is not None and combine_voca,
                        clearpage_before_voca is not None and clearpage_before_voca,
                        text_size is not None and int(text_size[0]) > 0,
                        text_size is not None and int(text_size[0]) > 1,
                        tex, None if fmt == 'tex' else pdf,
-                        TEMPLATES, quiet=True)
+                        TEMPLATES, lang, quiet=False)


Please use lang[0] here (it is an artifact of the server interface that we get a list with one element, and generate should be agnostic to the interface).

camilstaps · 2022-10-12T08:32:48Z

hebrewreader.py

@@ -300,7 +329,7 @@ def main():
        templates['postvoca'] = args.post_voca_tex.read()
        generate(args.passages,
                args.include_voca, args.combine_voca, args.clearpage_before_voca,
-                args.large_text, False, args.tex, args.pdf, templates)
+                args.large_text, False, args.tex, args.pdf, args.lang, templates)


Does this work? The p_misc parser did not get a lang option, so I would expect args.lang to always be None. I would expect to be able to do something like (cf the readme):

./hebrewreader.py --lang greek --pdf genesis.pdf Genesis

anlinguist added 2 commits October 10, 2022 15:53

add syriac

03353e3

fix syriac font

903c492

anlinguist added 4 commits October 10, 2022 20:52

add Greek

5d732a3

add syr and grk tex files

f4d9b0f

disable RL only on Greek

6ddf145

increase timeout to 30s

fa25f22

camilstaps added the enhancement New feature or request label Oct 12, 2022

camilstaps requested changes Oct 12, 2022

View reviewed changes

camilstaps assigned anlinguist Oct 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Syriac to Reader #6

Add Syriac to Reader #6

anlinguist commented Oct 10, 2022

anlinguist commented Oct 10, 2022

anlinguist commented Oct 10, 2022

camilstaps left a comment

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

camilstaps Oct 12, 2022

@@ @@ -1,7 +1,6 @@ @@
               *.aux
               *.log
               *.pdf
-              *.tex

		COPY NotoSansSyriac-Regular.ttf /usr/local/share/fonts/syriac_2.ttf
		COPY NotoSerif-Regular.ttf /usr/local/share/fonts/greek.ttf

Add Syriac to Reader #6

Are you sure you want to change the base?

Add Syriac to Reader #6

Conversation

anlinguist commented Oct 10, 2022

anlinguist commented Oct 10, 2022

anlinguist commented Oct 10, 2022

camilstaps left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment