Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Syriac to Reader #6

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Conversation

anlinguist
Copy link

This PR adds Syriac Peshitta to the Reader Generator. Unfortunately, the Syriac TF data doesn't contain the gloss attribute, so there's not yet a way to add a vocabulary.

Also, I've modified the Dockerfile a bit to make rebuilding the container faster by separating the setup stuff from the file data so that when file data is changed, the container can use cached layers for setup and only load in the changed files.

@anlinguist
Copy link
Author

Screen Shot 2022-10-10 at 4 20 04 PM

Screen Shot 2022-10-10 at 4 19 17 PM

@anlinguist
Copy link
Author

I would have liked to make each text configurable for Hebrew vs Syriac, to allow having texts of both languages in the same PDF, maybe a future PR.

@camilstaps camilstaps added the enhancement New feature or request label Oct 12, 2022
Copy link
Member

@camilstaps camilstaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much! 😊 This is something I never thought about, but I think it will be useful for many people.

I have mostly some requests to keep the code as I would expect it to be, so that it is easier to maintain for me. There are 23 comments, but many are very small and the same. When this is done, I will refactor the tex templates. They contain some duplicate code now, which can be reduced by setting the language as a macro.

Unless you prefer I don't, I can add your name as a contributor to the readme and index.html.

If you ever want to add the option to add passages in different languages, we should first refactor the UI. Please sync with me first then, as I already have some ideas for this.

@@ -1,7 +1,6 @@
*.aux
*.log
*.pdf
*.tex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ignored, because the program generates these, and files like pre.tex should be excluded with a ! line immediately below.

apt-get upgrade -qq &&\
apt-get install -qq $INSTALL_PACKAGES $PACKAGES &&\
apt-get install -qq build-essential curl fontconfig perl python3-dev python3-setuptools subversion libffi-dev libfontconfig1 python3 tar &&\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason all this is done is that the final image is smaller. The apt-get install and apt-get remove && rm -rf /var/lib/apt/lists/* should be in the same RUN command, so that the packages that are only needed temporarily are not stored in the Docker cache.

You are right that this makes development a bit slow; I usually change the Dockerfile locally but don't commit the changes. Nowadays it is also possible to create a multi-stage build, so you could have a look if you can get that working. If you don't want to spend time on it, please just undo the changes here.

Comment on lines +54 to +55
COPY NotoSansSyriac-Regular.ttf /usr/local/share/fonts/syriac_2.ttf
COPY NotoSerif-Regular.ttf /usr/local/share/fonts/greek.ttf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things:

  • If possible I would prefer to download the fonts (as is done with the Hebrew one); this makes sure that we always have the latest version.
  • Please keep original font names (i.e. NotoSansSyriac-Regular.ttf instead of syriac_2.ttf).
  • Does Noto Serif Regular have all the required glyphs? If I enter the LXX of Gen 1:1 on https://fonts.google.com/noto/specimen/Noto+Serif/tester I see empty boxes for the first letter of the first and third word. Why not use SBL Greek, as we are using SBL Hebrew already?
  • At the bottom of index.html there is a note about the fonts that are being used; please add these.

from minitf import gather_context

VERSE_NODES = dict()

def gather_chapter(api, book, chap):
def gather_chapter(api, book, chap, lang):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make lang the second argument, so that the order matches the one in the VERSE_NODES hierarchy.

@@ -32,40 +32,70 @@ def gather_chapter(api, book, chap):
node = next_verse[0]
return nodes

def gather_book(api, book):
def gather_book(api, book, lang):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please swap book and lang.

Comment on lines -176 to +197
tex.write('\n\n' + templates['pretext'])
if lang == "hebrew":
tex.write('\n\n' + templates['pretext'])
elif lang == "syriac":
tex.write('\n\n' + templates['pretext_syr'])
elif lang == "greek":
tex.write('\n\n' + templates['pretext_grk'])
tex.write('\n'.join(text))
tex.write('\n' + templates['posttext'])
if lang == "hebrew":
tex.write('\n' + templates['posttext'])
elif lang == "syriac":
tex.write('\n' + templates['posttext_syr'])
elif lang == "greek":
tex.write('\n' + templates['posttext_grk'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rename the files to pretext_LANGUAGE.tex, you can simply use templates['pretext_' + lang] here.

Comment on lines -170 to +183
try:
api = load_data(passage)
text, words = get_passage_and_words(passage, api)
except:
raise ValueError('Could not find reference "{}"'.format(passage_text))
api = load_data(passage, lang)
text, words = get_passage_and_words(passage, api, lang)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

@@ -95,17 +93,17 @@ def fix_gloss(gloss):
return 'I'
return re.sub(r'<(.*)>', r'\\textit{\1}', gloss)

def get_passage_and_words(passage, api, separate_chapters=True, verse_nos=True):
def get_passage_and_words(passage, api, lang, separate_chapters=True, verse_nos=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make the order lang, api, passage? (I see now that the old order is incorrect, too.)

generate(passages,
include_voca is not None and include_voca,
combine_voca is not None and combine_voca,
clearpage_before_voca is not None and clearpage_before_voca,
text_size is not None and int(text_size[0]) > 0,
text_size is not None and int(text_size[0]) > 1,
tex, None if fmt == 'tex' else pdf,
TEMPLATES, quiet=True)
TEMPLATES, lang, quiet=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use lang[0] here (it is an artifact of the server interface that we get a list with one element, and generate should be agnostic to the interface).

@@ -300,7 +329,7 @@ def main():
templates['postvoca'] = args.post_voca_tex.read()
generate(args.passages,
args.include_voca, args.combine_voca, args.clearpage_before_voca,
args.large_text, False, args.tex, args.pdf, templates)
args.large_text, False, args.tex, args.pdf, args.lang, templates)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? The p_misc parser did not get a lang option, so I would expect args.lang to always be None. I would expect to be able to do something like (cf the readme):

./hebrewreader.py --lang greek --pdf genesis.pdf Genesis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants