Add Syriac to Reader #6
base: master
Changes from all commits
03353e3
903c492
5d732a3
f4d9b0f
6ddf145
fa25f22
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,6 @@ | ||
*.aux | ||
*.log | ||
*.tex | ||
|
||
!pre.tex | ||
!post.tex | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,14 @@ | ||
FROM debian:buster-slim | ||
|
||
RUN mkdir -p /usr/src/app | ||
COPY *.md *.py *.tex *.html /usr/src/app/ | ||
COPY texlive.profile /tmp/texlive.profile | ||
|
||
ENV PATH="/usr/local/texlive/2020/bin/x86_64-linux:${PATH}" | ||
WORKDIR /usr/src/app | ||
|
||
RUN export INSTALL_PACKAGES="build-essential curl fontconfig perl python3-dev python3-setuptools subversion" &&\ | ||
export PACKAGES="libffi-dev libfontconfig1 python3 tar" &&\ | ||
apt-get update -qq &&\ | ||
RUN apt-get update -qq &&\ | ||
apt-get upgrade -qq &&\ | ||
apt-get install -qq $INSTALL_PACKAGES $PACKAGES &&\ | ||
apt-get install -qq build-essential curl fontconfig perl python3-dev python3-setuptools subversion libffi-dev libfontconfig1 python3 tar &&\ | ||
Review comment: The reason all this is done is that the final image is smaller. The You are right that this makes development a bit slow; I usually change the Dockerfile locally but don't commit the changes. Nowadays it is also possible to create a multi-stage build, so you could check whether you can get that working. If you don't want to spend time on it, please just undo the changes here. |
||
svn export https://github.com/ETCBC/bhsa/trunk/tf/c /bhsa/c &&\ | ||
cd /tmp &&\ | ||
curl -L http://mirrors.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz | tar xz &&\ | ||
|
@@ -38,14 +35,28 @@ RUN export INSTALL_PACKAGES="build-essential curl fontconfig perl python3-dev py | |
zref &&\ | ||
mktexfmt xelatex.fmt &&\ | ||
curl -H 'User-Agent: stop checking' -L http://www.sbl-site.org/Fonts/SBL_Hbrw.ttf > /usr/local/share/fonts/SBL_Hbrw.ttf &&\ | ||
fc-cache &&\ | ||
cd /usr/src/app &&\ | ||
python3 setup.py install &&\ | ||
mkdir data &&\ | ||
./collectcontexts.py --bhsa /bhsa --module c &&\ | ||
apt-get remove -qq $INSTALL_PACKAGES &&\ | ||
apt-get -qq autoremove &&\ | ||
rm -rf /var/lib/apt/lists/* | ||
fc-cache | ||
|
||
RUN svn export https://github.com/ETCBC/peshitta/trunk/tf/0.2 /syriac/c | ||
RUN svn export https://github.com/CenterBLC/LXX/trunk/tf/1935 /greek/c | ||
|
||
COPY setup.py README.md /usr/src/app/ | ||
|
||
RUN cd /usr/src/app && python3 setup.py install | ||
RUN cd /usr/src/app && mkdir data | ||
|
||
COPY hebrewreader.py hebrewreaderserver.py collectcontexts.py minitf.py *.html /usr/src/app/ | ||
RUN cd /usr/src/app && ./collectcontexts.py --bhsa /bhsa --module c --lang hebrew | ||
RUN cd /usr/src/app && ./collectcontexts.py --bhsa /syriac --module c --lang syriac | ||
RUN cd /usr/src/app && ./collectcontexts.py --bhsa /greek --module c --lang greek | ||
|
||
|
||
COPY NotoSansSyriac-Regular.ttf /usr/local/share/fonts/syriac_2.ttf | ||
COPY NotoSerif-Regular.ttf /usr/local/share/fonts/greek.ttf | ||
Comment on lines +54 to +55
Review comment: A couple of things:
|
||
|
||
RUN fc-cache -f && rm -rf /var/cache/* | ||
|
||
COPY *.tex /usr/src/app/ | ||
|
||
ENTRYPOINT ["./hebrewreaderserver.py"] | ||
CMD [] | ||
CMD [] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,22 +5,22 @@ | |
|
||
from tf.fabric import Fabric | ||
|
||
from hebrewreader import DATADIR, FEATURES, load_data | ||
from hebrewreader import DATADIR, FEATURES, GRK_FEATURES, SYR_FEATURES, load_data | ||
from minitf import gather_context | ||
|
||
VERSE_NODES = dict() | ||
|
||
def gather_chapter(api, book, chap): | ||
def gather_chapter(api, book, chap, lang): | ||
Review comment: Please make |
||
global VERSE_NODES | ||
nodes = set() | ||
node = api.T.nodeFromSection((book, chap, 1)) | ||
if node is None: | ||
return None | ||
verse = 1 | ||
VERSE_NODES[book][chap] = dict() | ||
VERSE_NODES[lang][book][chap] = dict() | ||
while api.T.sectionFromNode(node)[0:2] == (book,chap): | ||
verse = api.T.sectionFromNode(node)[2] | ||
VERSE_NODES[book][chap][verse] = node | ||
VERSE_NODES[lang][book][chap][verse] = node | ||
nodes.add(node) | ||
words = api.L.d(node, 'word') | ||
nodes.update(set(words)) | ||
|
@@ -32,40 +32,70 @@ def gather_chapter(api, book, chap): | |
node = next_verse[0] | ||
return nodes | ||
|
||
def gather_book(api, book): | ||
def gather_book(api, book, lang): | ||
Review comment: Please swap |
||
global VERSE_NODES | ||
result = dict() | ||
chap = 1 | ||
VERSE_NODES[book] = dict() | ||
VERSE_NODES[lang][book] = dict() | ||
while True: | ||
nodes = gather_chapter(api, book, chap) | ||
nodes = gather_chapter(api, book, chap, lang) | ||
if nodes is None: | ||
return result | ||
result[chap] = nodes | ||
chap += 1 | ||
|
||
def dump_book(api, book): | ||
nodesets = gather_book(api, book) | ||
def dump_book(api, book, lang, use_features): | ||
|
||
nodesets = gather_book(api, book, lang) | ||
for chap, nodes in nodesets.items(): | ||
context = gather_context( | ||
api, | ||
{'features': FEATURES, 'locality': 'udnp'}, | ||
{'features': use_features, 'locality': 'udnp'}, | ||
(nodes,)) | ||
fname = book + '_' + str(chap) + '.pkl' | ||
fname = lang + '_' + book + '_' + str(chap) + '.pkl' | ||
with open(os.path.join(DATADIR, fname), 'wb') as f: | ||
pickle.dump(context, f) | ||
|
||
def gather(locations, modules): | ||
def gather(locations, modules, lang): | ||
global VERSE_NODES | ||
TF = Fabric(locations=locations, modules=modules, silent=True) | ||
api = TF.load(FEATURES, silent=True) | ||
if lang[0] == 'syriac': | ||
use_features = SYR_FEATURES | ||
elif lang[0] == 'greek': | ||
use_features = GRK_FEATURES | ||
else: | ||
use_features = FEATURES | ||
api = TF.load(use_features, silent=True) | ||
|
||
VERSE_NODES[lang[0]] = {} | ||
|
||
for node in api.F.otype.s('book'): | ||
book = api.T.sectionFromNode(node)[0] | ||
print(book) | ||
dump_book(api, book) | ||
dump_book(api, book, lang[0], use_features) | ||
|
||
if lang[0] == 'hebrew': | ||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'wb') as f: | ||
pickle.dump(VERSE_NODES, f) | ||
elif lang[0] == 'syriac': | ||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'rb') as f: | ||
HEB_VERSE_NODES = pickle.load(f) | ||
FIN_VERSE_NODES = { | ||
"hebrew": HEB_VERSE_NODES['hebrew'], | ||
"syriac": VERSE_NODES['syriac'] | ||
} | ||
|
||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'wb') as f: | ||
pickle.dump(VERSE_NODES, f) | ||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'wb') as f: | ||
pickle.dump(FIN_VERSE_NODES, f) | ||
elif lang[0] == 'greek': | ||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'rb') as f: | ||
PREV_VERSE_NODES = pickle.load(f) | ||
FIN_VERSE_NODES = { | ||
"hebrew": PREV_VERSE_NODES['hebrew'], | ||
"syriac": PREV_VERSE_NODES['syriac'], | ||
"greek": VERSE_NODES['greek'] | ||
} | ||
|
||
with open(os.path.join(DATADIR, 'verse_nodes.pkl'), 'wb') as f: | ||
pickle.dump(FIN_VERSE_NODES, f) | ||
Comment on lines +75 to +98
Review comment: It's good to do this in multiple steps (perhaps a user wants to use the program only for one data set), but now the program assumes that the data file is generated in a specific order: Hebrew > Syriac > Greek. Can we not just use separate files for each data set? |
||
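The suggestion above (separate files per data set) could be sketched roughly as follows. This is only an illustration, not the change that was eventually made: the helper names `verse_nodes_path`, `save_verse_nodes` and `load_verse_nodes`, the `verse_nodes_<lang>.pkl` naming scheme, and the node numbers in the demo are all assumptions made for the example.

```python
import os
import pickle
import tempfile

def verse_nodes_path(datadir, lang):
    # Hypothetical naming scheme: one verse-node file per data set.
    return os.path.join(datadir, 'verse_nodes_' + lang + '.pkl')

def save_verse_nodes(datadir, lang, verse_nodes):
    # Write only this language's mapping; files of other languages are untouched.
    with open(verse_nodes_path(datadir, lang), 'wb') as f:
        pickle.dump(verse_nodes, f)

def load_verse_nodes(datadir, langs):
    # Load whichever languages have been generated, skipping missing ones.
    result = {}
    for lang in langs:
        path = verse_nodes_path(datadir, lang)
        if os.path.exists(path):
            with open(path, 'rb') as f:
                result[lang] = pickle.load(f)
    return result

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as datadir:
        # Greek before Hebrew, no Syriac: generation order no longer matters.
        save_verse_nodes(datadir, 'greek', {'Genesis': {1: {1: 10}}})
        save_verse_nodes(datadir, 'hebrew', {'Genesis': {1: {1: 20}}})
        loaded = load_verse_nodes(datadir, ['hebrew', 'syriac', 'greek'])
        print(sorted(loaded))  # ['greek', 'hebrew']
```

With a layout like this, `gather` would not need the Hebrew, Syriac, Greek branches that read and rewrite a shared `verse_nodes.pkl`; the server side would then have to merge or look up the per-language files itself.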
|
||
def main(): | ||
parser = ArgumentParser(description='Gather the TF contexts to reduce memory usage in the HTTP server') | ||
|
@@ -75,10 +105,11 @@ def main(): | |
help='Location of the BHSA data') | ||
p_data.add_argument('--module', '-m', nargs=1, required=True, | ||
help='Text-fabric module to load') | ||
p_data.add_argument('--lang', '-l', nargs=1, required=True, | ||
Review comment: If you leave out |
||
help='Which language data to load') | ||
|
||
args = parser.parse_args() | ||
|
||
gather(args.bhsa, args.module) | ||
gather(args.bhsa, args.module, args.lang) | ||
|
||
if __name__ == '__main__': | ||
main() |
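The review comment on the `--lang` option is cut off in this capture, so the following does not try to reconstruct it. It only illustrates standard `argparse` behaviour that explains why `gather` indexes `lang[0]`: with `nargs=1` the parsed value is a one-element list, whereas without `nargs` it is a plain string. The `build_parser` helper and the `choices` list are assumptions added for the example.

```python
from argparse import ArgumentParser

def build_parser(use_nargs):
    # Hypothetical helper that builds a parser with only the --lang option.
    parser = ArgumentParser(description='Sketch of the --lang option')
    if use_nargs:
        # As in the diff: nargs=1 wraps the value in a one-element list.
        parser.add_argument('--lang', '-l', nargs=1, required=True,
                help='Which language data to load')
    else:
        # Without nargs the value is a plain string; choices is an
        # addition here that rejects unknown languages early.
        parser.add_argument('--lang', '-l', required=True,
                choices=['hebrew', 'syriac', 'greek'],
                help='Which language data to load')
    return parser

if __name__ == '__main__':
    print(build_parser(True).parse_args(['--lang', 'syriac']).lang)   # ['syriac']
    print(build_parser(False).parse_args(['--lang', 'syriac']).lang)  # syriac
```

In the second form, `gather` could compare `lang == 'syriac'` directly instead of `lang[0] == 'syriac'`.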
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
\documentclass[a4paper]{article} | ||
|
||
\newif\iflargetext | ||
\newif\iflargertext | ||
|
||
\usepackage[margin=18mm,bottom=25mm]{geometry} | ||
\usepackage{setspace} | ||
\usepackage{relsize} | ||
\usepackage{multicol} | ||
|
||
\makeatletter | ||
\chardef\l@hebrew=255 | ||
\makeatother | ||
\usepackage{polyglossia} | ||
\setmainlanguage{english} | ||
\setotherlanguage{hebrew} | ||
\newfontfamily\hebrewfont[Scale=MatchUppercase,Script=Hebrew]{SBL Hebrew} | ||
\setotherlanguage{syriac} | ||
\newfontfamily\syriacfont[Scale=MatchUppercase]{Noto Sans Syriac} | ||
\setotherlanguage{greek} | ||
\newfontfamily\greekfont[Scale=MatchUppercase]{Noto Serif} | ||
\title{Biblical Hebrew Reader} | ||
\author{HebrewTools} | ||
|
||
\newcommand{\rdrchap}[1]{\begin{english}\small\textbf{#1}\end{english}} | ||
\newcommand{\rdrverse}[1]{\raisebox{2.5pt}{\smaller[4]#1}} | ||
|
||
\newcommand{\setuma}{~~~~{\scriptsize ס}~~~~} | ||
\newcommand{\petucha}{~~~~{\footnotesize פ}~~~~} | ||
|
||
\begin{document} |
Review comment: This should be ignored, because the program generates these, and files like `pre.tex` should be excluded with a `!` line immediately below.