A repo for a Digital Edition of Amadu Kurubari's "History of Samori Toure" from Delafosse's 1901 Jula Grammar (see this blog post for more context).
- Write scripts to automatically convert the original OCR text into a critical markdown format
  - Remove the page headers and page numbers and make them into markdown headers
  - Extract footnotes from prose
  - Clean up and match the semi-automated footnote markers and text
- Figure out auto-replacements for the modern version of the text
  - <yè> for `yɛ`
  - for intervocalic `g`
  - for `ani` 'and'
  - <-ru> for plural
  - <kù-tigi> for `kuntigi`
  - for `siya` 'many'
  - for `ele`
  - for `n f`
  - for `fò`
  - for `j`
  - as part of `siyaman` (it can be `sya-ma` or `sya-mà`, etc.)
  - for `c`
  - for `ɲ`
  - <lô> for `lɔ́n`
  - <kyè> for `cɛ`
  - <-ra> for ???
  - <yè> for
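
The replacement pairs listed above could be applied with a small lookup table. The following is only a minimal sketch, not a script from this repo: the names `REPLACEMENTS`, `nfc`, and `modernize` are hypothetical, and only the pairs whose source and target forms both appear in the list are included. It assumes that replacements should apply to whole words only (with hyphenated forms treated as one word) and that the text should be Unicode-normalized to NFC before matching, since OCR output may mix precomposed and combining diacritics.

```python
import re
import unicodedata

# Pairs copied from the list above. Entries whose source or target form is
# missing in that list are deliberately left out rather than guessed.
REPLACEMENTS = {
    "yè": "yɛ",
    "kù-tigi": "kuntigi",
    "lô": "lɔ́n",
    "kyè": "cɛ",
}


def nfc(s):
    """Normalize to NFC so precomposed and combining accents compare equal."""
    return unicodedata.normalize("NFC", s)


def modernize(text, table=REPLACEMENTS):
    """Replace whole-word matches of the table's keys (hypothetical helper)."""
    table = {nfc(k): v for k, v in table.items()}
    # Longest keys first so that "kyè" is tried before "yè".
    keys = sorted(table, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # Assumption: a "word" may contain hyphens, so a match must not be
    # preceded or followed by a word character or a hyphen.
    pattern = re.compile(rf"(?<![\w-])(?:{alternation})(?![\w-])")
    return pattern.sub(lambda m: table[m.group(0)], nfc(text))
```

Matching whole words is a conservative choice: it avoids rewriting `yè` inside `kyè`, but it also means suffix rules such as <-ru> would need a separate pattern once their modern forms are settled.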
- Started with the `ocr.txt` file.
- Separated the French-language introduction (`intro.txt`) from the text proper (`text.md`)
- Added markdown page number headers (e.g., `### 149`) using the script `pages.py`
- Removed original document headers and page numbers that were caught in the text (semi-manually using search and replace in an editor)
- Partially automated the conversion of Delafosse's footnotes into markdown footnotes using `footnotes.py`
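
To illustrate the page-header and footnote steps, here is a minimal sketch. It does not reproduce `pages.py` or `footnotes.py`, whose contents are not shown here, and it rests on several assumptions about the OCR text: that a page number sits alone on its own line, that footnote markers look like `(1)`, that a line starting with such a marker is the footnote text, and that footnote numbers restart on each page (hence the page-scoped ids). The names `PAGE_LINE`, `NOTE_MARK`, `convert_line`, and `to_markdown` are hypothetical.

```python
import re

# Assumed OCR conventions (not confirmed by the source):
PAGE_LINE = re.compile(r"^\s*(\d{1,3})\s*$")  # a page number alone on a line
NOTE_MARK = re.compile(r"\((\d+)\)")          # footnote markers such as "(1)"


def convert_line(line, page):
    """Turn "(n)" markers into markdown footnotes scoped to the given page."""
    m = NOTE_MARK.match(line)
    if m:
        # The line starts with a marker: treat it as the footnote definition.
        rest = line[m.end():].lstrip()
        return f"[^{page}-{m.group(1)}]: {rest}"
    # Otherwise replace every marker in the prose with a footnote reference.
    return NOTE_MARK.sub(lambda k: f"[^{page}-{k.group(1)}]", line)


def to_markdown(text):
    """Add "### N" page headers and convert footnote markers line by line."""
    out = []
    page = None
    for line in text.splitlines():
        m = PAGE_LINE.match(line)
        if m:
            page = m.group(1)
            out.append(f"### {page}")
        elif page is None:
            # Before the first page number there is no id prefix to use.
            out.append(line)
        else:
            out.append(convert_line(line, page))
    return "\n".join(out)
```

A heuristic like this will also catch parenthesized numbers that are not footnote markers, which is one reason the results would still need the manual clean-up and matching pass described above.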