
Indication of vowel length interfering with xml parsing #1

Open
wmshort opened this issue Dec 8, 2018 · 5 comments

Comments

@wmshort

wmshort commented Dec 8, 2018

The coding scheme for vowel length (a following " for long vowels and a following ' for short vowels, where this indicates an etymological difference) is interfering with XML parsing. I would suggest a different coding scheme entirely (e.g., a following : for long vowels) or, where possible, relying on the morphological information already included to differentiate them.

@gfranzini
Contributor

Which data are you trying to parse? We'll be converting them to long and short accent marks at some point.
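A minimal Python sketch of such a conversion, assuming the length marker always immediately follows a single vowel letter and that the combining macron (U+0304) and breve (U+0306) are the intended long and short marks. The function name and the sample words are hypothetical, and the sketch ignores any other uses of quote characters in the data.

```python
import re
import unicodedata

# Combining diacritics assumed for the accent marks (not specified in the thread)
COMBINING_MACRON = "\u0304"  # marks a long vowel
COMBINING_BREVE = "\u0306"   # marks a short vowel

# Assumption: the length marker immediately follows a single vowel letter
_MARKED_VOWEL = re.compile(r"([aeiouyAEIOUY])([\"'])")


def to_accent_marks(form: str) -> str:
    """Hypothetical helper: vowel + " becomes a macron, vowel + ' a breve."""
    def repl(match):
        mark = COMBINING_MACRON if match.group(2) == '"' else COMBINING_BREVE
        return match.group(1) + mark

    # NFC folds vowel + combining mark into a precomposed letter where one exists
    return unicodedata.normalize("NFC", _MARKED_VOWEL.sub(repl, form))


print(to_accent_marks('ro"ma'))  # rōma
print(to_accent_marks("a'mo"))   # ămo
```

Once the quotes are gone, the forms no longer collide with XML attribute delimiters at all.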

@wmshort
Author

wmshort commented Dec 10, 2018 via email

@passarom
Member

passarom commented Dec 10, 2018 via email

@gersh0m
Member

gersh0m commented Dec 10, 2018

The problem is caused by sloppy/buggy output from LemLat.
You don't need any Unicode in here!
You just need to use the corresponding entity references for the double and single quote (&quot; and &apos;), as basic XML syntax requires...
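A minimal Python sketch of that fix, assuming the forms end up in XML attribute values. The element name `w`, the attribute `lemma` and the sample form are hypothetical, not LemLat's actual output format. It uses the standard library's `xml.sax.saxutils.escape` with an extra mapping for both quote characters, then checks the result with `ElementTree`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined XML entity references for the two quote characters
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_attr(value: str) -> str:
    # escape() always handles &, < and >; the mapping adds both quote characters
    return escape(value, QUOTE_ENTITIES)


form = 'ama"re'  # hypothetical form in which " marks a long vowel
broken = f'<w lemma="{form}"/>'
fixed = f'<w lemma="{escape_attr(form)}"/>'

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("unescaped:", err)  # the bare " ends the attribute value early

print(fixed)                              # <w lemma="ama&quot;re"/>
print(ET.fromstring(fixed).get("lemma"))  # ama"re
```

The parser turns `&quot;` back into `"`, so the length marker survives intact. Building the XML with a serializer such as `ElementTree` instead of string formatting escapes attribute values automatically and avoids hand-written escaping altogether.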

@wmshort
Author

wmshort commented Dec 10, 2018 via email


4 participants