Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge in daide2eng #14

Merged
merged 10 commits into from
Nov 5, 2024
Merged

Merge in daide2eng #14

merged 10 commits into from
Nov 5, 2024

Conversation

aphedges
Copy link
Contributor

@aphedges aphedges commented Nov 5, 2024

The current daide2eng package is incompatible with daidepp and is an extra dependency to maintain. To make usage easier, I am merging it into this library. I needed to make some minor changes to ensure it works properly, but they (hopefully) won't cause any problems.

The main motivation behind this work is that I want to generate English proposals in RandomProposerBot. This will hopefully help CICERO and the Llama advisor provide better advice because they are trained on human games using English. The generated English is not perfect, but it's much closer to what a human uses compared to DAIDE.

The current `daide2eng` package is incompatible with `daidepp` and is an
extra dependency to maintain. To make usage easier, I am merging it into
this library. I copied code from multiple files in the latest commit,
e4fa270:[^1]

- `src/daide2eng/keywords/base_keywords.py`[^2]
- `src/daide2eng/keywords/keyword_utils.py`[^3]
- `src/daide2eng/keywords/press_keywords.py`[^4]
- `src/daide2eng/utils.py`[^5]

I made some minor code changes here, but they were minimal. The changes
needed to actually work and to match our code style will come in later
commits.

[^1]: https://github.com/ALLAN-DIP/daide2eng/tree/e4fa27071c52408817946598d3463f20bf064f6e
[^2]: https://github.com/ALLAN-DIP/daide2eng/blob/e4fa27071c52408817946598d3463f20bf064f6e/src/daide2eng/keywords/base_keywords.py
[^3]: https://github.com/ALLAN-DIP/daide2eng/blob/e4fa27071c52408817946598d3463f20bf064f6e/src/daide2eng/keywords/keyword_utils.py
[^4]: https://github.com/ALLAN-DIP/daide2eng/blob/e4fa27071c52408817946598d3463f20bf064f6e/src/daide2eng/keywords/press_keywords.py
[^5]: https://github.com/ALLAN-DIP/daide2eng/blob/e4fa27071c52408817946598d3463f20bf064f6e/src/daide2eng/utils.py
There were some changes to the DAIDE language represented by `dadie2eng`
that are not present in `daidepp`. I am removing them because there is
no clear need to keep them.
Generated automatically by `make black`
Generated automatically by `make ruff`
I needed to make many changes so the copied code would actually generate
English from DAIDE.

To test my changes in this commit and later ones, I used
`translation.json`[^1] from the `daide2eng` repository, along with the
following script:

```python
import json
from pathlib import Path
import sys

from daide2eng.utils import gen_English as gen_english_old

from chiron_utils.daide2eng import gen_english as gen_english_new

input_path = Path(sys.argv[1])
output_path = Path(sys.argv[2])

input_text = input_path.read_text(encoding="utf-8")
old_rows = json.loads(input_text)

if isinstance(old_rows, list):
    daide_strings = sorted(set(row["daide"] for row in old_rows))
    gen_english = gen_english_old
elif isinstance(old_rows, dict):
    daide_strings = list(old_rows)
    gen_english = gen_english_new
else:
    raise Exception
if gen_english.__name__ == "gen_english":
    daide_strings = [s.replace("LYO", "GOL") for s in daide_strings]

new_rows = {daide: gen_english(daide) for daide in daide_strings}

output_text = json.dumps(new_rows, ensure_ascii=False, indent=2) + "\n"
output_path.write_text(output_text, encoding="utf-8")
```

The first path in the script converts the original JSON file into a
simpler format using the current translation provided by `daide2eng`.
The second path converts the DAIDE from the JSON file output by the
first path into English using the new implementation in `chiron-utils`.
I ran the script repeatedly throughout development to ensure no unwanted
translation changes were made.

[^1]: https://github.com/ALLAN-DIP/daide2eng/blob/e4fa27071c52408817946598d3463f20bf064f6e/translation.json
I changed `arragement` to `arrangement` in two places.
As part of this, I also improved the existing docstrings.
It's inefficient to convert a DAIDE object to a string before it is
parsed into a DAIDE object again.
This will hopefully help CICERO and the Llama advisor provide better
advice because they are trained on human games using English. The
generated English is not perfect, but it's *much* closer to what a human
uses compared to DAIDE.
@aphedges aphedges added the enhancement New feature or request label Nov 5, 2024
@aphedges aphedges self-assigned this Nov 5, 2024
@aphedges aphedges merged commit 100d391 into main Nov 5, 2024
1 check passed
@aphedges aphedges deleted the merge-in-daide2eng branch November 5, 2024 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant