Keyterm data always gets added - and then we always train #476

Closed
johnml1135 opened this issue Sep 4, 2024 · 8 comments · Fixed by #545
Labels
bug Something isn't working sf_watching Scripture Forge should be updated when this is resolved or updated

Comments

@johnml1135
Collaborator

Should we add a separate flag for "only pretranslate"? Or should this work automatically, so that when there are no matching corpora, we don't include the key terms?

@johnml1135 johnml1135 added the bug Something isn't working label Sep 4, 2024
@johnml1135 johnml1135 added this to Serval Sep 4, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Serval Sep 4, 2024
@Nateowami Nateowami added the sf_watching Scripture Forge should be updated when this is resolved or updated label Sep 4, 2024
@Nateowami
Collaborator

@johnml1135 It seems to me like "only train on key terms, then generate a draft for my first book" is a valid but very unusual use-case. It would have to be a new project in an NLLB language.

@ddaspit
Contributor

ddaspit commented Sep 4, 2024

We could filter the key terms by book/chapter. Each key term has a list of the verses in which it occurs.
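A minimal sketch of what filtering by book/chapter could look like, assuming each key term carries verse references such as "MAT 1:23". The `KeyTerm` class, the reference format, and the `selected` mapping are all hypothetical names used for illustration; they are not taken from the Serval or Machine code.

```python
from dataclasses import dataclass, field


@dataclass
class KeyTerm:
    # Hypothetical record: a source/target rendering pair plus the verses
    # in which the term occurs (assumed format: "BOOK chapter:verse").
    source: str
    target: str
    verse_refs: list[str] = field(default_factory=list)


def parse_ref(ref: str) -> tuple[str, int]:
    """Split a reference like 'MAT 1:23' into (book, chapter)."""
    book, chapter_verse = ref.split(" ", 1)
    chapter = int(chapter_verse.split(":", 1)[0])
    return book, chapter


def filter_key_terms(terms, selected):
    """Keep terms with at least one occurrence in the selected books/chapters.

    `selected` maps a book ID to a set of chapter numbers; an empty set
    means the whole book is selected. This mapping is an assumption.
    """
    kept = []
    for term in terms:
        for ref in term.verse_refs:
            book, chapter = parse_ref(ref)
            if book in selected and (not selected[book] or chapter in selected[book]):
                kept.append(term)
                break  # one matching occurrence is enough
    return kept
```

With an empty `selected` mapping nothing is kept, which matches the case where there is no matching corpus data.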

@johnml1135
Collaborator Author

@ddaspit, so we would filter the key terms to the data being trained on or pretranslated? That would leave us with the same issue: if you just want to translate from English to Spanish using NLLB200, that is still prevented. If we want to implement that filter, I would consider it a separate enhancement.

@ddaspit
Contributor

ddaspit commented Sep 5, 2024

You are correct. It would still train the model. This issue made me realize that we should filter the key terms.

We already have the use_key_terms build option, which excludes the key terms from the training data. That might be sufficient.

@johnml1135
Collaborator Author

If it is, we should test it out (at least manually) and then document it.

@johnml1135
Collaborator Author

use_key_terms is sufficient to avoid training on any segments while still allowing NLLB pretranslations without training. The filtering of key terms is also implemented.
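A rough sketch of the decision described above, assuming that disabling `use_key_terms` is what excludes the key terms from the training data. Only the option name comes from this thread; the function name and its parameters are hypothetical and do not reflect the actual Serval or Machine implementation.

```python
def should_train(num_corpus_segments: int, num_key_term_segments: int, use_key_terms: bool) -> bool:
    """Return True when there is any data left to fine-tune on.

    Hypothetical helper: key term segments only count toward the training
    data when the use_key_terms build option is enabled.
    """
    training_segments = num_corpus_segments
    if use_key_terms:
        training_segments += num_key_term_segments
    return training_segments > 0
```

Under these assumptions, a build with no matching training corpora and `use_key_terms` disabled has nothing to train on, so it can go straight to pretranslating with the base NLLB model.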

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Serval Nov 1, 2024
@johnml1135
Collaborator Author

Actually, the Serval changes need to be merged before this is complete.

@johnml1135 johnml1135 reopened this Nov 1, 2024
@johnml1135 johnml1135 moved this from ✅ Done to 🏗 In progress in Serval Nov 1, 2024
@johnml1135
Collaborator Author

We are waiting on #508.

@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Serval Nov 27, 2024
Projects
Status: ✅ Done
5 participants