Keyterm data always gets added - and then we always train #476

johnml1135 · 2024-09-04T15:01:08Z

Should we add a separate flag for "only pretranslate"? Or should it work automatically, so that when there are no matching corpora we don't include the key terms?

Nateowami · 2024-09-04T20:04:34Z

@johnml1135 It seems to me like "only train on key terms, then generate a draft for my first book" is a valid but very unusual use-case. It would have to be a new project in an NLLB language.

ddaspit · 2024-09-04T20:54:24Z

We could filter the key terms by book/chapter. Each key term has a list of the verses that it occurs in.
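
A minimal sketch of that filter, to make the idea concrete. This is not the Serval or Machine implementation: the `KeyTerm` class, the `"GEN 1:1"` reference format, and the `chapters` mapping (book ID to a set of chapter numbers, where an empty set means the whole book) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KeyTerm:
    # Hypothetical shape: a term, its renderings, and the verses it occurs in.
    term: str
    renderings: list[str]
    verse_refs: list[str] = field(default_factory=list)  # assumed form: "GEN 1:1"


def parse_ref(ref: str) -> tuple[str, int]:
    """Split an assumed 'BOOK chapter:verse' reference into (book, chapter)."""
    book, chapter_verse = ref.split(" ", 1)
    chapter = int(chapter_verse.split(":", 1)[0])
    return book, chapter


def filter_key_terms(terms: list[KeyTerm], chapters: dict[str, set[int]]) -> list[KeyTerm]:
    """Keep a key term if any of its verses falls in the selected books/chapters.

    `chapters` maps a book ID to the selected chapter numbers; an empty set
    is treated as "every chapter of that book" (an assumption of this sketch).
    """
    kept = []
    for key_term in terms:
        for ref in key_term.verse_refs:
            book, chapter = parse_ref(ref)
            if book in chapters and (not chapters[book] or chapter in chapters[book]):
                kept.append(key_term)
                break  # one matching verse is enough
    return kept
```

With this shape, an empty selection keeps no key terms at all, so a build that has no training data selected would not pick up key terms as training data either.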

johnml1135 · 2024-09-05T14:39:45Z

@ddaspit, so, filter on any data trained on or pretranslated? That would leave us with the same issue - namely that if you just want to translate from English to Spanish using NLLB200, that is now prevented. If we want to implement that filter, I would consider that a separate enhancement.

ddaspit · 2024-09-05T14:48:14Z

You are correct. It would still train the model. This issue made me realize that we should filter the key terms.

We already have the use_key_terms build option, which can be used to exclude the key terms from the training data. That might be sufficient.

johnml1135 · 2024-09-05T19:16:55Z

If it is, we should test it out (at least manually) and then document it.

johnml1135 · 2024-11-01T19:52:42Z

Disabling use_key_terms would be sufficient to avoid training on any segments while still allowing NLLB pretranslations. The filtering of key terms is also implemented.
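
A rough sketch of the decision being described, under stated assumptions. The function name `should_train` and the segment counts are hypothetical; only the `use_key_terms` option name comes from this thread, and the real check in Serval may look quite different.

```python
def should_train(num_corpus_segments: int, num_key_term_segments: int, use_key_terms: bool) -> bool:
    """Train only if at least one training segment remains.

    Hypothetical helper: key term segments count toward the training data
    only when use_key_terms is enabled.
    """
    total = num_corpus_segments
    if use_key_terms:
        total += num_key_term_segments
    return total > 0


# Assumed build options for a pretranslate-only build with no training corpora.
options = {"use_key_terms": False}

# With key terms always added, training always runs (the behavior in this issue's title).
train_with_terms = should_train(0, 120, use_key_terms=True)

# With key terms excluded and nothing else to train on, training is skipped.
train_without_terms = should_train(0, 120, use_key_terms=options["use_key_terms"])
```

The point of the sketch is only that the key terms must be left out of the count before deciding whether to fine-tune; otherwise a plain NLLB pretranslation always triggers training.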

johnml1135 · 2024-11-01T19:54:22Z

Actually, the Serval changes need to be merged in first before this is completed.

johnml1135 · 2024-11-26T18:54:33Z

We are waiting on #508.

johnml1135 added the bug (Something isn't working) label Sep 4, 2024

johnml1135 added this to Serval Sep 4, 2024

github-project-automation bot moved this to 🆕 New in Serval Sep 4, 2024

johnml1135 assigned ddaspit Sep 4, 2024

Nateowami added the sf_watching (Scripture Forge should be updated when this is resolved or updated) label Sep 4, 2024

johnml1135 assigned mshannon-sil and johnml1135 and unassigned ddaspit Sep 5, 2024

johnml1135 assigned Enkidu93 and unassigned mshannon-sil Oct 7, 2024

This was referenced Oct 11, 2024:

Filter key terms by book/chapter #508 (Closed)

Add parameter for filtering key terms by book/chapters sillsdev/machine#256 (Merged)

johnml1135 closed this as completed Nov 1, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Serval Nov 1, 2024

johnml1135 reopened this Nov 1, 2024

johnml1135 moved this from ✅ Done to 🏗 In progress in Serval Nov 1, 2024

Enkidu93 mentioned this issue Nov 26, 2024:

Use chapter-filtering for terms #545 (Merged)

Enkidu93 closed this as completed in #545 Nov 27, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in Serval Nov 27, 2024
