agrodet edited this page Apr 25, 2020 · 1 revision

Introductory notes - what are tags?

Tags are meant to be metadata that gives extra information about a sentence.
As such, tags should not be confused with lists.

REWORK HOW TAGS ARE ADDED / DELETED TO A SENTENCE

Add a "Tag sentence" icon to the sentence block menu. The "Add tag" feature would be similar to the "Add to list" feature. The equivalent of the "Your last selected list (if any) and last updated lists" suggestions would be split into two sections: "Your most recently applied tags" and "Most useful utility tags" (@-tags). The usefulness of the latter is open to discussion: would it be faster to scroll down to "@change" and click on it, or to simply enter "@ch" in the input field, get the suggested tag names, and click on "@change"? Regular users would not have the second section.

On the sentence page, we wouldn't have an input field like we have now. Instead, we would have a "Tags" section similar to the "Lists" section. We would need a way to delete a tag from that section, just like we need it for lists (issue number?). Removing a tag from that section shouldn't reload the entire page (#1237).

There is a need to use loose-matching for auto-suggestion (lowercase / uppercase, accents, etc.). (#302)
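
As a rough illustration of what loose matching could mean here, the sketch below folds case and strips accents before comparing a query against tag names. The function names `normalize` and `suggest_tags` and the sample tag list are hypothetical, and this is not how Tatoeba implements suggestions; it only shows one possible approach in Python.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold case and drop combining accents so 'Café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest_tags(query: str, tag_names: list[str], limit: int = 10) -> list[str]:
    """Return tag names whose normalized form starts with the normalized query."""
    needle = normalize(query.strip())
    if not needle:
        return []
    matches = [name for name in tag_names if normalize(name).startswith(needle)]
    return matches[:limit]


# Hypothetical sample data, for illustration only.
tags = ["@change", "@check", "Café", "cafeteria", "animal"]
print(suggest_tags("@CH", tags))   # ['@change', '@check']
print(suggest_tags("cafe", tags))  # ['Café', 'cafeteria']
```

Prefix matching is an assumption made to keep the example short; substring or fuzzy matching would follow the same normalization step.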

However, on the "Browse by tags" page, the input field is needed for search. There, we need to add an indicator showing that the suggestion search is still running (#298)

Finally, one last problem is left to solve: How to provide the option to tag translations on the fly? Simple use case: I search for sentences tagged "animal" in order to translate them. Obviously, I also want to tag my translations "animal" (if relevant).

MERGING OF TAGS (#961)

Provide corpus maintainers with a tool to rename tags, merge them, etc.

  • Merging should be an occasional operation.
  • It should NOT be a one-man operation. This should be discussed among corpus maintainers.
  • The community should be notified of which tag(s) would be merged into which tag(s). Users could argue for or rebut the merging of some tags.
  • URL of the removed tag should somehow redirect to the URL of the tag it was merged into.
  • Once tags are translatable, there is a non-negligible risk that translations would prevent a merge operation. For example, Tag A and Tag B have a similar meaning in English, but not in Finnish. How do we merge Tag A and Tag B? Since tags are used for metadata, this shouldn't be a problem in principle. But because translations would be up to users, in practice it will be.
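
The merge-and-redirect idea from the list above can be sketched as follows. This is a minimal in-memory model, not the actual Tatoeba schema: the `TagStore` class, its attributes, and the sample tag names are all hypothetical, and the discussion and notification steps are out of scope.

```python
class TagStore:
    """Hypothetical in-memory tag store that keeps redirects for merged tags."""

    def __init__(self) -> None:
        self.sentences_by_tag: dict[str, set[int]] = {}
        self.redirects: dict[str, str] = {}  # removed tag -> surviving tag

    def resolve(self, name: str) -> str:
        """Follow redirects until reaching a tag that still exists."""
        while name in self.redirects:
            name = self.redirects[name]
        return name

    def tag(self, sentence_id: int, name: str) -> None:
        self.sentences_by_tag.setdefault(self.resolve(name), set()).add(sentence_id)

    def merge(self, source: str, target: str) -> None:
        """Move all sentences from `source` to `target` and remember the redirect."""
        source, target = self.resolve(source), self.resolve(target)
        if source == target:
            return  # already the same tag, nothing to do
        moved = self.sentences_by_tag.pop(source, set())
        self.sentences_by_tag.setdefault(target, set()).update(moved)
        self.redirects[source] = target


store = TagStore()
store.tag(1, "animals")
store.tag(2, "animal")
store.merge("animals", "animal")
print(store.resolve("animals"))          # animal
print(store.sentences_by_tag["animal"])  # {1, 2}
```

A page handler for the old tag's URL could call `resolve` and redirect to the surviving tag's URL; how that maps onto real routes is left open here.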

TRANSLATION OF TAGS (#54)

  • Adding a tag shouldn't be reserved to English speakers.
  • If User A adds a tag in Finnish, that is fine. When the tag gets translated, it can be linked to the "group" of translations of the same object.
  • It may happen that translating Tag A from Finnish to English apparently gives a new tag, simply because the translation of Tag A differs from "Tag B", although Tag B and Tag A are actually the same. This could be mitigated by merging tags (or not, see above).
  • Tags should be displayed in the interface language.
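
One way to picture the "group of translations" and the display rule above is the sketch below. The `TagGroup` class and its fields are hypothetical names chosen for illustration, and falling back to the language the tag was created in is an assumption, not a decision recorded on this page.

```python
from dataclasses import dataclass, field


@dataclass
class TagGroup:
    """One tag concept with its names in several languages (hypothetical model)."""

    original_lang: str
    names: dict[str, str] = field(default_factory=dict)

    def display_name(self, interface_lang: str) -> str:
        # Prefer the interface language; otherwise fall back to the
        # language the tag was first created in (an assumed policy).
        return self.names.get(interface_lang, self.names[self.original_lang])


# A tag created in Finnish is usable right away, before any English name exists.
group = TagGroup(original_lang="fi", names={"fi": "eläin"})
print(group.display_name("en"))  # eläin

group.names["en"] = "animal"     # a translation is added later
print(group.display_name("en"))  # animal
```

With such a model, no language has to act as the mandatory entry point: a tag exists as soon as it has a name in any language.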

In summary, using English as a common ground, and making the use of a tag possible only after it has been added in English is intrinsically wrong and unfair. The same goes for requesting a new tag.

CATEGORIZATION (#333)

TRANG asked the following questions:

  • What categories do we want exactly?
  • What would be the process for adding a new category? For deleting a category? For renaming a category?
  • How do we decide which tag belongs to which category?

In a little more detail:

  • Who adds a category? When?
  • How to notify users?

And arguably the most important: What IS a category and how do we use it?

gillux's comments:
The way I see it, a category should (in order of importance):

  1. Help with structuring and discovery by providing a way to navigate through tags.

Use case 1. I’m looking at sentences tagged as "airport" which is within the category "situation". I can find other tags in that category such as "hotel" or "restaurant" in order to discover other situations.

Use case 2. I’m looking at the list of all tags, but it’s soooo long that it’s overwhelming. It’s not easy to discover something interesting. So instead, I browse the list of categories first. I find the category "situation", and now the tag list is manageable.

Use case 3. I’m just looking at the list of categories and, unlike the current list of tags, it looks well organized, so I trust the content more.

  2. Make tag names more self-contained. If the category sticks with the tag name in the UI, "situation:airport" (or "situation:at the airport") is much easier to understand than just "airport". The category acts like a "What is" question to which the tag name is the answer. (Maybe that pattern doesn’t work all the time, like for "proverb".)

  3. Help distinguish translations: (dreaming a bit)

Use case: an English sentence has several Japanese translations, but I cannot tell the difference between them. If one sentence is tagged "polite" and the other "informal", then, since both tags belong to the category "politeness", Tatoeba can automatically tell me that these translations differ in terms of politeness. That’s very valuable information for learners.

  4. There might be some rare cases where a given tag belongs to different categories depending on what you mean. For example, let’s say we have two sentences, "There are no high schools in my town." and "Sorry, teacher, I forgot to do my homework!", both currently tagged as "school". Now if I were to set a category, the former would be "topic" while the latter would be "situation". So having to think about the category forces us to distinguish some tags that were otherwise treated as the same, which makes the metadata more accurate.
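
The "politeness" use case above could look roughly like this in code. Representing a tag as a (category, name) pair and the function names below are assumptions made for the sketch; nothing here reflects an existing Tatoeba feature.

```python
from collections import defaultdict

# A tag is modelled as a (category, name) pair, e.g. ("politeness", "polite").


def names_by_category(tags):
    """Group tag names by their category."""
    grouped = defaultdict(set)
    for category, name in tags:
        grouped[category].add(name)
    return grouped


def differing_categories(tags_a, tags_b):
    """Return categories present on both sentences but with different tag names."""
    a, b = names_by_category(tags_a), names_by_category(tags_b)
    return sorted(cat for cat in a.keys() & b.keys() if a[cat] != b[cat])


# Hypothetical tags on two Japanese translations of the same English sentence.
translation_1 = [("politeness", "polite"), ("topic", "school")]
translation_2 = [("politeness", "informal"), ("topic", "school")]
print(differing_categories(translation_1, translation_2))  # ['politeness']
```

Only categories that appear on both sentences are compared, so a missing tag is not reported as a difference; whether that is the desired behaviour is an open design question.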

It's easy to say @-tags belong to the "utility" category and the "by xxx" to the "authored" one. Those are trivial cases.
But, suppose a sentence is tagged "animal", what category is it?

gillux's comments:
I think "topic" would be the right category. Thinking about the use case of somebody who wants sentences that include animal names or vocabulary frequently used with animals, then it’s just about the topic.

But I agree that some tags are tricky and need careful thinking. What about the tag "insulting"? Maybe the category could be "connotation"?


And now, suppose there are a dozen sentences talking about snakes. It makes sense to tag them all "snake". Shouldn't they be tagged "animal" too? Shouldn't "animal" include "snake"?

gillux's comments:
I don’t think it’s necessary to tag them "snake" as long as it’s possible to find them by searching for "snake". The main goal of tags is to allow finding sentences based on meta-information, but "snake" is probably not meta.

That said, I got your point. Here is another example: what if we have a lot of sentences tagged "insect"? Shouldn’t they be tagged "animal" too?
Shouldn’t "animal" somehow include "insect"?

My answer: yes, but both "animal" and "insect" remain in the same "topic" category. Maybe such a hierarchy could be used for discovery, by navigating "up" or "down" within a category. For example, I navigate from "topic:insect" to "topic:animal" (up), and then from "topic:animal" to "topic:fish" (down).

But this is different from, let’s say, the "future" tag that belongs to the "tense" category, which belongs to the "grammar" category. Instead, the tag would maybe look like "grammar:tense:future". From there, I can go "up", to "grammar:tense", but this one doesn’t contain any sentence, only other tags about tenses like "grammar:tense:past".
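
The two kinds of navigation described above can be sketched separately: "up" and "down" between tags of the same category, and "up" through nested categories by trimming a path such as "grammar:tense:future". The `BROADER` mapping, the function names, and the colon-separated path format are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical "is a kind of" links between tags within the same category.
BROADER = {
    "topic:insect": "topic:animal",
    "topic:fish": "topic:animal",
}


def up(tag: str) -> Optional[str]:
    """Return the broader tag, if one is recorded."""
    return BROADER.get(tag)


def down(tag: str) -> list[str]:
    """Return the narrower tags that point to this tag."""
    return sorted(child for child, parent in BROADER.items() if parent == tag)


def category_parent(path: str) -> Optional[str]:
    """'grammar:tense:future' -> 'grammar:tense'; a top-level name has no parent."""
    head, sep, _ = path.rpartition(":")
    return head if sep else None


print(up("topic:insect"))                       # topic:animal
print(down("topic:animal"))                     # ['topic:fish', 'topic:insect']
print(category_parent("grammar:tense:future"))  # grammar:tense
```

In this sketch "topic:animal" still holds sentences of its own, whereas "grammar:tense" is only a container for other tags, which matches the distinction drawn above.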

Maybe we can keep the "categories of categories" feature for later.


Categorization could be used as a means of control to "Allow users to tag their own sentences without special permissions" (#1198).

MAINTENANCE OF TAGS

Problem: "Today, tagging is a free-for-all activity. Contributors are not consulting each other before creating new tags. We have many duplicate tags and many "personal" tags."

Simple possibility: Display tags ordered by "creation date" and allow corpus maintainers to merge / delete (see above). Not a one-person decision. The creator should be notified with an explanation on the use of tags.

Miscellaneous

  • Correctly deal with sentences merged by Horus. (#1622)

  • Add an admin page to remove tags (#330). This overlaps with the previous section. Whether these functions should be limited to admins is open to debate. In any case, maintenance of tags should not be a one-person decision.