[Language idea] Ignore accents when transpiling #6000

jpelay · 2022-04-06T12:54:20Z

jpelay
Apr 6, 2022
Maintainer

When checking the input of an if, we match exactly the string given by the user, which is nice! But sometimes we may want to be more forgiving about what constitutes a match. One such case is accents: a character with an accent looks very similar to its ASCII counterpart, requires pressing an extra key first, and (in Spanish, at least) is often dropped in day-to-day writing, so accents are easy to leave out.

This could generate problems when checking an if, for example.

if ans is sí
   print 'Awesome'

If the kid inputs si instead of sí, the if branch will not be entered, which may confuse the kid. The same logic can be applied to keywords and variables: we'd want jesus and jesús to be the same variable.

@Felienne has pointed out that we might not want to do this for every language or for every type of accent, because they're not necessarily equivalent in some languages, like French for example.

One possible way to deal with this, suggested by @Felienne and @TiBiBa, is to create a mapper that maps accented characters to their ASCII equivalents. One downside is that this is very slow.
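A minimal sketch of such a mapper, assuming a hand-written table that covers only the Spanish acute-accented vowels. The names `ACCENT_MAP` and `strip_accents` are hypothetical and not part of the Hedy codebase:

```python
# Hypothetical mapping table: only the Spanish acute-accented vowels.
# Letters such as ñ are left out on purpose, since they are distinct letters.
ACCENT_MAP = str.maketrans({
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "Á": "A", "É": "E", "Í": "I", "Ó": "O", "Ú": "U",
})


def strip_accents(text):
    """Replace every character found in ACCENT_MAP; leave the rest untouched."""
    return text.translate(ACCENT_MAP)


print(strip_accents("sí"))     # si
print(strip_accents("jesús"))  # jesus
print(strip_accents("niño"))   # niño (ñ is not in the table)
```

`str.translate` does the lookup in a single pass over the string; whether that is fast enough in practice would still need to be measured.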

TiBiBa · 2022-04-06T13:04:25Z

TiBiBa
Apr 6, 2022
Collaborator

Apparently there is already a library for this! (Of course there is in Python...):

import unidecode

somestring = "àéêöhello"

# In Python 3, str is already Unicode, so no explicit decoding step is needed
print(unidecode.unidecode(somestring))

Output:

aeeohello

Found the example here: https://stackoverflow.com/questions/44431730/how-to-replace-accented-characters#44433664

Felienne · 2022-04-06T13:05:40Z

Felienne
Apr 6, 2022
Maintainer

Oh wow, that is a great find, @TiBiBa!

TiBiBa · 2022-04-06T13:09:30Z

TiBiBa
Apr 6, 2022
Collaborator

We do, however, still have the issue of comparisons on the front-end, so we should implement a similar solution in TypeScript. Because we don't talk to the server after the code is transpiled to Python (correct me if I'm wrong!), both the front-end and the back-end need to replace the characters for code like the following:

animal = 'panda'
if animal is pandá print 'awesome!'
else print 'sad face'

jpelay · 2022-04-06T13:13:48Z

jpelay
Apr 6, 2022
Maintainer Author

Apparently there is already a library for this! (Of course there is in Python...):
import unidecode

somestring = "àéêöhello"

# In Python 3, str is already Unicode, so no explicit decoding step is needed
print(unidecode.unidecode(somestring))
Output:

aeeohello

Found the example here: https://stackoverflow.com/questions/44431730/how-to-replace-accented-characters#44433664

Yes! I found this earlier, and they mention some problems, but I haven't tested it myself (https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string)

jpelay · 2022-04-06T13:16:15Z

jpelay
Apr 6, 2022
Maintainer Author

We do, however, still have the issue of comparisons on the front-end, so we should implement a similar solution in TypeScript. Because we don't talk to the server after the code is transpiled to Python (correct me if I'm wrong!), both the front-end and the back-end need to replace the characters for code like the following:
animal = 'panda'
if animal is pandá print 'awesome!'
else print 'sad face'

Maybe we can do the same thing as with the numeric characters and include a function within the transpiled code, something like:

input = normalize_accents(input)
to_check = normalize_accents(rhs_if)

if input == to_check:
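
One way the `normalize_accents` helper above could look, as a sketch only. It assumes the standard `unicodedata` module is available at runtime, which has not been checked for the in-browser environment, and the variable names `answer` and `rhs_if` are illustrative:

```python
import unicodedata


def normalize_accents(text):
    """Decompose characters (NFD) and drop the combining marks that result."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


answer = "si"   # what the kid typed
rhs_if = "sí"   # right-hand side of the if in the program

if normalize_accents(answer) == normalize_accents(rhs_if):
    print("Awesome")
```

Note that this approach strips every combining mark, so ñ also becomes n; a hand-written table gives finer control over which characters are treated as equivalent.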

Felienne · 2024-02-23T08:51:36Z

Felienne
Feb 23, 2024
Maintainer

And this one @boryanagoncharenko? Could be some fun language puzzling?

1 reply

boryanagoncharenko Dec 2, 2024
Maintainer

And this one @boryanagoncharenko? Could be some fun language puzzling?

Definitely a fun language puzzle!

As Jesus suggested, we can add a normalization function as a prefix to the whole program, similar to what we already have for localization. My hunch is that Skulpt will not support any of the existing Python solutions, so we will probably end up with something like a map. Performance issues will need to be evaluated.

The more interesting question is how to define what exactly we want to substitute! Is it true that in Spanish the accented letters are used only to mark the stress of the word? If so, for this language we could strip the accents. Greek is similar, I think. However, this is not the case for French and most Slavic languages, where removing an accent can change the meaning of the word. I can make the prototype and do a bit of research on what else we want to substitute.
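
A rough sketch of what a per-language rule could look like. The table contents, the language codes, and the function name `normalize_for_language` are all assumptions for illustration; which characters are actually safe to fold in each language is exactly the open research question:

```python
# Hypothetical per-language tables. Only Spanish is filled in here; other
# languages (e.g. Greek) would be added once the rules have been researched.
STRIPPABLE_ACCENTS = {
    "es": {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u"},
    # "fr" is deliberately absent: dropping an accent can change the meaning.
}

# Build the translation tables once instead of on every call.
_TABLES = {
    lang: str.maketrans(chars) for lang, chars in STRIPPABLE_ACCENTS.items()
}


def normalize_for_language(text, lang):
    """Fold accents only for languages that have an entry in the table."""
    table = _TABLES.get(lang)
    if table is None:
        return text  # unknown or accent-sensitive language: keep exact matching
    return text.translate(table)


print(normalize_for_language("sí", "es"))    # si
print(normalize_for_language("pâte", "fr"))  # pâte (unchanged)
```

Languages without an entry fall back to the current exact-match behaviour, so adding a language is an explicit opt-in.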

I am wondering whether it is a good idea to solve another issue as well: the Cyrillic and the Latin letters a, o, e. They look the same, but of course they are not equivalent. I often fall into this trap when I switch between languages on my keyboard, especially with the if-pressed statement.
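
To make the look-alike problem concrete, here is a small sketch. The table and the name `fold_lookalikes` are hypothetical, and folding Cyrillic to Latin is only one possible direction; a program written in a Cyrillic-script language would probably want the reverse:

```python
# Cyrillic letters that render like Latin a, o, e, written as escapes so the
# difference is visible in the source.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
})


def fold_lookalikes(text):
    """Map the three Cyrillic look-alikes to their Latin counterparts."""
    return text.translate(CYRILLIC_TO_LATIN)


latin_a = "a"
cyrillic_a = "\u0430"

print(latin_a == cyrillic_a)                   # False: different code points
print(fold_lookalikes(cyrillic_a) == latin_a)  # True after folding
```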

Should I convert this to an issue or create a new issue, so that we keep the discussion here?
