Inconsistent KeyErrors in server code #59
Sorry you're having trouble with this; that's a very strange error. Those are all extremely common characters (except ★, I guess), so it seems unlikely that cutlet would generally be unable to handle them. In particular, it shouldn't be possible for kanji to be passed to […]. What version of cutlet are you using? How are you initializing the […]? For example, this code worked for me. Can you see if it works for you, outside of your application?
|
I am using cutlet 0.4.0:

    C:\Users\YumYummity>pip freeze | findstr "cutlet"
    cutlet==0.4.0

I initialize katsu inside a class:

    import cutlet

    class SOMECLASS:
        def __init__(self):
            # some code
            pass

        def some_func(self):
            self.katsu = cutlet.Cutlet()
            self.katsu.romaji("some japanese")

Works fine. Hmm:

    C:\Users\YumYummity>python
    Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cutlet
    >>>
    >>> katsu = cutlet.Cutlet()
    >>>
    >>> for char in "彼★大繋森":
    ...     print(katsu.romaji(char))
    ...
    Kare
    Oo
    Tsunagi
    Mori
    >>>
|
OK, thanks for confirming the details and that the code sample ran. Nothing looks wrong with your initialization, so I'm not really sure what could cause this. I'll try a few more things, but if you can come up with an example that I can run and that has the same issue, I could debug it. There is one thing I understood - since […] |
I believe I'm using […]. Another thing I've noticed is that the KeyError doesn't happen every time, only rarely, on some characters. |
Hm, that's weird. If you're using a standard setup, then I'm not sure how I can reproduce it. The best thing would be if you could find a snippet of code that reproduces it for me to try, though it sounds like that may be hard. When you say it doesn't happen every time, do you mean that given the same input, sometimes it happens and sometimes it doesn't? If that's the case, I guess it could be a threading issue, though I don't think that should happen... If you do have a reproducible case locally, you can drop a breakpoint or print statement to see why the token is being detected as kana. The place to do that would be this line. The main thing to check would be […] |
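Since the failure is rare, one way to get a reproducible case is to record the exact input whenever a conversion fails, and then replay that input in a plain single-threaded script. A minimal sketch, assuming `convert` is any callable like a Cutlet instance's `romaji` method; `romaji_or_log` is a hypothetical helper, not part of cutlet:

```python
import logging
import threading
from typing import Callable, Optional

log = logging.getLogger("romaji-debug")


def romaji_or_log(convert: Callable[[str], str], text: str) -> Optional[str]:
    """Return convert(text); on KeyError, log the input and thread name and return None.

    `convert` is assumed to be something like `cutlet.Cutlet().romaji`.
    This helper is hypothetical and not part of cutlet.
    """
    try:
        return convert(text)
    except KeyError as exc:
        # Record the exact input and the thread it ran on, so the failing
        # title can later be replayed on its own, outside any threads.
        missing = exc.args[0] if exc.args else None
        log.error("romaji failed on %r in thread %s: KeyError %r",
                  text, threading.current_thread().name, missing)
        return None
```

For example, `romaji_or_log(self.katsu.romaji, title)` would log the failing title and thread name instead of ending the refresh thread with an uncaught exception. If the logged title converts fine when replayed alone, that points towards concurrency rather than the input itself.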
Same input, yeah. The data that usually errors is from this file: https://github.com/Sekai-World/sekai-master-db-diff/blob/main/events.json

Sadly, I can't reproduce it reliably: it appears to be random and has only happened a few times. If you want, I can send you the class where Cutlet is used, but it's pretty long. Also, speaking of threading, the function that calls Cutlet runs in a thread:

    threading.Thread(target=self.refresh_data).start()
    # Cutlet is initialized and called in self.refresh_data()
|
Update: it happened again. (The output is messy because it's technically two errors printed at the same time: the thread ran twice within a short time.)

    Exception in thread Thread-21 (refresh_data):
    Traceback (most recent call last):
      File "C:\Development\Python\Python3.10.6\lib\threading.py", line 1016, in _bootstrap_inner
    Exception in thread Thread-23 (refresh_data):
    Traceback (most recent call last):
      File "C:\Development\Python\Python3.10.6\lib\threading.py", line 1016, in _bootstrap_inner
        self.run()
      File "C:\Development\Python\Python3.10.6\lib\threading.py", line 953, in run
        self.run()
      File "C:\Development\Python\Python3.10.6\lib\threading.py", line 953, in run
        self._target(*self._args, **self._kwargs)
      File "C:\Users\YumYummity\Desktop\bot\twitch\pjsk.py", line 2210, in refresh_data
        self._target(*self._args, **self._kwargs)
      File "C:\Users\YumYummity\Desktop\bot\twitch\pjsk.py", line 2209, in refresh_data
        self._title_maps[self.katsu.romaji(title)] = data["id"]
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 299, in romaji
        self._title_maps[self.katsu_foreignless.romaji(title)] = data["id"]
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 299, in romaji
        tokens = self.romaji_tokens(words, capitalize, title)
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 222, in romaji_tokens
        tokens = self.romaji_tokens(words, capitalize, title)
        roma = self.romaji_word(word)
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 222, in romaji_tokens
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 322, in romaji_word
        roma = self.romaji_word(word)
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 322, in romaji_word
        return self.map_kana(kana)
        return self.map_kana(kana)
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 367, in map_kana
        out += self.get_single_mapping(pk, char, nk)
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 367, in map_kana
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 420, in get_single_mapping
        out += self.get_single_mapping(pk, char, nk)
        return self.table[kk]
    KeyError: '灰'
      File "C:\Development\Python\Python3.10.6\lib\site-packages\cutlet\cutlet.py", line 420, in get_single_mapping
        return self.table[kk]
    KeyError: '幸'
|
Hm, it seems very likely that it's a threading issue. Can you try making a single cutlet object per thread and see if that resolves the issue? That won't help resolve the root cause, but I would expect it to fix your problem. |
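One way to get one Cutlet object per thread without restructuring everything is `threading.local`, which gives each thread its own attribute namespace. A minimal sketch; `PerThread` is a hypothetical helper, and the factory would be something like `cutlet.Cutlet`:

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class PerThread(Generic[T]):
    """Lazily create one object per thread, backed by threading.local.

    Hypothetical helper; with cutlet the factory would be cutlet.Cutlet.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()

    def get(self) -> T:
        # Attributes set on a threading.local are only visible to the thread
        # that set them, so the first get() in each thread builds a new object.
        obj = getattr(self._local, "obj", None)
        if obj is None:
            obj = self._factory()
            self._local.obj = obj
        return obj
```

With cutlet this might look like `katsu = PerThread(cutlet.Cutlet)` followed by `katsu.get().romaji(title)`. For the variant with foreign spelling turned off, pass a small function that creates a `Cutlet`, sets `use_foreign_spelling = False`, and returns it.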
I don't think it's a threading issue, as I've checked the first error I sent. The error was from before I implemented threading. |
Oh, do you have example code that causes the error without threading? If you do, definitely send it to me. Reviewing things here, your first example has references to […]. I assume threading is involved because your error seems to be running through this code: […]
With your error, characters that are obviously not hiragana or katakana are going inside this […]. If it's not threading related... I don't have many ideas about what could cause an issue with the character type. I'll have to look at the implementation again. |
I believe the first traceback I sent didn't use threading (it also doesn't say […]). Here's what my implementation currently looks like (with some irrelevant variables and code cut):

    import re
    import threading
    import time

    import cutlet
    import requests


    class pjsk_data:
        def __init__(self):
            self._refreshed_at = 0
            self.refresh_data()

        def refresh_data(self):
            # URLS
            url_jp = "https://sekai-world.github.io/sekai-master-db-diff/musics.json"
            url2_jp = "https://sekai-world.github.io/sekai-master-db-diff/musicDifficulties.json"
            url = "https://sekai-world.github.io/sekai-master-db-en-diff/musics.json"
            url2 = "https://sekai-world.github.io/sekai-master-db-en-diff/musicDifficulties.json"
            url3_jp = "https://sekai-world.github.io/sekai-master-db-diff/musicTags.json"
            url3 = "https://sekai-world.github.io/sekai-master-db-en-diff/musicTags.json"
            event_url = "https://sekai-world.github.io/sekai-master-db-en-diff/events.json"
            event_url_jp = "https://sekai-world.github.io/sekai-master-db-diff/events.json"

            # Functions and Tools
            def simplify_title(title):
                # Remove all non-alphanumeric characters and convert to lowercase
                simplified_title = re.sub(r'[^a-zA-Z0-9\s]', '', title).lower()
                # Remove extra whitespace
                simplified_title = re.sub(r'\s+', ' ', simplified_title).strip()
                return simplified_title

            self.katsu = cutlet.Cutlet()
            self.katsu_foreignless = cutlet.Cutlet()
            self.katsu_foreignless.use_foreign_spelling = False

            # Requests
            songs = requests.get(url).json()
            songs_difficulties = requests.get(url2).json()
            jp_songs_difficulties = requests.get(url2_jp).json()
            jp_songs = requests.get(url_jp).json()
            jp_tag_data = requests.get(url3_jp).json()
            tag_data = requests.get(url3).json()
            events = requests.get(event_url).json()
            events_jp = requests.get(event_url_jp).json()

            # Maps
            self.tag_map = {
                "all": None,
                "other": "Other",
                "none": "No Main Unit",
                "vocaloid": "VIRTUAL SINGER",
                "piapro": "VIRTUAL SINGER",
                "school_refusal": "Nightcord at 25:00",
                "light_sound": "Leo/need",
                "light_music_club": "Leo/need",
                "idol": "MORE MORE JUMP!",
                "street": "Vivid BAD SQUAD",
                "theme_park": "Wonderlands×Showtime"
            }
            self.event_type_map = {
                "marathon": "Marathon",
                "cheerful_carnival": "Cheerful Carnival",
                "world_bloom": "World Link"
            }
            self.custom_title_definitions = {
                99: [  # MORE! JUMP! MORE!
                    "mjm",
                    "dakara motto"
                ],
                135: [  # Six Trillion Years and Overnight Story
                    "six trillion"
                ],
                162: [  # End Mark ni Kibou to Namida wo soete
                    "endmark"
                ],
                164: [  # Don't Fight The Music
                    "dftm"
                ],
                176: [  # Machinegun Poem DOll
                    "mgpd"
                ],
                186: [  # Hatsune Creation Myth
                    "hcm"
                ],
                226: [  # Lost and Found
                    "lnf",
                    "kimino"
                ],
                250: [  # Kusare-Gedou and Chocolate
                    "kusare-gedou",
                    "chocolate boss song"
                ],
                251: [  # Fräulein=библиотека
                    "Fraulein"
                ],
                315: [  # What's Up? Pop!
                    "wup"
                ],
                328: [  # Sekai-Chan and Kafu-Chan's Otsukai Gassoukyoku
                    "sekai-chan",
                    "kafu-chan"
                ],
                396: [  # 東京テディベア
                    "ttb",
                    "tokyo teddy bear"
                ],
                503: [  # 超最終鬼畜妹フランドール・S
                    "flandre s"
                ],
            }

            # Title Maps
            self._title_maps = {}
            for data in songs:
                title = data["title"].lower().strip()
                simplified_title = simplify_title(title)
                self._title_maps[title] = data["id"]
                if simplified_title != title:
                    self._title_maps[simplified_title] = data["id"]
            for data in jp_songs:
                title = data["title"].strip()
                self._title_maps[title] = data["id"]
                self._title_maps[self.katsu.romaji(title)] = data["id"]
                self._title_maps[self.katsu_foreignless.romaji(title)] = data["id"]
                # Check if there are custom titles defined for this data["id"]
                if data["id"] in self.custom_title_definitions:
                    custom_titles = self.custom_title_definitions[data["id"]]
                    for custom_title in custom_titles:
                        self._title_maps[custom_title.lower().strip()] = data["id"]

            # Event Maps
            self._event_maps = {}
            for data in events:
                title = data["name"].lower().strip()
                simplified_title = simplify_title(title)
                self._event_maps[title] = data["id"]
                if simplified_title != title:
                    self._event_maps[simplified_title] = data["id"]
            for data in events_jp:
                title = data["name"].strip()
                self._event_maps[title] = data["id"]
                self._event_maps[self.katsu.romaji(title)] = data["id"]
                self._event_maps[self.katsu_foreignless.romaji(title)] = data["id"]
                # Check if there are custom titles defined for this data["id"]
                # if data["id"] in self.custom_title_definitions:
                #     custom_titles = self.custom_title_definitions[data["id"]]
                #     for custom_title in custom_titles:
                #         self._event_maps[custom_title.lower().strip()] = data["id"]

and how it's called:

    # (methods of class pjsk_data)
    def _check_refresh(self):
        if self._refreshed_at < time.time() - 3600:  # 3600 seconds = 1 hour
            threading.Thread(target=self.refresh_data).start()

    @property
    def title_maps(self):
        self._check_refresh()
        return self._title_maps
|
Apologies for the delayed reply. There is an actual bug here, but it is not in Cutlet, and your issue must be threading related. Your example code is not doing anything that would cause a problem. However, despite what you said, your first traceback is definitely using threading; see the first lines referencing […].
Given the file path of your code, is this a Twitch bot that takes callbacks or something? Is some of the code async? The actual bug is that the character class is being invalidated / overwritten when […].
However, because Cutlet manages the nodes and tagger internally, it is not possible for this to happen unless something is happening inside a single Cutlet call, which I can only imagine being caused by threading or async code. You should be able to resolve your issue by using one Cutlet object per thread and making sure threads are not suspended while Cutlet is making individual calls. |
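If you'd rather keep a single shared instance, an alternative to per-thread objects is to serialize access with a lock, so that no other thread can use the shared object while a call is in progress. This is a swapped-in technique, not something cutlet does itself; `LockedConverter` is a hypothetical wrapper:

```python
import threading
from typing import Callable


class LockedConverter:
    """Serialize calls to a shared, non-thread-safe converter (hypothetical wrapper)."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert
        self._lock = threading.Lock()

    def __call__(self, text: str) -> str:
        # Only one thread at a time can be inside the wrapped call, so another
        # thread cannot touch the converter's internal state mid-conversion.
        with self._lock:
            return self._convert(text)
```

Usage would be something like `romaji = LockedConverter(katsu.romaji)` and then `romaji(title)`. The trade-off is that conversions from different threads no longer run in parallel, which probably doesn't matter much for an hourly refresh.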
Hi, thanks for the help! Yeah, I missed the threading reference in my first error. The code fetches some data from a repository, uses it in a Twitch bot, and refreshes the data every N seconds; each refresh starts a new thread that runs the fetching and romaji conversions. The Twitch bot is async. The function that calls Cutlet is not async, but it does run in a thread. I'll try your suggestion, thanks! |
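Since each refresh starts a new thread, it may also be worth making sure two refreshes can never overlap. In the pjsk_data code posted earlier, `_refreshed_at` is set to 0 and, at least in the parts shown, never updated, so every access to `title_maps` could start another `refresh_data` thread; that would be consistent with Thread-21 and Thread-23 failing together. The omitted code may already handle this, so treat it as a guess. A minimal sketch of a guard, where `RefreshGuard` is a hypothetical helper and `refresh` stands in for `refresh_data`:

```python
import threading
import time
from typing import Callable, Optional


class RefreshGuard:
    """Start `refresh` in a background thread at most once per `interval` seconds,
    and never while a previous refresh is still running (hypothetical helper)."""

    def __init__(self, refresh: Callable[[], None], interval: float = 3600.0) -> None:
        self._refresh = refresh
        self._interval = interval
        self._refreshed_at: Optional[float] = None  # None means "never refreshed"
        self._running = threading.Lock()

    def _due(self, now: float) -> bool:
        return self._refreshed_at is None or now - self._refreshed_at >= self._interval

    def maybe_refresh(self) -> bool:
        """Return True if a refresh thread was started by this call."""
        now = time.monotonic()
        if not self._due(now):
            return False
        # Non-blocking acquire: if another refresh holds the lock, skip this one.
        if not self._running.acquire(blocking=False):
            return False
        # Re-check under the lock: another caller may have just started a refresh.
        if not self._due(now):
            self._running.release()
            return False
        # Record the start time now so later callers don't also see "due".
        self._refreshed_at = now
        # daemon=True so a pending refresh doesn't block interpreter shutdown.
        threading.Thread(target=self._run, daemon=True).start()
        return True

    def _run(self) -> None:
        try:
            self._refresh()
        finally:
            # threading.Lock may be released from a different thread than the acquirer.
            self._running.release()
```

Separately, because `refresh_data` assigns `self.katsu` and `self.katsu_foreignless`, two overlapping runs can end up calling whichever instances were assigned last, i.e. sharing them across threads. Creating the Cutlet objects as local variables inside `refresh_data` (for example `katsu = cutlet.Cutlet()`) would keep each run on its own instances.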
More characters (same traceback, just not including the full thing):
KeyError: '★'
KeyError: '大'
KeyError: '繋'
KeyError: '森'
I may find more in the future.