Model change: use audio-<md5hash of sentences>.wav instead of audio-<idx>.wav as filenames #106
I would definitely agree there's a more optimal way. For example, the current update method has quirks: depending on the parser, it will still delete audio for sentences that were already present (the parser needs work). Then there's the renaming issue: it requires renaming the audio files for all indexes, which I can see being very inefficient for large books (and I'm assuming this is how you may have run into it). The idea behind the suffixes was that they make the files human-readable, but that could be redundant as long as we keep track of the audio file. However, one intentional part of the data structure is to tie the audio not only to the sentence, but to each parameter in the dictionary. Hashing the audio file would be a nice way to avoid having to rename it, but the hash would need to identify which dictionary it's tied to as well. In a sense, I'm using the audio file name as the ID of the dictionary. Just some of my thoughts below:
For these two points, reuse for the same sentence may not be ideal, as the TTS engines are non-deterministic when used with a random seed. It may be desirable to have variation between one sentence and another via speaker_id or seed.
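One way these two concerns could be reconciled is to hash the sentence together with its generation parameters, so the file name still acts as the ID of the whole dictionary. The sketch below is only an illustration: `generation_id` is a hypothetical helper and the `params` keys are assumed, not taken from the project's actual data structure.

```python
import hashlib
import json


def generation_id(sentence: str, params: dict) -> str:
    """Return an MD5 hex digest covering the sentence AND its parameters.

    Hypothetical helper: the real parameter dictionary may hold other keys.
    """
    # sort_keys=True makes the serialization independent of key order,
    # so the same logical dictionary always produces the same hash.
    payload = json.dumps(
        {"sentence": sentence, "params": params},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def audio_filename(sentence: str, params: dict) -> str:
    """Build a file name in the audio-<hash>.wav style from the issue title."""
    return f"audio-{generation_id(sentence, params)}.wav"
```

With this scheme the same sentence rendered with a different speaker_id or seed gets its own file, while an unchanged sentence with unchanged parameters keeps its name when sentences around it are inserted or reordered.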
I think the above thoughts also cover my thinking on this point, but given the coupling between all parameters in the dictionary, I'm not sure changing the audio naming would resolve this issue. One thought that came to mind was that I could store all generations in some type of "database", with each generation tied to a unique ID. Then reordering sentences would be a simple matter of just changing the IDs around. I'll have to think this through more, but I'm all ears as well!
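A rough sketch of that "database" idea, under stated assumptions: every class, field, and method name here is hypothetical, and an in-memory dict stands in for whatever storage would actually be used. Generations are keyed by a unique ID, and the book's order is just a list of those IDs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Generation:
    """One generated clip: the sentence, its parameters, and its audio file."""
    sentence: str
    params: dict
    audio_path: str


@dataclass
class Audiobook:
    # Maps unique ID -> generation; the file name never depends on position.
    generations: dict = field(default_factory=dict)
    # Playback order, expressed purely as a list of IDs.
    order: list = field(default_factory=list)

    def add(self, sentence: str, params: dict, position: Optional[int] = None) -> str:
        """Register a generation and place it at `position` (default: the end)."""
        gen_id = uuid.uuid4().hex
        self.generations[gen_id] = Generation(sentence, params, f"audio-{gen_id}.wav")
        if position is None:
            self.order.append(gen_id)
        else:
            self.order.insert(position, gen_id)
        return gen_id

    def move(self, gen_id: str, new_position: int) -> None:
        """Reorder by shuffling IDs only; no audio file is renamed."""
        self.order.remove(gen_id)
        self.order.insert(new_position, gen_id)
```

Inserting or moving a sentence then only touches the `order` list, so the cost no longer grows with the number of audio files in the book.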
If I'm missing anything, lmk, but based on some of my thoughts above, I'd have to think through more of it before I'd merge a change.
I think this is a great idea! Off topic, but txt files are pretty primitive as well, so I'm looking to first add EPUB support, then PDF, to the project sometime in the future.
Closing for now, added things like m4b to feature tracker
I’ve been thinking about the current system for numbering the audio files (audio-xxx.wav) using an index, and I believe it’s suboptimal. It complicates tasks like inserting or reordering files, and we don’t gain much since we already have the index in the JSON. After looking at model.update_audiobook(), I think there’s a more efficient solution.
My suggestion is to use the MD5 hash of the sentences as the identifier for audio files instead of the index. This approach would offer several benefits:
What do you think? Is there anything I might have overlooked? Would you like me to go ahead and implement this, in a way that does not break old audiobooks?
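To make the naming proposal concrete, here is a minimal sketch of deriving the file name from the sentence text alone. `sentence_audio_filename` is a hypothetical helper, not an existing function in the project, and UTF-8 encoding of the sentence is an assumption.

```python
import hashlib


def sentence_audio_filename(sentence: str) -> str:
    """Return audio-<md5 of sentence>.wav for the given sentence text."""
    # MD5 is used here only as a content fingerprint, not for security.
    digest = hashlib.md5(sentence.encode("utf-8")).hexdigest()
    return f"audio-{digest}.wav"
```

Because the name depends only on the text, inserting or reordering sentences leaves every existing file name unchanged. Note that identical sentences map to the same file under this scheme.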
P.S. I’m also considering adding a feature to mark sentences as chapter starts (either manually or using a regular expression). This would help us navigate the sentence list with a go to next/last chapter button and could be used to split output files or mark chapters in an M4B file. Thoughts?
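A small sketch of the regex-based variant, purely as an illustration: the pattern and the function names are hypothetical, and "last chapter" is read here as "previous chapter", which is an assumption.

```python
import re

# Hypothetical default pattern: a sentence starting with "Chapter <something>".
CHAPTER_RE = re.compile(r"^\s*chapter\s+\w+", re.IGNORECASE)


def chapter_starts(sentences, pattern=CHAPTER_RE):
    """Return the indexes of sentences that look like chapter starts."""
    return [i for i, s in enumerate(sentences) if pattern.search(s)]


def next_chapter(starts, current):
    """Index of the first chapter start after `current`, or None."""
    for i in starts:
        if i > current:
            return i
    return None


def previous_chapter(starts, current):
    """Index of the closest chapter start before `current`, or None."""
    for i in reversed(starts):
        if i < current:
            return i
    return None
```

The same list of start indexes could also serve as split points for output files or as chapter markers in an M4B file. Manual marking would just mean adding or removing indexes from that list.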