Using with Additional corpus of spelling mistakes. #39

willwade · 2024-01-18T00:20:38Z

I'm considering using this as a service behind an app for the disabled people we support, who would use it to communicate. Many of these users tap out letters but often never type a space. The snag is that they also make spelling errors. (See https://youtu.be/SDkE-aO3tOQ?si=0GAUyTKDh-q_sAxm and a quick iOS app we made, https://github.com/AceCentre/DragToSpeak; we are now contemplating a REST API built largely on wordsegment.)

So I was wondering about adding to the standard corpus with something like https://www.dcs.bbk.ac.uk/~ROGER/corpora.html

I read this https://stackoverflow.com/a/32364566/1123094

It looks like I can create a file of bigrams or unigrams with weights and add it to the standard corpus. Is that right, or is there a better way?

grantjenks · 2024-01-18T00:29:08Z

Modifying the unigrams and bigrams is the best way I can think of. You’ll have to account for every typo variation of every word though. There may be a way to modify the algorithm instead but I’m not sure.
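
To give a feel for what "every typo variation of every word" means in practice, here is a rough sketch that enumerates the single-edit variants of one word. It assumes a typo is exactly one deletion, transposition, substitution, or insertion over lowercase ASCII letters, which is only one possible error model; `single_edit_variants` is a hypothetical helper and not part of wordsegment.

```python
import string


def single_edit_variants(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit away from `word`.

    Assumed typo model (not from the library): one deletion, adjacent
    transposition, substitution, or insertion.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    replaces = {
        left + ch + right[1:] for left, right in splits if right for ch in alphabet
    }
    inserts = {left + ch + right for left, right in splits for ch in alphabet}
    variants = deletes | transposes | replaces | inserts
    variants.discard(word)  # the original spelling is not a typo
    return variants


variants = single_edit_variants("alcohol")
print(len(variants))          # several hundred candidates for a single word
print("alchol" in variants)   # the misspelling discussed in this thread
```

Multiplying that by the size of the vocabulary shows why a curated list of observed misspellings is likely more practical than generating variants blindly.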

Certainly AI models can do it but I don’t know about scope and scale.

willwade · 2024-01-18T00:34:30Z

Yeah. GPT can do it - and multilingually. But it feels like a huge hammer to crack a nut. Thanks

willwade · 2024-01-18T11:22:58Z

If anyone is interested I've got a complete modified unigrams json in this repo - and code to read in spelling mistakes here

https://github.com/AceCentre/Correct-A-Sentence/blob/main/helper-scripts/create_unigrams_spellingerrors.py
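
For readers who do not want to open the script, this is a minimal sketch of the reading step, not the code in the repo. It assumes the misspelling file uses a layout where each correct word appears on a line prefixed with `$` and its misspellings follow one per line; check that against the file you actually download. `parse_misspellings` is a hypothetical name.

```python
def parse_misspellings(lines):
    """Map each correct word to the list of misspellings recorded for it.

    Assumed input layout: a line starting with '$' names the correct word,
    and every following line up to the next '$' line is a misspelling of it.
    """
    corpus = {}
    current = None
    for raw in lines:
        entry = raw.strip().lower()
        if not entry:
            continue  # skip blank lines
        if entry.startswith("$"):
            current = entry[1:]
            corpus.setdefault(current, [])
        elif current is not None:
            corpus[current].append(entry)
    return corpus


sample = ["$alcohol", "alchol", "alcahol", "$necessary", "neccesary"]
print(parse_misspellings(sample))
```

With a real file, the same function can be fed an open file handle, since it only iterates over lines.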

I dare say there is some madness in my logic. I am using the weights from the correctly spelled word, which may be a bad idea.
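
One possible alternative to copying the weight unchanged is to give each misspelling only a fraction of the correct word's count, and to leave alone any "misspelling" that is already a real word in the unigrams. The sketch below works on plain dictionaries; the `discount` value of 0.01 is an arbitrary assumption for illustration, and `add_misspellings` is a hypothetical helper.

```python
def add_misspellings(unigrams, corpus, discount=0.01):
    """Return a copy of `unigrams` with misspellings added at reduced weight.

    unigrams: dict of word -> count
    corpus:   dict of correct word -> list of misspellings
    discount: assumed fraction of the correct word's count given to each typo
    """
    updated = dict(unigrams)
    for correct, typos in corpus.items():
        base = unigrams.get(correct)
        if base is None:
            continue  # no weight to borrow from
        for typo in typos:
            if typo in unigrams:
                continue  # already a real word (e.g. "form" vs "from"); keep its count
            # If a typo maps to several correct words, keep the largest weight.
            updated[typo] = max(updated.get(typo, 0.0), base * discount)
    return updated


unigrams = {"alcohol": 1000.0, "from": 900.0, "form": 500.0}
corpus = {"alcohol": ["alchol"], "from": ["form"]}
print(add_misspellings(unigrams, corpus))
```

Note that segmentation would still return the misspelled token, so a separate misspelling-to-correct lookup is needed if the goal is corrected output.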

NB: Can someone clarify something for me? I've updated the unigrams JSON. Should I also be updating the bigrams JSON with the misspellings, e.g. for the existing entry "</s/> alcohol": 541645.0, add "</s/> alchol": 541645.0, and so on?
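
If the bigrams are updated too, the mirroring could look roughly like the sketch below. It assumes bigram keys are two space-separated tokens, as in the example quoted above, and reuses an arbitrary `discount` instead of copying the count; `mirror_bigrams` is a hypothetical helper. Whether this step is needed at all is an open question: my understanding is that wordsegment falls back to the unigram score when a bigram is missing, but that is worth verifying against the library source.

```python
def mirror_bigrams(bigrams, corpus, discount=0.01):
    """Return a copy of `bigrams` with entries added for misspelled words.

    bigrams: dict of "first second" -> count
    corpus:  dict of correct word -> list of misspellings
    discount: assumed fraction of the original bigram count given to each copy
    """
    updated = dict(bigrams)
    for key, count in bigrams.items():
        parts = key.split()
        if len(parts) != 2:
            continue  # skip keys that are not two tokens
        first, second = parts
        # Misspell the second token, keeping the first as-is.
        for typo in corpus.get(second, []):
            updated.setdefault(f"{first} {typo}", count * discount)
        # Misspell the first token, keeping the second as-is.
        for typo in corpus.get(first, []):
            updated.setdefault(f"{typo} {second}", count * discount)
    return updated


# The "</s/>" token is copied from the comment above, not from the library.
bigrams = {"</s/> alcohol": 541645.0, "alcohol is": 1000.0}
corpus = {"alcohol": ["alchol"]}
print(sorted(mirror_bigrams(bigrams, corpus)))
```

Existing keys are never overwritten, so a bigram that already exists in the corpus keeps its original count.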

willwade changed the title ~~Using with Addotional corpus of spelling mistakes.~~ Using with Additional corpus of spelling mistakes. Jan 18, 2024
