-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a blacklist for randomCharFromSetCensorStrategy #84
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #84 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 27 26 -1
Lines 505 469 -36
Branches 92 81 -11
=========================================
- Hits 505 469 -36 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good. I'll merge this along with #78 when I have time to package a new release, probably during the winter holiday—you don't need to do anything with this PR in the meantime. Sorry for any delay, and if you're hitting this in production you can always use this censor strategy directly in your code until the new version is out.
I do want to note in passing that this isn't going to catch cases like @$$$
; I'm not sure how big of an issue this is (and to be clear I'm not asking you to address it in this PR—it is fine as is, I am just thinking of possible followups I might want to consider.) That said, the probability of generating such combinations decreases exponentially with length and are less obviously problematic.
Good point. I can pass a list of patterns, compile them into one regexp, and match against the generated string. Not sure if I should anchor the regexp or not. If not, everything will match as a substring, so
No problem at all. I don't know if this was even a big deal, but my users reported it. I already changed my app to use |
Hm. I don't know how comfortable I am with this idea: it feels like we're starting to duplicate the work of the main library (though obviously at a much smaller scale—it's possible this is a slippery slope argument on my part), and I wonder if there is a more general solution. On that note, I discussed this issue briefly with some friends elsewhere and one of them proposed having grawlix never generate the same character twice in a row, which incidentally fixes this issue at the cost of even more complexity. I will have to think about that. In any case, though, please don't let me push this work onto you if you don't have the time, especially if the right 'complete' solution is still up in the air. Like I said, this PR is a fine stopgap as is. Unicode grawlix is a very fun idea, though 😁 |
Yes, it's definitely some duplication of the main library. One other option is then to run the main library on the generated replacement in Not generating the same character twice in a row is actually very easy to do as well. Let me know if you like that idea. It's less code than in this PR. |
I'm happy to try this out. It would be good to have an error in the degenerate case of 1 input codepoint (if avoiding consecutive identical characters is enabled.) |
I'm going to close this in favor of your new PR #85, which generalizes a little better. Thanks for the fruitful discussion. |
Fixes #82
Type of change:
Please describe the changes this PR makes and why it should be merged:
As suggested in #82, make randomCharFromSetCensorStrategy generate a new replacement string if it generated a bad word.
Status: