Add a blacklist for randomCharFromSetCensorStrategy #84

eltoder · 2024-11-28T02:03:20Z

Fixes #82

Type of change:

Please describe the changes this PR makes and why it should be merged:

As suggested in #82, make randomCharFromSetCensorStrategy generate a new replacement string if it generated a bad word.

Status:

I've added/modified unit tests relevant to my change / not needed
This PR contains breaking changes
This PR doesn't include changes to the code

Fixes jo3-l#82

codecov · 2024-11-28T18:43:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (36b6512) to head (e7979d3).
Report is 35 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #84   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           27        26    -1     
  Lines          505       469   -36     
  Branches        92        81   -11     
=========================================
- Hits           505       469   -36

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jo3-l

Thanks, this looks good. I'll merge this along with #78 when I have time to package a new release, probably during the winter holiday—you don't need to do anything with this PR in the meantime. Sorry for any delay, and if you're hitting this in production you can always use this censor strategy directly in your code until the new version is out.

I do want to note in passing that this isn't going to catch cases like @$$$; I'm not sure how big of an issue this is (and to be clear I'm not asking you to address it in this PR—it is fine as is, I am just thinking of possible followups I might want to consider.) That said, the probability of generating such combinations decreases exponentially with length and are less obviously problematic.

eltoder · 2024-11-28T19:42:55Z

I do want to note in passing that this isn't going to catch cases like @$$$

Good point. I can pass a list of patterns, compile them into one regexp, and match against the generated string. Not sure if I should anchor the regexp or not. If not, everything will match as a substring, so @\$\$ will match @$$$ and anything else that contains that, including say %*@$$&*. If I anchor the regexp, I can use @+\${2,}. Not anchoring regexp is easier, but the probability of generating something that matches it actually increases with the length instead of decreasing. It's probably still very low for realistic swearing.

Sorry for any delay, and if you're hitting this in production you can always use this censor strategy directly in your code until the new version is out.

No problem at all. I don't know if this was even a big deal, but my users reported it. I already changed my app to use keepStartCensorStrategy(asteriskCensorStrategy()). Now I think that even that is overengineering and I should just replace everything with one "🤬". Another option along the same lines is to use unicode grawlix, then hope that no one complains about "🌀⚡⚡".

jo3-l · 2024-11-28T20:52:58Z

Good point. I can pass a list of patterns, compile them into one regexp, and match against the generated string.

Hm. I don't know how comfortable I am with this idea: it feels like we're starting to duplicate the work of the main library (though obviously at a much smaller scale—it's possible this is a slippery slope argument on my part), and I wonder if there is a more general solution. On that note, I discussed this issue briefly with some friends elsewhere and one of them proposed having grawlix never generate the same character twice in a row, which incidentally fixes this issue at the cost of even more complexity. I will have to think about that. In any case, though, please don't let me push this work onto you if you don't have the time, especially if the right 'complete' solution is still up in the air. Like I said, this PR is a fine stopgap as is.

Unicode grawlix is a very fun idea, though 😁

eltoder · 2024-11-28T21:03:57Z

Yes, it's definitely some duplication of the main library. One other option is then to run the main library on the generated replacement in applyTo.

Not generating the same character twice in a row is actually very easy to do as well. Let me know if you like that idea. It's less code than in this PR.

jo3-l · 2024-11-28T21:40:55Z

Not generating the same character twice in a row is actually very easy to do as well. Let me know if you like that idea. It's less code than in this PR.

I'm happy to try this out. It would be good to have an error in the degenerate case of 1 input codepoint (if avoiding consecutive identical characters is enabled.)

jo3-l · 2024-12-03T01:04:44Z

I'm going to close this in favor of your new PR #85, which generalizes a little better. Thanks for the fruitful discussion.

Add a blacklist for randomCharFromSetCensorStrategy

e7979d3

Fixes jo3-l#82

jo3-l approved these changes Nov 28, 2024

View reviewed changes

jo3-l closed this Dec 3, 2024

eltoder deleted the feature/grawlix-blacklist branch December 3, 2024 01:08

eltoder restored the feature/grawlix-blacklist branch December 3, 2024 01:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a blacklist for randomCharFromSetCensorStrategy #84

Add a blacklist for randomCharFromSetCensorStrategy #84

eltoder commented Nov 28, 2024

codecov bot commented Nov 28, 2024 •

edited

Loading

jo3-l left a comment

eltoder commented Nov 28, 2024 •

edited

Loading

jo3-l commented Nov 28, 2024 •

edited

Loading

eltoder commented Nov 28, 2024

jo3-l commented Nov 28, 2024

jo3-l commented Dec 3, 2024

Add a blacklist for randomCharFromSetCensorStrategy #84

Add a blacklist for randomCharFromSetCensorStrategy #84

Conversation

eltoder commented Nov 28, 2024

codecov bot commented Nov 28, 2024 • edited Loading

Codecov Report

jo3-l left a comment

Choose a reason for hiding this comment

eltoder commented Nov 28, 2024 • edited Loading

jo3-l commented Nov 28, 2024 • edited Loading

eltoder commented Nov 28, 2024

jo3-l commented Nov 28, 2024

jo3-l commented Dec 3, 2024

codecov bot commented Nov 28, 2024 •

edited

Loading

eltoder commented Nov 28, 2024 •

edited

Loading

jo3-l commented Nov 28, 2024 •

edited

Loading