Multi delimiter support #4

probertson · 2018-06-05T15:38:32Z

Add support for multiple delimiters for tokens that are not transformed. This was requested in bunkat#6, and it's something I need too.

Currently it's possible to specify a delimiter or delimiters to use as the start and end marker for a token that shouldn't be transformed. However, in my case there are multiple styles of tokens used in the strings that are being translated. For example:

%(variable)s (named sprintf-style aka "Python-style")
%s (indexed sprintf-style aka "JavaScript-style")
<someTag> and </someTag> (HTML/XML-style start and end tags)
<SomeTag/> (HTML/XML-style self-closing tags)

Since the delimiters are treated as part of a RegExp, one workaround is to just include the start and end portions of each token in options.startDelimiter and options.endDelimiter. However, that has a few problems:

1. It's error-prone: the start delimiter of one token style can be matched with the end delimiter of another, e.g. %(variable/> could be identified as a token.
2. Unnamed tokens (e.g. %s) can't be matched at all, because they don't fit the "start delimiter + name + end delimiter" structure.
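
Both problems can be reproduced with a plain RegExp, without loading the library. This is only a rough sketch: the `NAME` pattern for the text between the delimiters is a hypothetical stand-in, not the pattern pseudoloc actually uses, and the combined delimiters are just one way the workaround might be written.

```javascript
// Workaround: pack the start and end markers of every token style
// into one alternation each (regex source strings).
const startDelimiter = '(?:%\\(|<)';
const endDelimiter = '(?:\\)s|/>)';

// Hypothetical pattern for the token name (an assumption for illustration).
const NAME = '[\\w.]+';

const workaround = new RegExp(startDelimiter + NAME + endDelimiter, 'g');

// Problem 1: a start marker from one style pairs with an end marker from another.
const crossMatch = '%(variable/>'.match(workaround);
console.log(crossMatch); // [ '%(variable/>' ]

// Problem 2: an unnamed token has no "start + name + end" shape, so it is never found.
const unnamed = 'Hello %s'.match(workaround);
console.log(unnamed); // null
```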

This PR adds an additional way of specifying multiple delimiters that, when desired, are specifically matched in pairs to avoid problem (1). It also introduces an option (within the multiple delimiters) for a "full" token matcher that defines the full pattern for a token, which solves problem (2).

The PR adds an additional option, delimiters (note the "s"). The options.delimiters property accepts an array of "matcher" objects, each of which defines one style of token that should be excluded from pseudolocalization. For example:

pseudoloc.options.delimiters = [ { ... }, { ... }, { ... } ];

The types of matchers are:

{ start, end }: specifies a pair of start/end delimiters to match. This is equivalent to using startDelimiter and endDelimiter for each pair.
{ both }: specifies a single marker to use as both the start and end delimiters. This is like using delimiter.
{ full }: specifies a regular expression to use as the pattern for the entire token. This allows for other possibilities that don't work within the constraints of the "delimiter + name + delimiter" or "startDelimiter + name + endDelimiter" structure.
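
As an illustration, the token styles listed earlier might be described with one matcher each. The property names (`start`, `end`, `both`, `full`) come from the description above. The specific patterns are assumptions, written as regex source strings on the premise that delimiters are treated as part of a RegExp, and the `$name$` style is a made-up example to show `both`.

```javascript
// One matcher object per token style (patterns are illustrative assumptions).
const delimiters = [
  { start: '%\\(', end: '\\)s' }, // %(variable)s  (paired start/end)
  { full: '%s' },                 // %s            (whole-token pattern, no name)
  { start: '</?', end: '/?>' },   // <someTag>, </someTag>, <SomeTag/>
  { both: '\\$' },                // $name$        (hypothetical style, same marker on both sides)
];

// With the library loaded, this array would presumably be assigned as in the
// description above: pseudoloc.options.delimiters = delimiters;
console.log(delimiters.map((m) => Object.keys(m).join('+')));
// [ 'start+end', 'full', 'start+end', 'both' ]
```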

probertson · 2018-06-06T12:05:05Z

I just rebased onto master, so the diff now shows only the changes that are part of this PR (not the ones from #3).

probertson · 2018-06-06T12:11:03Z

src/core/str.js

+    }, []);
+    return new RegExp(tokenMatchers.join('|'), 'g');
+  }
+


This is the core of the change. Now it uses the delimiters array (which by default contains the values from startDelimiter and endDelimiter) to create the token regular expression.

It constructs a regular expression (as a string) for each element of the array, then joins them together with | ("or"). That becomes the full regular expression that is used to identify things to not pseudolocalize.
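
The approach described here can be sketched roughly as follows. This is not the code from src/core/str.js: the helper names `matcherToPattern` and `buildTokenRegExp` and the `NAME` pattern are hypothetical, and only the final `join('|')` step mirrors the diff above.

```javascript
// Hypothetical pattern for the token name between delimiters (an assumption).
const NAME = '[\\w.]+';

// Turn one matcher object into a regex source string.
function matcherToPattern(matcher) {
  if (matcher.full !== undefined) return matcher.full;
  if (matcher.both !== undefined) return matcher.both + NAME + matcher.both;
  return matcher.start + NAME + matcher.end;
}

// Join the per-matcher patterns with '|' so that any one style can match.
function buildTokenRegExp(matchers) {
  const tokenMatchers = matchers.map(matcherToPattern);
  return new RegExp(tokenMatchers.join('|'), 'g');
}

const tokenRe = buildTokenRegExp([
  { start: '%\\(', end: '\\)s' },
  { both: '%' },
  { full: '%d' },
]);

const tokens = 'Hi %(name)s, %count% of %d'.match(tokenRe);
console.log(tokens); // [ '%(name)s', '%count%', '%d' ]
```

Because each start/end pair is compiled into its own alternative, a start marker from one matcher cannot pair with the end marker of another.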

probertson · 2018-06-06T12:12:35Z

Readme.md

+    ```
+    { full: '%d' }
+    ```
+


This is the documentation for the change

probertson · 2018-06-06T12:13:45Z

Readme.md

+    pseudoloc.option.endDelimiter = '\\)[sd]';
+    pseudoloc.str('A test string with a %(token)s.');
+    // [!!Á ţȇšŧ śťřīņğ ŵıţħ ą %(token)s.!!]
+


This is just some additional explanation about the way the options.delimiter* options work -- it took us a bit of trial and error to figure it out so hopefully this will help others.
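
Since the delimiter values end up inside a regular expression, literal characters such as ( and ) have to be escaped, which is why the Readme example writes '\\)[sd]'. A small helper along these lines can build such values. `escapeRegExp` and the `\\w+` name pattern are illustrative assumptions and not part of pseudoloc.

```javascript
// Escape regex metacharacters so a literal marker can be embedded in a pattern.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const start = escapeRegExp('%(');      // literal "%("
const end = escapeRegExp(')') + '[sd]'; // literal ")" followed by "s" or "d"

// Hypothetical name pattern between the delimiters (an assumption).
const re = new RegExp(start + '\\w+' + end, 'g');

const found = 'A test string with a %(token)s and %(count)d.'.match(re);
console.log(found); // [ '%(token)s', '%(count)d' ]
```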

jacalata

looks good and very useful, thanks! I'll try and get it packaged to npm in the next couple of days

probertson · 2018-06-07T00:35:11Z

Sounds great. Thanks!

probertson-hv added 2 commits June 1, 2018 20:25

Add support for specifying multiple delimiter pairs (or singles)

db2e941

Add documentation for delimiters option

2d85820

probertson force-pushed the multi-delimiter-support branch from a3f4349 to 2d85820 on June 6, 2018 12:04

jacalata approved these changes Jun 6, 2018

jacalata merged commit 8d377dc into jacalata:master Jun 6, 2018
