
Multi delimiter support #4

Merged
2 commits merged on Jun 6, 2018


probertson commented Jun 5, 2018

Add support for multiple delimiters for tokens that are not transformed. This was requested in bunkat#6, and it's something I need too.

Currently it's possible to specify a single delimiter, or a start and end delimiter, to mark a token that shouldn't be transformed. However, in my case the strings being translated use several styles of token. For example:

  • %(variable)s (named sprintf-style aka "Python-style")
  • %s (indexed sprintf-style aka "JavaScript-style")
  • <someTag> and </someTag> (HTML/XML-style start and end tags)
  • <SomeTag/> (HTML/XML-style self-closing tags)

Since the delimiters are treated as part of a RegExp, one workaround is to just include the start and end portions of each token in options.startDelimiter and options.endDelimiter. However, that has a few problems:

  1. It's clunky, because the start delimiter of one style of token could be matched with the end delimiter of another style, e.g. %(variable/> could theoretically be identified as a token.
  2. It's not possible to have unnamed tokens (e.g. %s), because they don't match the "start delimiter + name + end delimiter" requirement.
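A minimal sketch of both problems, assuming the workaround folds every style's start and end portions into one alternation each, and assuming \w+ as the token-name pattern. The delimiter values below are hypothetical, and the library's own name pattern may differ.

```javascript
// Hypothetical workaround values: every style's start and end portions
// crammed into a single startDelimiter and a single endDelimiter.
const startDelimiter = '(?:%\\(|</?)';  // "%(" or "<" or "</"
const endDelimiter = '(?:\\)s|/?>)';    // ")s" or ">" or "/>"

// Assumed token shape: start + name + end, with \w+ as the name pattern.
const tokenRegExp = new RegExp(startDelimiter + '\\w+' + endDelimiter, 'g');

// Problem (1): "%(" from one style pairs with "/>" from another.
const mismatched = '%(variable/>'.match(tokenRegExp);
console.log(mismatched); // [ '%(variable/>' ]

// Problem (2): %s has no name between delimiters, so it is never matched.
const unnamed = 'Hello %s'.match(tokenRegExp);
console.log(unnamed); // null
```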

This PR adds an additional way of specifying multiple delimiters that, when desired, are specifically matched in pairs to avoid problem (1). It also introduces an option (within the multiple delimiters) for a "full" token matcher that defines the full pattern for a token, which solves problem (2).

The PR adds an additional option, delimiters (note the "s"). The options.delimiters property accepts an array of "matcher" objects, each of which defines one style of token that should be excluded from pseudolocalization. For example:

```
pseudoloc.options.delimiters = [ { ... }, { ... }, { ... } ];
```

The types of matchers are:

  • { start, end }: specifies a pair of start/end delimiters to match. This is equivalent to using startDelimiter and endDelimiter for each pair.
  • { both }: specifies a single marker to use as both the start and end delimiter. This is like using delimiter.
  • { full }: specifies a regular expression to use as the pattern for the entire token. This allows for tokens that don't fit the "delimiter + name + delimiter" or "startDelimiter + name + endDelimiter" structure.
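Because each matcher becomes its own alternative in the combined regular expression, a start delimiter can only pair with its own end delimiter, and a full matcher can describe tokens with no name at all. The sketch below hand-writes the pattern each matcher might produce for the token styles listed above. The \w+ name pattern and the delimiter values are assumptions for illustration, not the library's defaults.

```javascript
// Hypothetical patterns, one per matcher, for the token styles listed above.
// \w+ stands in for the (assumed) token-name pattern.
const tokenMatchers = [
  '%\\(\\w+\\)s',  // { start: '%\\(', end: '\\)s' } for %(variable)s
  '%s',            // { full: '%s' } for unnamed %s
  '</?\\w+/?>',    // { start: '</?', end: '/?>' } for <someTag>, </someTag>, <SomeTag/>
];

// Each matcher is its own alternative, so a start only pairs with its own end.
const tokenRegExp = new RegExp(tokenMatchers.join('|'), 'g');

// Problem (1) avoided: "%(" can no longer be closed by "/>".
const mismatched = '%(variable/>'.match(tokenRegExp);
console.log(mismatched); // null

// Problem (2) solved: the full matcher picks up the unnamed token.
const found = 'Hello %s, see <b>this</b>'.match(tokenRegExp);
console.log(found); // [ '%s', '<b>', '</b>' ]
```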

probertson force-pushed the multi-delimiter-support branch from a3f4349 to 2d85820 on June 6, 2018 12:04
probertson (Author)
I just rebased onto master, so the diff now shows only the changes that are part of this PR (not the ones from #3).

```
// ...(excerpt from the diff; earlier lines not shown)
}, []);
return new RegExp(tokenMatchers.join('|'), 'g');
}
```

Author (review comment):

This is the core of the change. Now it uses the delimiters array (which by default contains the values from startDelimiter and endDelimiter) to create the token regular expression.

It constructs a regular expression (as a string) for each element of the array, then joins them together with | ("or"). That becomes the full regular expression that is used to identify things to not pseudolocalize.
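A hypothetical reconstruction of that approach, for illustration only: buildTokenRegExp and transformOutsideTokens are made-up names, \w+ is an assumed name pattern, and toUpperCase stands in for the actual pseudolocalization. It reduces the delimiters array to one pattern string per matcher, joins them with |, and then transforms only the text between matches.

```javascript
// Hypothetical sketch: the helper names and NAME_PATTERN are assumptions,
// not the library's actual internals.
const NAME_PATTERN = '\\w+';

// Reduce the delimiters array to one pattern string per matcher,
// then OR them together into a single global RegExp.
function buildTokenRegExp(delimiters) {
  const tokenMatchers = delimiters.reduce((patterns, matcher) => {
    if (matcher.full !== undefined) {
      patterns.push(matcher.full);                                // { full }
    } else if (matcher.both !== undefined) {
      patterns.push(matcher.both + NAME_PATTERN + matcher.both);  // { both }
    } else {
      patterns.push(matcher.start + NAME_PATTERN + matcher.end);  // { start, end }
    }
    return patterns;
  }, []);
  return new RegExp(tokenMatchers.join('|'), 'g');
}

// Apply `transform` only to the text between token matches; tokens are copied verbatim.
function transformOutsideTokens(str, tokenRegExp, transform) {
  let result = '';
  let lastIndex = 0;
  for (const match of str.matchAll(tokenRegExp)) {
    result += transform(str.slice(lastIndex, match.index)) + match[0];
    lastIndex = match.index + match[0].length;
  }
  return result + transform(str.slice(lastIndex));
}

const tokenRegExp = buildTokenRegExp([
  { start: '%\\(', end: '\\)s' },  // %(name)s
  { full: '%s' },                  // %s
  { start: '</?', end: '/?>' },    // <b>, </b>, <br/>
]);

const out = transformOutsideTokens(
  'Hi %(name)s, you have %s <b>new</b> items',
  tokenRegExp,
  (s) => s.toUpperCase(),
);
console.log(out); // HI %(name)s, YOU HAVE %s <b>NEW</b> ITEMS
```

Joining the pattern strings directly, without extra grouping, follows the shape of the diff excerpt; since | binds more loosely than anything else in a regular expression, each matcher's pattern stays a separate alternative.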

```
{ full: '%d' }
```

Author (review comment):

This is the documentation for the change.

```
pseudoloc.option.endDelimiter = '\\)[sd]';
pseudoloc.str('A test string with a %(token)s.');
// [!!Á ţȇšŧ śťřīņğ ŵıţħ ą %(token)s.!!]
```

Author (review comment):

This is just some additional explanation of how the options.delimiter* options work. It took us a bit of trial and error to figure out, so hopefully this will help others.
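Because the delimiter options are spliced into a RegExp, metacharacters such as ( and ) need a regex escape, and that backslash needs its own escape inside a JavaScript string literal. A small sketch of the double escaping; the start delimiter '%\\(' and the \w+ name pattern are assumptions for illustration.

```javascript
// '\\)[sd]' in source is the 6-character string \)[sd]: the regex escape
// for ")" followed by a character class matching "s" or "d".
const endDelimiter = '\\)[sd]';
console.log(endDelimiter); // \)[sd]

// Hypothetical start delimiter and name pattern, escaped the same way.
const startDelimiter = '%\\(';
const tokenRegExp = new RegExp(startDelimiter + '\\w+' + endDelimiter);
console.log(tokenRegExp.test('A test string with a %(token)s.')); // true

// Without the regex escape, ")" is unbalanced and the constructor throws.
let error = null;
try {
  new RegExp(')[sd]');
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```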

jacalata (Owner) left a comment:

Looks good and very useful, thanks! I'll try to get it packaged to npm in the next couple of days.

jacalata merged commit 8d377dc into jacalata:master on Jun 6, 2018
probertson (Author) commented Jun 7, 2018 via email
