(#476) C locale is now en_US_POSIX; warn if an explicitly set locale ends up with ICU's returning a resource bundle from the root locale
gagolews committed Nov 9, 2023
1 parent 2dac158 commit 8100d47
Showing 171 changed files with 430 additions and 541 deletions.
2 changes: 1 addition & 1 deletion .devel/sphinx/install.md
@@ -101,7 +101,7 @@ Some environment variables:
## Final Notes
## Getting Help
If you do not manage to set up a successful build, do not
hesitate to [file a bug report](https://github.com/gagolews/stringi/issues).
24 changes: 17 additions & 7 deletions .devel/sphinx/news.md
@@ -1,26 +1,36 @@
# Changelog


## 1.8.1 (2023-11-08)
## 1.8.1 (2023-11-09)

* [GENERAL] ICU bundle updated to version 74.1 (Unicode 15.1, CLDR 44).

* [BACKWARD INCOMPATIBLE] [BUILD TIME] Support for Solaris has now been dropped.
The package is no longer shipped with the very outdated ICU55 bundle.
* [BACKWARD INCOMPATIBILITY] [BUILD TIME] Support for Solaris has now been
dropped. The package is no longer shipped with the very outdated ICU55 bundle.
A compiler supporting at least C++11 as well as ICU >= 61 are now required.

* [BACKWARD INCOMPATIBLE] #469: Missing date-time fields in
* [BACKWARD INCOMPATIBILITY] #469: Missing date-time fields in
`stri_datetime_parse` and `stri_datetime_create` now default to today's
midnight local time.

* [BACKWARD INCOMPATIBILITY] Removed the long-deprecated and defunct
`fallback_encoding` parameter of `stri_read_lines` and the ellipsis
parameter of `stri_opts_brkiter`, `stri_opts_collator`, `stri_opts_fixed`,
and `stri_opts_regex`.

* [BUILD TIME] As per the suggestion of Prof. Brian Ripley, `icudt74l`
(ICU data - little endian) is now included in the source tarball (compressed
with xz to save space). This allows for building *stringi* on systems with
no internet access.

* [NEW FEATURE] #476: A warning is emitted when selecting an unknown locale
for collation as it most likely indicates that a wrong resource is being
returned.
* [NEW FEATURE] #476: In break iterator-, date-time-, and collator-based
operations (e.g., `stri_sort`), a warning is emitted when the *root* ICU
resource bundle is returned for an *explicitly* requested locale.
This might happen when an 'unknown' `locale` argument is passed to these
functions. Note that when relying on the default `locale=NULL` argument,
no warning is emitted. In such a case, checking
if the default locale as returned by `stri_locale_get` is amongst
those listed in `stri_locale_list` is recommended.

* [NEW FEATURE] The `C` locale identifier now resolves to `en_US_POSIX`.
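
The locale-related entries above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not code from the package: `default_locale_is_known` is a hypothetical helper built on `stri_locale_get` and `stri_locale_list` (the stringi functions that report the default and the available locales), `xx_XX` is an arbitrary identifier assumed to be unknown to ICU, and whether a warning is actually raised depends on the installed stringi version and ICU data.

```r
library(stringi)

# Hypothetical helper (not part of stringi): checks whether the default
# locale is among the locales that ICU reports as available.
default_locale_is_known <- function() {
    stri_locale_get() %in% stri_locale_list()
}

known <- default_locale_is_known()

# Explicitly request a locale that is assumed to be unknown to ICU and
# record whether a warning was raised, instead of assuming that it was.
warned <- FALSE
sorted <- withCallingHandlers(
    stri_sort(c("b", "a", "c"), locale = "xx_XX"),
    warning = function(w) {
        warned <<- TRUE
        invokeRestart("muffleWarning")
    }
)

# The "C" identifier; per the changelog entry it resolves to en_US_POSIX.
sorted_c <- stri_sort(c("b", "a", "c"), locale = "C")
```

Inspecting `known` before relying on `locale=NULL` covers the case in which no warning is emitted, and `warned` shows whether the explicit request fell back to the root bundle on a given installation.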

2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/about_locale.md
@@ -34,7 +34,7 @@ Your program should avoid changing the default locale. All locale-sensitive func

One of many examples of locale-dependent services is the Collator, which performs a locale-aware string comparison. It is used for string comparing, ordering, sorting, and searching. See [`stri_opts_collator`](stri_opts_collator.md) for the description on how to tune its settings, and its `locale` argument in particular.

When choosing a resource bundle that is not available in the requested locale nor in its more general variants (e.g., \'es_ES\' vs \'es\'), a warning is emitted.
A warning is emitted when the resource bundle chosen for an explicitly requested locale is available neither in that locale nor in its more general variants (e.g., \'es_ES\' vs \'es\'); no warning is emitted when the default locale is used.

Other locale-sensitive functions include, e.g., [`stri_trans_tolower`](stri_trans_casemap.md) (that does character case mapping).

4 changes: 2 additions & 2 deletions .devel/sphinx/rapi/stri_datetime_add.md
@@ -68,15 +68,15 @@ print(x)
```

```
## [1] "2024-01-08 17:00:48 AEDT"
## [1] "2024-01-09 11:24:49 AEDT"
```

```r
stri_datetime_add(x, -2, units='months')
```

```
## [1] "2023-11-08 17:00:48 AEDT"
## [1] "2023-11-09 11:24:49 AEDT"
```

```r
2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/stri_datetime_create.md
@@ -96,5 +96,5 @@ stri_datetime_create(hour=15, minute=59)
```

```
## [1] "2023-11-08 15:59:00 AEDT"
## [1] "2023-11-09 15:59:00 AEDT"
```
8 changes: 4 additions & 4 deletions .devel/sphinx/rapi/stri_datetime_fields.md
@@ -77,9 +77,9 @@ stri_datetime_fields(stri_datetime_now())

```
## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth
## 1 2023 11 8 17 0 49 139 46 2
## 1 2023 11 9 11 24 49 982 46 2
## DayOfYear DayOfWeek Hour12 AmPm Era
## 1 312 4 5 2 2
## 1 313 5 11 1 2
```

```r
@@ -88,9 +88,9 @@ stri_datetime_fields(stri_datetime_now(), locale='@calendar=hebrew')

```
## Year Month Day Hour Minute Second Millisecond WeekOfYear WeekOfMonth
## 1 5784 2 24 17 0 49 148 9 4
## 1 5784 2 25 11 24 49 986 9 4
## DayOfYear DayOfWeek Hour12 AmPm Era
## 1 54 4 5 2 1
## 1 55 5 11 1 1
```

```r
2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/stri_datetime_format.md
@@ -221,5 +221,5 @@ stri_datetime_format(stri_datetime_now(), 'datetime_relative_medium')
```

```
## [1] "today, 5:00:49 pm"
## [1] "today, 11:24:50 am"
```
8 changes: 4 additions & 4 deletions .devel/sphinx/rapi/stri_enc_detect2.md
@@ -12,10 +12,10 @@ stri_enc_detect2(str, locale = NULL)

## Arguments

| | |
|----------|-------------------------------------------------------------------------------------------------------------------------|
| `str` | character vector, a raw vector, or a list of `raw` vectors |
| `locale` | `NULL` or `''` for default locale, `NA` for just checking the UTF-\* family, or a single string with locale identifier. |
| | |
|----------|-----------------------------------------------------------------------------------|
| `str` | character vector, a raw vector, or a list of `raw` vectors |
| `locale` | `NULL` or `''` for the default locale, or a single string with locale identifier. |

## Details

4 changes: 1 addition & 3 deletions .devel/sphinx/rapi/stri_opts_brkiter.md
@@ -18,8 +18,7 @@ stri_opts_brkiter(
skip_line_soft,
skip_line_hard,
skip_sentence_term,
skip_sentence_sep,
...
skip_sentence_sep
)
```

@@ -38,7 +37,6 @@ stri_opts_brkiter(
| `skip_line_hard` | logical; perform no action for hard, or mandatory line breaks |
| `skip_sentence_term` | logical; perform no action for sentences ending with a sentence terminator (\'`.`\', \'`,`\', \'`?`\', \'`!`\'), possibly followed by a hard separator (`CR`, `LF`, `PS`, etc.) |
| `skip_sentence_sep` | logical; perform no action for sentences that do not contain an ending sentence terminator, but are ended by a hard separator or end of input |
| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future |

## Details

7 changes: 2 additions & 5 deletions .devel/sphinx/rapi/stri_opts_collator.md
@@ -16,8 +16,7 @@ stri_opts_collator(
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
numeric = FALSE
)

stri_coll(
@@ -29,8 +28,7 @@ stri_coll(
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
numeric = FALSE
)
```

@@ -47,7 +45,6 @@ stri_coll(
| `normalization` | single logical value; if `TRUE`, then incremental check is performed to see whether the input data is in the FCD form. If the data is not in the FCD form, incremental NFD normalization is performed |
| `normalisation` | alias of `normalization` |
| `numeric` | single logical value; when turned on, this attribute generates a collation key for the numeric value of substrings of digits; this is a way to get \'100\' to sort AFTER \'2\'; note that negative or non-integer numbers will not be ordered properly |
| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future |

## Details

11 changes: 5 additions & 6 deletions .devel/sphinx/rapi/stri_opts_fixed.md
@@ -7,16 +7,15 @@ A convenience function used to tune up the behavior of `stri_*_fixed` functions,
## Usage

``` r
stri_opts_fixed(case_insensitive = FALSE, overlap = FALSE, ...)
stri_opts_fixed(case_insensitive = FALSE, overlap = FALSE)
```

## Arguments

| | |
|--------------------|----------------------------------------------------------------------------------------------------------------------------|
| `case_insensitive` | logical; enable simple case insensitive matching |
| `overlap` | logical; enable overlapping matches\' detection |
| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future |
| | |
|--------------------|--------------------------------------------------|
| `case_insensitive` | logical; enable simple case insensitive matching |
| `overlap` | logical; enable overlapping matches\' detection |
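
With the ellipsis parameter gone from the signature shown above, an unsupported option passed to `stri_opts_fixed` is expected to fail with R's usual unused-argument error rather than a deprecation warning. The following is a minimal sketch, assuming stringi 1.8.1 or later is installed; `ignore_case` is a hypothetical argument name chosen only because it matches none of the documented parameters, and the outcome is captured rather than assumed.

```r
library(stringi)

# A valid call: returns a list of options for the stri_*_fixed functions.
opts <- stri_opts_fixed(case_insensitive = TRUE)

# An unsupported (hypothetical) argument; record what actually happens,
# since older versions warn while newer ones are expected to raise an error.
outcome <- tryCatch(
    {
        stri_opts_fixed(ignore_case = TRUE)
        "accepted"
    },
    warning = function(w) "warning",
    error = function(e) "error"
)
```

Checking `outcome` on the target installation shows which behavior is in effect before scripts that relied on the deprecated ellipsis are migrated.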

## Details

4 changes: 1 addition & 3 deletions .devel/sphinx/rapi/stri_opts_regex.md
@@ -19,8 +19,7 @@ stri_opts_regex(
uword,
error_on_unknown_escapes,
time_limit = 0L,
stack_limit = 0L,
...
stack_limit = 0L
)
```

@@ -40,7 +39,6 @@ stri_opts_regex(
| `error_on_unknown_escapes` | logical; whether to generate an error on unrecognized backslash escapes; if set, fail with an error on patterns that contain backslash-escaped ASCII letters without a known special meaning; otherwise, these escaped letters represent themselves |
| `time_limit` | integer; processing time limit, in \~milliseconds (but not precisely so, depends on the CPU speed), for match operations; setting a limit is desirable if poorly written regexes are expected on input; 0 for no limit |
| `stack_limit` | integer; maximal size, in bytes, of the heap storage available for the match backtracking stack; setting a limit is desirable if poorly written regexes are expected on input; 0 for no limit |
| `...` | \[DEPRECATED\] any other arguments passed to this function generate a warning; this argument will be removed in the future |

## Details

2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/stri_rand_lipsum.md
@@ -16,7 +16,7 @@ stri_rand_lipsum(n_paragraphs, start_lipsum = TRUE, nparagraphs = n_paragraphs)
|----------------|------------------------------------------------------------------------------------------|
| `n_paragraphs` | single integer, number of paragraphs to generate |
| `start_lipsum` | single logical value; should the resulting text start with *Lorem ipsum dolor sit amet*? |
| `nparagraphs` | deprecated alias of `n_paragraphs` |
| `nparagraphs` | \[DEPRECATED\] alias of `n_paragraphs` |

## Details

11 changes: 5 additions & 6 deletions .devel/sphinx/rapi/stri_read_lines.md
@@ -12,12 +12,11 @@ stri_read_lines(con, encoding = NULL, fname = con, fallback_encoding = NULL)

## Arguments

| | |
|---------------------|---------------------------------------------------------------------------------|
| `con` | name of the output file or a connection object (opened in the binary mode) |
| `encoding` | single string; input encoding; `NULL` or `''` for the current default encoding. |
| `fname` | deprecated alias of `con` |
| `fallback_encoding` | deprecated argument, no longer used |
| | |
|------------|---------------------------------------------------------------------------------|
| `con` | name of the output file or a connection object (opened in the binary mode) |
| `encoding` | single string; input encoding; `NULL` or `''` for the current default encoding. |
| `fname` | \[DEPRECATED\] alias of `con` |

## Details

2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/stri_read_raw.md
@@ -15,7 +15,7 @@ stri_read_raw(con, fname = con)
| | |
|---------|----------------------------------------------------------------------------|
| `con` | name of the output file or a connection object (opened in the binary mode) |
| `fname` | deprecated alias of `con` |
| `fname` | \[DEPRECATED\] alias of `con` |

## Details

4 changes: 2 additions & 2 deletions .devel/sphinx/rapi/stri_sprintf.md
@@ -188,7 +188,7 @@ stri_sprintf("UNIX time %1$f is %1$s.", Sys.time())
```

```
## [1] "UNIX time 1699423258.691701 is 2023-11-08 17:00:58.691701."
## [1] "UNIX time 1699489499.508911 is 2023-11-09 11:24:59.508911."
```

```r
@@ -213,7 +213,7 @@ stri_sprintf("%1$s is %1$f UNIX time.", Sys.time()) # re-coercion needed
```

```
## [1] "2023-11-08 17:00:58.69345 is 1699423258.693450 UNIX time."
## [1] "2023-11-09 11:24:59.510592 is 1699489499.510592 UNIX time."
```

```r
2 changes: 1 addition & 1 deletion .devel/sphinx/rapi/stri_write_lines.md
@@ -24,7 +24,7 @@ stri_write_lines(
| `con` | name of the output file or a connection object (opened in the binary mode) |
| `encoding` | output encoding, `NULL` or `''` for the current default one |
| `sep` | newline separator |
| `fname` | deprecated alias of `con` |
| `fname` | \[DEPRECATED\] alias of `con` |

## Details

2 changes: 1 addition & 1 deletion .devel/tinytest/test-count-coll.R
@@ -33,7 +33,7 @@ expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = stri_op
1L)

old_loc <- stri_locale_set("UNKNOWN")
expect_warning(stri_count_coll("bababababaab", "aab"))
expect_equivalent(stri_count_coll("bababababaab", "aab"), 1L)
stri_locale_set(old_loc)

expect_equivalent(stri_count_coll("bababababaab", "aab", opts_collator = stri_opts_collator(locale = "C")),