- Types: `StrSlice` for strings, `Slice` for bytes, `StrChar` for characters
- Gas efficient
- Versioned releases, available for both foundry and hardhat
- Simple imports: you only need e.g. `StrSlice` and `toSlice`
- `StrSlice` enforces UTF-8 character boundaries; `StrChar` validates character encoding
- Clean, well-documented and thoroughly-tested source code
- Optional PRBTest extension with assertions like `assertContains` and `assertLt` for both slices and native `bytes`, `string`
- `Slice` and `StrSlice` are value types, not structs
- Low-level functions like memchr, memcmp, memmove etc.
With yarn:

```sh
yarn add @dk1a/solidity-stringutils
```

With forge:

```sh
forge install --no-commit dk1a/solidity-stringutils
```
```solidity
import { StrSlice, toSlice } from "@dk1a/solidity-stringutils/src/StrSlice.sol";

using { toSlice } for string;

/// @dev Returns the content of brackets, or empty string if not found
function extractFromBrackets(string memory stuffInBrackets) pure returns (StrSlice extracted) {
    StrSlice s = stuffInBrackets.toSlice();
    bool found;

    // keep everything after the first "("
    (found, , s) = s.splitOnce(toSlice("("));
    if (!found) return toSlice("");

    // keep everything before the last ")"
    (found, s, ) = s.rsplitOnce(toSlice(")"));
    if (!found) return toSlice("");

    return s;
}
/*
assertEq(
    extractFromBrackets("((1 + 2) + 3) + 4"),
    toSlice("(1 + 2) + 3")
);
*/
```
See `ExamplesTest` for more examples.
Internally `StrSlice` uses `Slice` and extends it with logic for multibyte UTF-8 where necessary.
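As a further sketch of the search and modify methods, the hypothetical helper `stripHexPrefix` below removes an optional `0x` prefix. It assumes `startsWith(pattern)` returns a `bool` and `stripPrefix(pattern)` returns a `StrSlice`, as the method descriptions suggest; the explicit `startsWith` check avoids relying on what `stripPrefix` does when the prefix is absent.

```solidity
import { StrSlice, toSlice } from "@dk1a/solidity-stringutils/src/StrSlice.sol";

using { toSlice } for string;

/// @dev Hypothetical helper: returns `str` without a leading "0x", if there is one
function stripHexPrefix(string memory str) pure returns (StrSlice) {
    StrSlice s = str.toSlice();
    StrSlice prefix = toSlice("0x");
    // check first, instead of assuming how stripPrefix handles a missing prefix
    if (!s.startsWith(prefix)) return s;
    return s.stripPrefix(prefix);
}
/*
assertEq(stripHexPrefix("0xdead"), toSlice("dead"));
assertEq(stripHexPrefix("dead"), toSlice("dead"));
*/
```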
| Method | Description |
|---|---|
| `len` | length in bytes |
| `isEmpty` | true if `len == 0` |
| `toString` | copy slice contents to a new string |
| `keccak` | equal to `keccak256(s.toString())`, but cheaper |
| **concatenate** | |
| `add` | concatenate 2 slices into a new string |
| `join` | join a slice array, using `self` as the separator |
| **compare** | |
| `cmp` | 0 for eq, < 0 for lt, > 0 for gt |
| `eq`, `ne` | ==, != (more efficient than `cmp`) |
| `lt`, `lte` | <, <= |
| `gt`, `gte` | >, >= |
| **index** | |
| `isCharBoundary` | true if given index is an allowed boundary |
| `get` | get 1 UTF-8 character at given index |
| `splitAt` | `(slice[:index], slice[index:])` |
| `getSubslice` | `slice[start:end]` |
| **search** | |
| `find` | index of the start of the first match |
| `rfind` | index of the start of the last match |
| | *`find` and `rfind` return `type(uint256).max` for no matches* |
| `contains` | true if a match is found |
| `startsWith` | true if starts with pattern |
| `endsWith` | true if ends with pattern |
| **modify** | |
| `stripPrefix` | returns subslice without the prefix |
| `stripSuffix` | returns subslice without the suffix |
| `splitOnce` | split into 2 subslices on the first match |
| `rsplitOnce` | split into 2 subslices on the last match |
| `replacen` | *experimental* replace `n` matches |
| | *`replacen` requires `0 < pattern.len() <= to.len()`* |
| **iterate** | |
| `chars` | character iterator over the slice |
| **ascii** | |
| `isAscii` | true if all chars are ASCII |
| **dangerous** | |
| `asSlice` | get underlying `Slice` |
| `ptr` | get memory pointer |
Indexes are in bytes, not characters. Indexing methods revert if `isCharBoundary` is false.
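To illustrate byte-based indexing, the hypothetical `indexingDemo` below works on a string whose characters are all 3 bytes long in UTF-8. It assumes `isCharBoundary(index)` returns a `bool`, `find(pattern)` returns a `uint256` byte offset and `getSubslice(start, end)` returns a `StrSlice`, following the method descriptions above.

```solidity
import { StrSlice, toSlice } from "@dk1a/solidity-stringutils/src/StrSlice.sol";

/// @dev Hypothetical demo: byte indexes vs character positions
function indexingDemo()
    pure
    returns (bool boundaryAt3, bool boundaryAt1, uint256 niIndex, StrSlice secondChar)
{
    // 5 characters, 15 bytes: each character takes 3 bytes in UTF-8
    StrSlice s = toSlice(unicode"こんにちは");
    boundaryAt3 = s.isCharBoundary(3); // true: "ん" starts at byte 3
    boundaryAt1 = s.isCharBoundary(1); // false: byte 1 is inside "こ"
    // byte offset 6, not the character position 2
    niIndex = s.find(toSlice(unicode"に"));
    // "ん"; getSubslice(1, 6) would revert since 1 is not a char boundary
    secondChar = s.getSubslice(3, 6);
}
```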
`StrCharsIter` is returned by the `chars` method of `StrSlice`.
```solidity
import { StrSlice, toSlice, StrCharsIter } from "@dk1a/solidity-stringutils/src/StrSlice.sol";

using { toSlice } for string;

/// @dev Returns a StrSlice of `str` with the first 2 UTF-8 characters removed
/// reverts on invalid UTF-8
function removeFirstTwoChars(string memory str) pure returns (StrSlice) {
    StrCharsIter memory chars = str.toSlice().chars();
    for (uint256 i; i < 2; i++) {
        // stop early if `str` has fewer than 2 characters
        if (chars.isEmpty()) break;
        chars.next();
    }
    // the unconsumed remainder, as a StrSlice
    return chars.asStr();
}
/*
assertEq(removeFirstTwoChars(unicode"📎!こんにちは"), unicode"こんにちは");
*/
```
| Method | Description |
|---|---|
| `asStr` | get underlying `StrSlice` of the remainder |
| `len` | remainder length in bytes |
| `isEmpty` | true if `len == 0` |
| `next` | advance the iterator, return the next `StrChar` |
| `nextBack` | advance from the back, return the next `StrChar` |
| `count` | returns the number of UTF-8 characters |
| `validateUtf8` | returns true if the sequence is valid UTF-8 |
| **dangerous** | |
| `unsafeNext` | advance unsafely, return the next `StrChar` |
| `unsafeCount` | unsafely count chars, read the source for caveats |
| `ptr` | get memory pointer |
`count`, `validateUtf8` and `unsafeCount` consume the iterator in O(n).

Safe methods revert on an invalid UTF-8 byte sequence.
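A minimal sketch contrasting byte length with character count follows. The helper name `byteAndCharLength` is hypothetical; it assumes `len()` on `StrSlice` and `count()` on `StrCharsIter` both return a `uint256`, and it calls `count` on a fresh iterator because counting consumes it.

```solidity
import { StrSlice, toSlice, StrCharsIter } from "@dk1a/solidity-stringutils/src/StrSlice.sol";

using { toSlice } for string;

/// @dev Hypothetical helper: length of `str` in bytes and in UTF-8 characters.
/// Reverts on invalid UTF-8, since `count` is a safe method.
function byteAndCharLength(string memory str) pure returns (uint256 byteLen, uint256 charCount) {
    StrSlice s = str.toSlice();
    byteLen = s.len();
    // `count` consumes the iterator in O(n), so start from a fresh one
    StrCharsIter memory chars = s.chars();
    charCount = chars.count();
}
```

For `unicode"📎!こんにちは"` this should give 20 bytes (4 + 1 + 5 × 3) and 7 characters.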
`unsafeNext` does NOT check whether the iterator is empty, so it may underflow! It does not revert on invalid UTF-8: if the returned `StrChar` is invalid, it will have length 0, otherwise length 1-4.
Internally `next`, `unsafeNext` and `count` all use `_nextRaw`. It's very efficient, but very unsafe and complicated. Read the source and import it separately if you need it.
`StrChar` represents a single UTF-8 encoded character. Internally it's a `bytes32` with the leading byte at the MSB. It's returned by some methods of `StrSlice` and `StrCharsIter`.
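To make the "leading byte at the MSB" layout concrete, here is an illustration only, not the library's code: the hypothetical `utf8LenSketch` reads the most significant byte of a `bytes32` and derives the character length from its high bits. It deliberately skips finer validity checks that a real UTF-8 validator needs (for example, lead bytes `0xC0`, `0xC1` and `0xF5`-`0xF7` never occur in valid UTF-8).

```solidity
/// @dev Illustration only: length of a UTF-8 character packed into a bytes32
/// with its leading byte at the MSB. Returns 0 for bytes that cannot lead a
/// character. Does NOT reject overlong or out-of-range lead bytes.
function utf8LenSketch(bytes32 packed) pure returns (uint256) {
    uint8 lead = uint8(packed[0]); // MSB = first byte of the character
    if (lead < 0x80) return 1;     // 0xxxxxxx: ASCII
    if (lead < 0xC0) return 0;     // 10xxxxxx: continuation byte, cannot lead
    if (lead < 0xE0) return 2;     // 110xxxxx
    if (lead < 0xF0) return 3;     // 1110xxxx
    if (lead < 0xF8) return 4;     // 11110xxx
    return 0;                      // 11111xxx never appears in UTF-8
}
```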
| Method | Description |
|---|---|
| `len` | character length in bytes |
| `toBytes32` | returns the underlying `bytes32` value |
| `toString` | copy the character to a new string |
| `toCodePoint` | returns the Unicode code point (`ord` in Python) |
| `cmp` | 0 for eq, < 0 for lt, > 0 for gt |
| `eq`, `ne` | ==, != |
| `lt`, `lte` | <, <= |
| `gt`, `gte` | >, >= |
| `isValidUtf8` | usually true |
| `isAscii` | true if the char is ASCII |
Import `StrChar__` (static function lib) to use `StrChar__.fromCodePoint` for code point to `StrChar` conversion.
`len` can return `0` *only* for invalid UTF-8 characters. But some invalid chars *may* have non-zero `len`! (Use `isValidUtf8` to check validity.) Note that `0x00` is a valid 1-byte UTF-8 character; its `len` is 1.
`isValidUtf8` can be false if the character was formed with an unsafe method (`fromUnchecked`, `wrap`).
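A code point round trip could look like the sketch below. The helper `describeCodePoint` is hypothetical, and the sketch assumes that `StrChar__` can be imported from `src/StrChar.sol`, that `fromCodePoint` takes a `uint256`, and that `StrChar` methods are attached globally the same way `StrSlice` methods are.

```solidity
import { StrChar, StrChar__ } from "@dk1a/solidity-stringutils/src/StrChar.sol";

/// @dev Hypothetical helper: builds a StrChar from a code point and reads it back
function describeCodePoint(uint256 code)
    pure
    returns (uint256 byteLen, uint256 roundTrip, bool valid)
{
    StrChar c = StrChar__.fromCodePoint(code);
    byteLen = c.len();           // 1-4 bytes for a valid character
    roundTrip = c.toCodePoint(); // expected to equal `code`
    valid = c.isValidUtf8();     // true, since it was built safely
}
```

For example, U+3053 ("こ") should give a 3-byte character, and code point 0 should give the 1-byte character `0x00` described above.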
```solidity
import { Slice, toSlice } from "@dk1a/solidity-stringutils/src/Slice.sol";

using { toSlice } for bytes;

function findZeroByte(bytes memory b) pure returns (uint256 index) {
    return b.toSlice().find(
        bytes(hex"00").toSlice()
    );
}
```
See `using {...} for Slice global` in the source for a function summary. Many are shared between `Slice` and `StrSlice`, but there are differences.
Internally `Slice` has very minimal assembly, instead using `memcpy`, `memchr`, `memcmp` and others; if you need the low-level functions, see `src/utils/`.
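For intuition about what `memchr` computes, here is a naive loop-based equivalent over `bytes memory`. It is an illustration only: the library's version in `src/utils/` is optimized and its actual signature is not reproduced here. The "not found" value of `type(uint256).max` mirrors the convention `find`/`rfind` use.

```solidity
/// @dev Illustration only: naive memchr over `bytes memory`.
/// Returns the index of the first byte equal to `value`,
/// or type(uint256).max if `b` contains no such byte.
function naiveMemchr(bytes memory b, bytes1 value) pure returns (uint256) {
    for (uint256 i; i < b.length; i++) {
        if (b[i] == value) return i;
    }
    return type(uint256).max;
}
```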
```solidity
import { PRBTest } from "@prb/test/src/PRBTest.sol";
import { Assertions } from "@dk1a/solidity-stringutils/src/test/Assertions.sol";

contract StrSliceTest is PRBTest, Assertions {
    function testContains() public {
        bytes memory b1 = "12345";
        bytes memory b2 = "3";
        assertContains(b1, b2);
    }

    function testLt() public {
        string memory s1 = "123";
        string memory s2 = "124";
        assertLt(s1, s2);
    }
}
```
You can completely ignore slices if all you want is e.g. `assertContains` for native `bytes`/`string`.
- Arachnid/solidity-stringutils - I basically wanted to make an updated version of solidity-stringutils
- rust - most similarities are in names and general structure; the implementation can't really be similar (solidity doesn't even have generics)
- paulrberg/prb-math - good template for solidity data structure libraries with `using {...} for ... global`
- brockelmore/memmove - good assembly memory management examples