KBString for byte strings? #37

Open
joshtriplett opened this issue Jun 18, 2022 · 3 comments
Labels
enhancement Improve the expected

Comments

@joshtriplett

I'd love to have a version of KString that can hold arbitrary bytes, rather than UTF-8. I currently use BString, but I'd love to have the compact string optimization of KString.

@epage epage added the enhancement Improve the expected label Jun 19, 2022
@epage
Member

epage commented Jun 19, 2022

Are you looking for just a compact Vec<u8> and willing to pull in bstr for the ByteSlice trait, or are you looking for us to implement all of the relevant string methods?

If you are fine with ByteSlice, then this would be relatively trivial to get in, and I've been considering byte/OsStr versions.
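
To make the "compact Vec<u8>" option concrete, here is a minimal std-only sketch of a byte container that keeps short values inline and moves longer ones to the heap. The name `SmallBytes`, the `INLINE_CAP` of 15 and the enum layout are assumptions for illustration only; they are not kstring's actual representation or API.

```rust
use std::ops::Deref;

/// Hypothetical inline capacity; a real crate would pick this to fit its layout.
const INLINE_CAP: usize = 15;

/// Hypothetical compact byte string: short values are stored inline,
/// longer ones are stored in a heap allocation. No UTF-8 requirement.
#[derive(Clone, Debug)]
pub enum SmallBytes {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<[u8]>),
}

impl SmallBytes {
    /// Copy `input`, choosing inline storage when it fits.
    pub fn from_slice(input: &[u8]) -> Self {
        if input.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..input.len()].copy_from_slice(input);
            // The cast cannot truncate because `input.len() <= INLINE_CAP`.
            SmallBytes::Inline { len: input.len() as u8, buf }
        } else {
            SmallBytes::Heap(input.into())
        }
    }

    /// Whether the value is stored without a heap allocation.
    pub fn is_inline(&self) -> bool {
        matches!(self, SmallBytes::Inline { .. })
    }
}

impl Deref for SmallBytes {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        match self {
            SmallBytes::Inline { len, buf } => &buf[..*len as usize],
            SmallBytes::Heap(bytes) => &bytes[..],
        }
    }
}
```

Because the sketch dereferences to `[u8]`, a caller who brings bstr's ByteSlice trait into scope should get its string-like methods on the value without the container implementing any of them itself, which is the split described above.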

@Byron

Byron commented Feb 27, 2023

This enhancement would be very relevant for gix-attributes, which has short keys (attribute names) that fit into UTF-8, and values that can contain nearly arbitrary bytes and are thus better represented as byte strings (like bstr). For now I will go with String/bstr::BString here and leave everything else to future profiler-driven optimizations.
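
As a rough illustration of that key/value split, the std-only sketch below keeps the name as a `String` and the value as raw bytes. `Assignment` and `parse_assignment` are hypothetical names, and the simple `name=value` form is an assumption made for the example; this is not gix-attributes' actual parser or data model, and `Vec<u8>` stands in for `bstr::BString`.

```rust
/// Hypothetical attribute assignment: a UTF-8 name and a raw-byte value.
#[derive(Debug, PartialEq, Eq)]
pub struct Assignment {
    pub name: String,
    pub value: Vec<u8>,
}

/// Split `input` at the first `=`. The name must be valid UTF-8;
/// the value is kept verbatim, whatever bytes it contains.
/// Returns `None` if there is no `=` or the name is not UTF-8.
pub fn parse_assignment(input: &[u8]) -> Option<Assignment> {
    let eq = input.iter().position(|&b| b == b'=')?;
    let name = std::str::from_utf8(&input[..eq]).ok()?.to_owned();
    Some(Assignment {
        name,
        value: input[eq + 1..].to_vec(),
    })
}
```

Only the name goes through UTF-8 validation; the value never does, which is why a compact byte-string type would fit the value side better than KString.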

@Byron

Byron commented Apr 13, 2023

I ended up doing something like this:

use bstr::{BStr, ByteSlice};
use kstring::KStringRef;

/// A reference container to encapsulate a tightly packed and typically unallocated byte value that isn't necessarily UTF-8 encoded.
#[derive(PartialEq, Eq, Debug, Hash, Ord, PartialOrd, Clone, Copy)]
pub struct ValueRef<'a>(KStringRef<'a>);

/// Conversions
impl<'a> ValueRef<'a> {
    /// Keep `input` as our value.
    pub fn from_bytes(input: &'a [u8]) -> Self {
        Self(KStringRef::from_ref(
            // SAFETY: our API makes accessing that value as `str` impossible, so ill-formed UTF-8 is never exposed as such.
            #[allow(unsafe_code)]
            unsafe {
                std::str::from_utf8_unchecked(input)
            },
        ))
    }

    /// Access this value as a byte string.
    pub fn as_bstr(&self) -> &BStr {
        self.0.as_bytes().as_bstr()
    }
}

I am writing this here just to assure myself that KString/KStringRef don't rely on valid UTF-8 for inline storage. Tests indicate that everything works fine.

3 participants