Optimize passing going from a Rust std::string::String to a Swift String #5

Open
1 task
chinedufn opened this issue Dec 29, 2021 · 6 comments

Comments

@chinedufn
Owner

chinedufn commented Dec 29, 2021

This issue was born from a discussion on the Rust reddit


Problem

Today, in order to go from a Rust std::string::String to a Swift String we need to:

  1. Box::into_raw(Box::new(string)) the Rust std::string::String to get a raw pointer to the String (one allocation)

    swift_bridge::string::RustString(super::some_function()).box_into_raw()

  2. Send the pointer to Swift

  3. Allocate a RustString(ptr: ptr) class instance

    func some_function() -> RustString {
        RustString(ptr: __swift_bridge__$some_function())
    }

  4. Call rustString.toString(), which allocates a Swift String

    extension RustString {
        func toString() -> String {
            let str = self.as_str()
            let string = str.toString()
            return string
        }
    }
    extension RustStr {
        func toBufferPointer() -> UnsafeBufferPointer<UInt8> {
            let bytes = UnsafeBufferPointer(start: self.start, count: Int(self.len))
            return bytes
        }
        func toString() -> String {
            let bytes = self.toBufferPointer()
            return String(bytes: bytes, encoding: .utf8)!
        }
    }

  5. The class RustString on the Swift side has a deinit method that calls into Rust to deallocate the Rust std::string::String
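The Rust half of the steps above can be sketched roughly as follows. This is a minimal, std-only illustration and not swift-bridge's actual generated code: the function names `string_into_raw`, `string_as_str` and `string_free`, and the `RustStrView` struct, are hypothetical stand-ins for the `RustString` / `RustStr` types mentioned above.

```rust
/// Hypothetical C-compatible view of a string's bytes (pointer + length),
/// standing in for the `RustStr` that the Swift side reads from.
#[repr(C)]
pub struct RustStrView {
    pub start: *const u8,
    pub len: usize,
}

/// Step 1: move the `String` onto the heap and hand out a raw pointer.
/// Boxing the `String` is one allocation on top of the string's own buffer.
pub fn string_into_raw(string: String) -> *mut String {
    Box::into_raw(Box::new(string))
}

/// Step 4 (Rust half): expose the bytes so the caller can copy them.
///
/// # Safety
/// `ptr` must come from `string_into_raw` and must not have been freed.
pub unsafe fn string_as_str(ptr: *const String) -> RustStrView {
    let string: &String = unsafe { &*ptr };
    RustStrView {
        start: string.as_ptr(),
        len: string.len(),
    }
}

/// Step 5: what a Swift `deinit` would call to drop the boxed `String`.
///
/// # Safety
/// `ptr` must come from `string_into_raw` and must be freed exactly once.
pub unsafe fn string_free(ptr: *mut String) {
    drop(unsafe { Box::from_raw(ptr) });
}
```

In a real bridge these functions would be exported with a C ABI. The point of the sketch is only to show where the extra allocation (the `Box`) and the copy (reading the bytes behind `RustStrView`) come from.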


When returning a Rust &mut String such as:

extern "Rust" {
    type Foo;
    fn get_mut_string(&self) -> &mut String;
}

we want a class RustStringRefMut on the Swift side that exposes methods to access / mutate the underlying Rust std::string::String.

However, when returning an owned String such as:

extern "Rust" {
    type Foo;
    fn get_owned_string(&self) -> String;
}

there is no reason to have the intermediary class RustString since we don't need Rust to manage the underlying std::string::String.

Instead we want to go directly from Rust std::string::String to Swift String.

Open Questions

This entire issue still needs some more thought and planning. We need to think through all of the implications.

Here is a list of things to think through that we can add to over time:

  • If we called shrink_to_fit on the std::string::String before passing it over to Swift, we'd only need the pointer and length in order to deallocate it later. If we used a CFAllocatorContext (illustrated by func rustDeallocate in the comment below) to deallocate the std::string::String whenever Swift no longer needed it, we'd have the pointer to the bytes, but how would we get the length? Or do we need another approach? We could have a global deallocator on the Rust side that looks up lengths in something like a HashMap<String pointer, String length> in order to deallocate Strings. But perhaps there's a simpler/smarter approach to all of this.
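As a rough sketch of the HashMap idea in the bullet above, the Rust side could look something like this. Everything here is an assumption for illustration: the names `leak_string_bytes` and `free_string_bytes` and the global registry are hypothetical, and the sketch uses `String::into_boxed_str` rather than `shrink_to_fit`, since a `Box<str>` can be rebuilt and freed from just its pointer and length.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical global registry: buffer address -> byte length.
fn lengths() -> &'static Mutex<HashMap<usize, usize>> {
    static LENGTHS: OnceLock<Mutex<HashMap<usize, usize>>> = OnceLock::new();
    LENGTHS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Leaks the string's bytes and records their length so that a
/// pointer-only callback can free them later.
pub fn leak_string_bytes(string: String) -> *mut u8 {
    // `into_boxed_str` discards excess capacity, leaving a (pointer, length) pair.
    let boxed: Box<str> = string.into_boxed_str();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    // Note: empty strings do not allocate and share one dangling address,
    // so their entries overwrite each other. Nothing is leaked in that case.
    lengths().lock().unwrap().insert(ptr as usize, len);
    ptr
}

/// Frees bytes previously returned by `leak_string_bytes`.
/// Returns `false` if the pointer is not in the registry.
///
/// # Safety
/// `ptr` must not be read after this call returns `true`.
pub unsafe fn free_string_bytes(ptr: *mut u8) -> bool {
    let len = match lengths().lock().unwrap().remove(&(ptr as usize)) {
        Some(len) => len,
        None => return false,
    };
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    // Rebuild the `Box<str>` from pointer + length and drop it.
    drop(unsafe { Box::from_raw(slice as *mut str) });
    true
}
```

A deallocate callback that only receives the pointer could call `free_string_bytes` directly. The cost is a global lock and a map lookup per string, which may or may not be acceptable.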
@chinedufn changed the title from "Convert directly from std::string::String to a Swift String" to "Optimize passing going from a Rust std::string::String to a Swift String" on Dec 29, 2021
@chinedufn
Owner Author

chinedufn commented Dec 29, 2021

Here's an example of going from a buffer of UTF-8 bytes to a Swift String:


func testNewStringPassingApproach() throws {
    var string = makeString()
    
    string.withUTF8({buffer in
        print(
"""
String buffer before mutation \(buffer)
"""
        )
    })
    
    
    string += "."
    
    string.withUTF8({buffer in
        print(
"""
String buffer after mutation \(buffer)
"""
        )
    })
    
    print(string)
}


var RustStringDeallocator: CFAllocatorContext = CFAllocatorContext(
    version: 0,
    info: nil,
    retain: nil,
    release: nil,
    copyDescription: nil,
    allocate: nil,
    reallocate: nil,
    deallocate: rustDeallocate,
    preferredSize: nil
)

func rustDeallocate(_ ptr: UnsafeMutableRawPointer?, _ info: UnsafeMutableRawPointer?) {
    print(
        """
Deallocating pointer \(ptr)
"""
    )
}

// https://gist.github.com/martinmroz/5905c65e129d22a1b56d84f08b35a0f4
func makeString() -> String {
    let buffer = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 11)
    let _ = buffer.initialize(from: "hello world".utf8)
    
    print(
"""
Allocated String buffer \(buffer)
"""
    )
    
    let bytes = buffer.baseAddress!
    let numBytes = buffer.count
    
    let stringDeallocator = CFAllocatorCreate(kCFAllocatorDefault, &RustStringDeallocator)
    
    // https://developer.apple.com/documentation/corefoundation/1543597-cfstringcreatewithbytesnocopy
    let managedCFString = CFStringCreateWithBytesNoCopy(
        kCFAllocatorDefault,
        bytes,
        numBytes,
        CFStringBuiltInEncodings.UTF8.rawValue,
        false,
        // Should be kCFAllocatorNull for &str
        // TODO: take retained or unretained ?
        stringDeallocator?.takeUnretainedValue()
    )
    
    let cfStringPtr = CFStringGetCharactersPtr(managedCFString!)
    print("CFString pointer: \(cfStringPtr)")
    
    var managedString: String = managedCFString! as String
    
    managedString.withUTF8({buffer in
        print(
"""
String initial buffer \(buffer)
"""
        )
    })
    
    return managedString
}

Output

Allocated String buffer UnsafeMutableBufferPointer(start: 0x00007ffaa2c06d00, count: 11)
CFString pointer: nil
String initial buffer UnsafeBufferPointer(start: 0x00007ffee42548d0, count: 11)
Deallocating pointer Optional(0x00007ffaa2c06d00)
String buffer before mutation UnsafeBufferPointer(start: 0x00007ffee4254b80, count: 11)
String buffer after mutation UnsafeBufferPointer(start: 0x00007ffee4254b60, count: 12)
hello world.

It looks like the bytes get copied to a new address somewhere between our raw UTF-8 byte buffer and our Swift String.

Not sure why. Need to look into whether or not it's possible to remove that copy.

@chinedufn
Owner Author

chinedufn commented Dec 31, 2021

Hmm.. it seems like there isn't a zero-copy way to construct a Swift String today https://forums.swift.org/t/does-string-bytesnocopy-copy-bytes/51643/3 .


So this all needs some more design thinking. For example, if you're reading a 1 GB file into a String and then passing that to Swift, you probably don't want any avoidable copies.


We need to think through cases where we'd want to immediately copy bytes from a Rust std::string::String to a Swift String and when we'd instead want to have an intermediary RustString Swift class. Do we need to be able to annotate functions to indicate how owned Strings should be passed? If so, what would the default be? Things like that need to be answered.

@NiwakaDev
Collaborator

NiwakaDev commented Jan 22, 2023

I have a simple suggestion.

When we want Rust to keep ownership, we annotate the function like this:

extern "Rust" {
    #[swift_bridge(ownership = "rust")]
    fn get_value() -> String;
}
/// Generated Swift code
func get_value() -> RustString {
    // ...
}

When we want Swift to take ownership, we annotate the function like this:

extern "Rust" {
    #[swift_bridge(ownership = "swift")]
    fn get_value() -> String;
}
/// Generated Swift code
func get_value() -> String {
    // ...
}

By default, I think Swift should take ownership, because a swift-bridge beginner (such as me) expects to be able to use String directly.

@chinedufn
Owner Author

chinedufn commented Jan 23, 2023

When we pass the owned Rust String over to Swift we don't want Rust ownership of the String.

We'd much prefer for it to always immediately become a Swift String once we pass it over.

The only reason that we don't do that is that right now this would involve copying all of the Rust String's bytes to a new Swift String's memory address. We don't want to do any copying without the user explicitly calling .toString().

If Swift had a zero-copy way to construct a Swift String, we would immediately turn the Rust String into a Swift String and delete the RustString type.

Here's how things would ideally work.

#[swift_bridge::bridge]
mod ffi {
    extern "Rust" {
        // On the Rust side when this is called we would immediately
        // `std::mem::forget` the String.
        // !! THIS IS NOT HOW THINGS WORK TODAY !!
        fn make_string() -> String;
    }
}
// Swift

// Generated code
func make_string() -> String {
    // In here the automatically generated code will construct
    // a Swift String that points to the same memory address that
    // the Rust String did.
    // This means that we've created a Swift String from a Rust String
    // without any copying.
    // !! THIS IS NOT HOW THINGS WORK TODAY !!
}
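
For illustration only, the Rust half of that ideal flow (give up the buffer without freeing it) might look like the sketch below. The `RawStringParts` struct and both function names are hypothetical, and `ManuallyDrop` is used in place of `std::mem::forget` so the pointer can be read after ownership is released. As stated above, this is not how swift-bridge works today, and Swift would still need a way to adopt the buffer for it to be useful.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical C-compatible description of a `String`'s heap buffer.
#[repr(C)]
pub struct RawStringParts {
    pub ptr: *mut u8,
    pub len: usize,
    pub capacity: usize,
}

/// Gives up ownership of the buffer without freeing it.
/// Wrapping in `ManuallyDrop` prevents the `String` destructor from running.
pub fn string_into_parts(string: String) -> RawStringParts {
    let mut string = ManuallyDrop::new(string);
    RawStringParts {
        ptr: string.as_mut_ptr(),
        len: string.len(),
        capacity: string.capacity(),
    }
}

/// Takes ownership back, e.g. so that Rust can free the buffer once the
/// other side is done with it.
///
/// # Safety
/// `parts` must come from `string_into_parts` and must be used at most once.
pub unsafe fn string_from_parts(parts: RawStringParts) -> String {
    unsafe { String::from_raw_parts(parts.ptr, parts.len, parts.capacity) }
}
```

Between the two calls nothing is copied: the bytes stay at the same address, and whoever holds `RawStringParts` is responsible for eventually handing them back.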

So, before exploring any annotation-based approaches, I think we'd want to research whether it is, or ever will be, possible to construct a Swift String without copying.

If it is or ever will be possible, then that would be a much better approach than annotating functions.

@chinedufn
Owner Author

It is unlikely that it will ever be possible to construct a Swift String without copying.

I've documented this in #318

References

"String does not support no-copy initialization" -> https://developer.apple.com/documentation/swift/string/init(bytesnocopy:length:encoding:freewhendone:)

https://forums.swift.org/t/does-string-bytesnocopy-copy-bytes/51643

@chinedufn
Owner Author

chinedufn commented Feb 6, 2025

@NiwakaDev I sketched out a potential design that is similar to what you proposed in #5 (comment)

sketch:

/// Ideally, we would bridge Rust's [`String`] to Swift's `String` type directly.
/// We do not do this because there is no zero-copy way to create a Swift `String`, and, since
/// `swift-bridge` aims to be useful in performance sensitive applications, we avoid unnecessary
/// allocations.
///
/// Instead, users that wish to go from a `Rust std::string::String` to a Swift String must call
/// `RustString.toString()` on the Swift side.
/// We can consider introducing annotations that allow a user to opt in to an automatic conversion.
/// For instance, something along the lines of:
/// ```rust,no_run
/// #[swift_bridge::bridge]
/// mod ffi {
///     extern "Rust" {
///         #[swift_bridge(return_clone)]
///         fn return_a_string() -> String;
///
///         #[swift_bridge(return_map_ok_clone)]
///         fn return_a_string_ok() -> Result<String, ()>;
///
///         #[swift_bridge(return_map_err_clone)]
///         fn return_a_string_err() -> Result<(), String>;
///     }
/// }
/// ```
/// When such an attribute was present `swift-bridge` would allocate a Swift String on the Swift
/// side, instead of initializing an instance of the `RustString` class.
///
/// Such an automatic conversion could be made more efficient than using the `RustString.toString()`
/// method to create a Swift String.
/// For instance, to go from `Rust std::string::String -> Swift String` via a `RustString` we:
/// - allocate a `class RustString` instance
/// - call `RustString.toString()`, which constructs a Swift String using the `RustString`'s
/// underlying buffer
///
/// An automatic conversion would look like:
/// - construct a Swift String using the Rust `std::string::String`'s underlying buffer
///
/// Regardless of whether one is using `swift-bridge`, creating instances of Swift reference types
/// requires a small heap allocation.
/// By not creating an instance of the `RustString` class we would be eliminating one small
/// allocation.
///
/// ## References
/// - claim: Impossible to create a Swift `String` without copying:
/// - `init(bytesNoCopy was deprecated in macOS 13` - https://forums.swift.org/t/init-bytesnocopy-was-deprecated-in-macos-13/61231
/// - "String does not support no-copy initialization" - https://developer.apple.com/documentation/swift/string/init(bytesnocopy:length:encoding:freewhendone:)
/// - `Does String(bytesNoCopy:) copy bytes?` - https://forums.swift.org/t/does-string-bytesnocopy-copy-bytes/51643
/// - claim: Class instances allocate
/// - "For example, a class instance (which allocates)" https://www.swift.org/documentation/server/guides/allocations.html#other-perf-tricks


We're also chatting about some String ownership related stuff in #309 .


2 participants