Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undesired link_name attributes generated #1221

Closed
waywardmonkeys opened this issue Jan 12, 2018 · 24 comments
Closed

Undesired link_name attributes generated #1221

waywardmonkeys opened this issue Jan 12, 2018 · 24 comments

Comments

@waywardmonkeys
Copy link
Contributor

Input C/C++ Header

int foo(void);

Bindgen Invocation

$ bindgen input.h

Actual Results

/* automatically generated by rust-bindgen */

extern "C" {
    #[link_name = "\u{1}_foo"]
    pub fn foo() -> ::std::os::raw::c_int;
}

I would like the output to not contain the link_name attribute as it shouldn't be necessary here.

I'm on macOS, so all symbols get the _ prefix ...

@emilio
Copy link
Contributor

emilio commented Jan 22, 2018

Right, so this is somewhat annoying, because fixing this would need to assume knowledge about the backend mangling on the target. The only reason those \u{1} are there is to avoid that...

Indeed we used to have a fair amount of bugs with C++ related to that.

So as much as this is only cosmetic, I don't think there's any particular reason to fix this, is there?

@emilio
Copy link
Contributor

emilio commented Jan 22, 2018

See all the need_mangling_hack mess that went away in f2b30c8, fwiw.

I think we could consider doing this as a best-effort thing, like: If we really know that the backend needs the extra underscore, just remove it, otherwise fall back to \u{1}, but again not sure it's worth the complexity.

@waywardmonkeys
Copy link
Contributor Author

In my case, I typically build a bindings file once and commit it, rather than building it on the target machine. (This is the case for a lot of bindings...)

Also in my case, my bindings are for C APIs rather than C++ ... I understand the need for this for C++ APIs, but what is the need for C?

@emilio
Copy link
Contributor

emilio commented Feb 3, 2018

Sorry for the lag replying to this.

You can use the distrust_clang_mangling option to avoid generating any link_name at all, and C bindings should keep working.

If there's anything else we can do on our side (maybe renaming that option? We've done more breaking changes for the next release so it should be fine) please let me know, or submit a PR :)

@emilio emilio closed this as completed Feb 3, 2018
@jethrogb
Copy link
Contributor

jethrogb commented Nov 28, 2018

I'm still getting the \u{1} even with trust_clang_mangling(false) (in combination with #1438)

    #[link_name = "\u{1}mbedtls_mpi_div_int"]
    pub fn mpi_div_int(Q: *mut mpi, R: *mut mpi, A: *const mpi, b: mpi_sint) -> raw_types::c_int;

@emilio
Copy link
Contributor

emilio commented Nov 28, 2018

It's expected that you get that if the symbol name is different from the name you're exposing. The trust-clang-mangling thing makes us avoid looking at the mangled name and assume it's the function name, but then you're changing that so we fix up the link name.

Do you need the link name to go away? If so, why? How would that link?

@jethrogb
Copy link
Contributor

jethrogb commented Nov 28, 2018

Right, link_name is definitely necessary. I expect it to be #[link_name = "mbedtls_mpi_div_int"] instead (the original symbol name before renaming it).

@emilio
Copy link
Contributor

emilio commented Nov 28, 2018

That wouldn't work on MacOS where we'd need an underscore. So the \u{1} is the right thing to generate.

@jethrogb
Copy link
Contributor

I'm sorry, I don't quite understand. You're saying #[link_name = "\u{1}mbedtls_mpi_div_int"] is correct? That is obviously not what the symbol name is in my .a. I tried googling for “\u{1} link_name” but the top results are all random issues in this issue tracker 😃

@emilio
Copy link
Contributor

emilio commented Nov 29, 2018

Yes, that's right, see f2b30c8. Basically rustc passes link_name arguments as-is to LLVM, and LLVM adds per-backend mangling specially for C++ and such.

Prepending \u{1} is a hint to LLVM to avoid doing this (since we know actually the right mangling). It should end up being the right symbol name, though if you have evidence of the opposite let's file a new issue.

@awakecoding
Copy link

I ran into the same issue today with rust-mbedtls + pregenerating the bindings to be reused on Windows/macOS/Linux. As a temporary workaround, I did a search & replace in the generated bindings to remove the \u{1} prefix and it worked.

@jethrogb
Copy link
Contributor

jethrogb commented Mar 7, 2019

@awakecoding as I understand, it's not an issue, and you can leave the \u{1} and everything will work as normal.

@awakecoding
Copy link

@jethrogb I found this issue because leaving \u{1} caused issues for me on macOS. I modified rust-mbedtls to link against a prebuilt mbedtls library provided by conan, but any static library would cause the same problem. Removing \u{1} made it work for Windows/macOS/Linux.

@emilio
Copy link
Contributor

emilio commented Mar 10, 2019

@awakecoding what's your clang version? Some old libclangs returned bogus mangled names.

In any case, this should not be an issue if you run bindgen once per platform (at build time or such).

If you know you don't have or use non-C symbols, you can use the distrust-clang-mangling option to never look at clang-mangled names.

@awakecoding
Copy link

@emilio I think it was libclang 6.0.0 on Linux. However, in my case, my intent is to generate the bindings once for all platforms, since the mbedtls API is identical on all platforms (at least if you build it with exactly the same options everywhere). I will try distrust-clang-mangling.

@Tastaturtaste
Copy link

Sorry to comment on an old issue, but it seems to describe my issue exactly. Currently I am working on a crate for bindings to Matlab. I distribute the generated bindings with the crate instead of letting the user generate them. This seems to break on MacOS because of the prefixed \u{1} in the link_name, as removing them seems to fix the issue as this pull-request indicates: Tastaturtaste/matlab-sys#2

Is there any option to generate the link_name without the \u{1} prefix? I don't quite understand why the provided name cannot be the same as the function name with the correct symbol name. Could someone point me to an explanation or explain it to me here?

@pvdrz
Copy link
Contributor

pvdrz commented Mar 9, 2023

I recently had an issue with mangling on macos and it was the fact that it added _ as a prefix to every symbol. Could it be that this \u{1} is forcing llvm to not mangle names and thus, the bindings are missing this _ and the matlab library still has them?

I don't have a macos to test but it would be interesting to see if prepending _ using callbacks fixes something or not.

@Tastaturtaste
Copy link

Sadly I also don't have MacOS to test this. Currently I try to get CI with Github Actions up to test on MacOS.

The thing which confuses me is, why does the link_name attribute have to be different than the function name would be without an attribute? Bindgen only adds those attributes in my case because I use a ParserCallback to override the exposed names. If I wouldn't do that there would be no link_name attribute and the function name would not start with a \u{1}.

@pvdrz
Copy link
Contributor

pvdrz commented Mar 9, 2023

the thing is that we already ask clang to mangle the names. So this \u{1} prevents LLVM from mangling the already mangled name again

@Tastaturtaste
Copy link

the thing is that we already ask clang to mangle the names. So this \u{1} prevents LLVM from mangling the already mangled name again

So, as far as I understand now, when I write
extern "C" fn foo();
in a crate and compile it on different platforms, foo would get mangled differently on different platforms. For example on Mac with an underscore in front: _foo.
When I instead write

#[link_name="\u{1}foo"]
extern "C" fn bar();

bar would always get mangled exactly to foo independent of the platform.
Is that correct? I will continue with the assumption that it is, if it is not please correct me.

My problem with this is now that I do want the platform dependent mangling while using ParseCallback::generated_name_override to expose another name for the consumer of my crate. But as far as I know there is no way to prevent bindgen from adding this \u{1} in front of the link_name attribute. I don't want bindgen to apply any mangling to the name. I also cannot use Builder::trust_clang_mangling(false) together with ParseCallback::generated_name_override because that completely prevents the addition of the link_name attribute. I think a workaround for my use case could be something like this:

pub use private::bar as foo;
mod private{
    extern "C" fn bar();
}

But since bindgen, as far as I know, cannot generate this code for me I have to resort to do at least the pub use as manually or write my own script for it. I would really like a method offered by bindgen to give me the desired effect automatically.

@pvdrz
Copy link
Contributor

pvdrz commented Mar 14, 2023

the thing is that we already ask clang to mangle the names. So this \u{1} prevents LLVM from mangling the already mangled name again

So, as far as I understand now, when I write extern "C" fn foo(); in a crate and compile it on different platforms, foo would get mangled differently on different platforms. For example on Mac with an underscore in front: _foo. When I instead write

#[link_name="\u{1}foo"]
extern "C" fn bar();

bar would always get mangled exactly to foo independent of the platform. Is that correct? I will continue with the assumption that it is, if it is not please correct me.

Well yeah. To be more preceise \u{1} prevents the name from being mangled at all. So the name won't be changed by LLVM in any way and stay foo.

My problem with this is now that I do want the platform dependent mangling while using ParseCallback::generated_name_override to expose another name for the consumer of my crate. But as far as I know there is no way to prevent bindgen from adding this \u{1} in front of the link_name attribute. I don't want bindgen to apply any mangling to the name.

Well the issue is that bindgen just "asks" to clang what the mangled name will be and prepends \u{1} to the already mangled name. What I don't see is why this "platform dependent mangling" is not happening on its own already as bindgen should generate correct macos names because it asks to the clang compiler of the platform what the mangled name should be.

I also cannot use Builder::trust_clang_mangling(false) together with ParseCallback::generated_name_override because that completely prevents the addition of the link_name attribute.

Yeah this \u{1} is added unconditionally to the link name. I could see adding an option to remove it but I'd rather figure out why the names aren't being mangled correctly on the first place.

@Tastaturtaste
Copy link

What I don't see is why this "platform dependent mangling" is not happening on its own already as bindgen should generate correct macos names because it asks to the clang compiler of the platform what the mangled name should be.

That doesn't surprise me at all, as I generate the bindings once, currently on windows, and distribute them so users don't have to install LLVM themselves. I know the function names stay the same between all platforms and I can rely on the mangling done by rustc to generate the correct symbol name.
I believe @awakecoding mentioned in a comment above that he also wanted to do something similar. He resorted to manually removing the prefix I think:

I ran into the same issue today with rust-mbedtls + pregenerating the bindings to be reused on Windows/macOS/Linux. As a temporary workaround, I did a search & replace in the generated bindings to remove the \u{1} prefix and it worked.

@pvdrz
Copy link
Contributor

pvdrz commented Mar 14, 2023

What I don't see is why this "platform dependent mangling" is not happening on its own already as bindgen should generate correct macos names because it asks to the clang compiler of the platform what the mangled name should be.

That doesn't surprise me at all, as I generate the bindings once, currently on windows, and distribute them so users don't have to install LLVM themselves. I know the function names stay the same between all platforms and I can rely on the mangling done by rustc to generate the correct symbol name. I believe @awakecoding mentioned in a comment above that he also wanted to do something similar. He resorted to manually removing the prefix I think:

I ran into the same issue today with rust-mbedtls + pregenerating the bindings to be reused on Windows/macOS/Linux. As a temporary workaround, I did a search & replace in the generated bindings to remove the \u{1} prefix and it worked.

So, in principle you could run bindgen with each --target and use conditional compilation to expose the right binding to the right target? Right?

@Tastaturtaste
Copy link

Tastaturtaste commented Mar 14, 2023

So, in principle you could run bindgen with each --target and use conditional compilation to expose the right binding to the right target? Right?

I guess in principle that would work, but since I do have to do some cleanup after the code generation that would triple this work. Also it seems somewhat roundabout to codegen for three different targets only to get the mangling correct when that mangling would happen anyway with normal compilation.

DrTobe added a commit to embedded-rust-iml/rust-mbedtls that referenced this issue May 30, 2023
According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.
DrTobe added a commit to embedded-rust-iml/rust-mbedtls that referenced this issue Jul 11, 2023
According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.
DrTobe added a commit to embedded-rust-iml/rust-mbedtls that referenced this issue Jul 12, 2023
…ssl_flush_output functions

According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.
github-merge-queue bot pushed a commit to fortanix/rust-mbedtls that referenced this issue Jul 17, 2023
…ssl_flush_output functions

According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.
Taowyoo pushed a commit to fortanix/rust-mbedtls that referenced this issue Nov 2, 2023
…ssl_flush_output functions

According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.
Taowyoo added a commit to fortanix/rust-mbedtls that referenced this issue Nov 2, 2023
* Allow name-mangling for the mbedtls_free, mbedtls_calloc and mbedtls_ssl_flush_output functions

According to the discussion in
rust-lang/rust-bindgen#1221 (comment),
the \u{1} is a hint to LLVM to avoid name-mangling but this might be required
on some platforms to link against the right name, e.g. on macOS.

* Add macOS test in CI

* tests: add missing nextest config

* build: bump ver

---------

Co-authored-by: Tobias Naumann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants