No such device error #114

Open
semperrin opened this issue Feb 21, 2022 · 25 comments

@semperrin

semperrin commented Feb 21, 2022

I am trying to transport data using IPoPCI via ntb_netdev. I am following the general outline given in step 4 of the "Non-Crosslink NTB connection for Linux" section:
https://docs.nvidia.com/drive/drive_os_5.1.6.1L/nvvib_docs/index.html#page/DRIVE_OS_Linux_SDK_Development_Guide/System%20Programming/sys_components_non_transparent_bridging.html
One machine is running Ubuntu 18.04 with a PM40036 switch; the other is running Ubuntu 16.04 with a PM8534 switch.

I am trying to load the kernel modules as follows:

modprobe ntb
modprobe switchtec
modprobe ntb_hw_switchtec
modprobe ntb_transport
modprobe ntb_netdev

When I load ntb_transport I get the first line:

[  559.033375] Software Queue-Pair Transport over NTB, version 4

However, I do not get the second line:

[  559.034097] switchtec switchtec0: ntb link up

Then if I try to load ntb_netdev I get the following:

modprobe: ERROR: could not insert 'ntb_netdev': No such device

Can you provide me with any information on what may be causing this error?

@lsgunth
Collaborator

lsgunth commented Feb 22, 2022

Hmmm, it sounds like there may not be an NTB device. It could be a configuration issue with one of the switches, or any number of other issues. Do you see anything in dmesg related to switchtec?

@semperrin
Author

After the command:

sudo modprobe switchtec

I get the following in dmesg:

[Feb23 14:51] switchtec: loaded.

And after the command:

sudo modprobe ntb_transport

I get the following in dmesg:

[ +31.978234] Software Queue-Pair Transport over NTB, version 4

Those are the only messages I get in dmesg.

@semperrin
Author

My issue seems quite similar to #106: when I run the switchtec-user command switchtec -list, it also returns:

free(): invalid pointer
Aborted (core dumped)

@lsgunth
Collaborator

lsgunth commented Feb 23, 2022

Yup. Your switch is not configured to have a management or NTB endpoint, so there is no device for the drivers to attach to.

It also seems there's a bug in switchtec list; it shouldn't be dumping core.

@semperrin
Author

The core dumping appears to be addressed in this pull request: Microsemi/switchtec-user#263

@semperrin
Author

The PM40036 chip is within a Dolphin MXH930 NTB host adapter. Is the switch configuration you are referring to a HW or SW change? If SW, do you know how this configuration is managed and if it can be changed?

@lsgunth
Collaborator

lsgunth commented Feb 24, 2022

Yes, it's part of the firmware download. It's usually done with the Chiplink software. You might need to contact Microchip or your vendor (Dolphin) to get that setup. It's odd to me that a card designed for NTB doesn't have it configured correctly to begin with.

@semperrin
Author

Does the following output confirm that the switch is configured to be an NTB endpoint? My initial assumption was that it did.

$ lspci | grep Sierra
07:00.0 PCI bridge: PMC-Sierra Inc. Device 4036
07:00.1 Bridge: PMC-Sierra Inc. Device 4036
07:00.2 System peripheral: PMC-Sierra Inc. Device 4036
08:00.0 PCI bridge: PMC-Sierra Inc. Device 4036

If the above does not confirm that the switch is configured to be an NTB endpoint, would a device showing up in

ls /dev/switchtec*

confirm it? (Currently that command lists no such devices for me.) Is there some other way to confirm whether the switch is configured to be an NTB endpoint?

@Kendidi

Kendidi commented Mar 20, 2022

I tried to load the switchtec drivers (ntb.ko, switchtec.ko and ntb_hw_switchtec.ko) for a Dolphin MXP930, but it appears to fail to find the crosslink partition while enumerating the BARs.

[ 673.144040] switchtec: loading out-of-tree module taints kernel.
[ 673.144084] switchtec: module verification failed: signature and/or required key missing - tainting kernel
[ 673.144836] switchtec 0000:01:00.1: enabling device (0000 -> 0002)
[ 673.149496] switchtec switchtec0: Management device registered.
[ 673.150487] switchtec: loaded.
[ 673.391363] switchtec switchtec0: failed to register ntb device: -12
[ 1225.316886] switchtec switchtec0: unregistered.

Is there anything that needs to be configured on the adapter (e.g., a firmware update), or any modification needed to the switchtec drivers, before I can load the drivers successfully? Please advise!

@lsgunth
Collaborator

lsgunth commented Mar 22, 2022

@semperrin The lspci trace doesn't tell us much. If you aren't getting a /dev/switchtec device, then the switch is configured neither for NTB nor with a management endpoint, and there's not much you can do about that besides reconfiguring it.

@Kendidi You got a -12 error, which is ENOMEM. This is an unusual error. It can happen if your system is out of memory (unlikely), but it can also happen if the kernel is unable to map parts of a PCI BAR. My guess is that the switch's BARs are not configured appropriately for the driver and it's trying to map a BAR that doesn't exist.

@Kendidi

Kendidi commented Mar 22, 2022

I've added some debug prints to crosslink_enum_partition().

lspci shows:
Region 0: Memory at b5000000 (32-bit, non-prefetchable) [disabled] [size=4M]
Region 2: Memory at a0000000 (64-bit, prefetchable) [disabled] [size=256M]
Region 4: Memory at b0000000 (32-bit, non-prefetchable) [disabled] [size=64M]
Region 5: Memory at b4800000 (32-bit, non-prefetchable) [disabled] [size=8M]

Now dmesg returned:
[ 66.519772] switchtec switchtec0: Crosslink BAR0 addr: 0
[ 66.519800] switchtec switchtec0: Crosslink BAR2 addr: 0
[ 66.519829] switchtec switchtec0: Crosslink BAR4 addr: 0
[ 66.519832] switchtec switchtec0: Error enumerating crosslink partition
[ 66.519840] switchtec switchtec0: failed to register NTB device: -22

I don't understand why the BARs couldn't be read properly.

/dev/switchtec0 appears after the driver is loaded. Is there any way to correct or adjust how the BARs are read?

@lsgunth
Collaborator

lsgunth commented Mar 22, 2022

Now you're getting error 22 (EINVAL)? Did you change something in the error path? Maybe confirm where in the code the error is actually happening.

The fact that lspci shows the BARs as disabled usually just means the driver isn't loaded yet.

@Kendidi

Kendidi commented Mar 22, 2022

Oh, I only added debug messages.

-22 is -EINVAL. It's returned right after Error enumerating crosslink partition was printed, in switchtec_ntb_init_crosslink().

You are right. The previous output was captured before the drivers were loaded.

After drivers are loaded:
Region 0: Memory at b5000000 (32-bit, non-prefetchable) [size=4M]
Region 2: Memory at a0000000 (64-bit, prefetchable) [size=256M]
Region 4: Memory at b0000000 (32-bit, non-prefetchable) [size=64M]
Region 5: Memory at b4800000 (32-bit, non-prefetchable) [size=8M]

@lsgunth
Collaborator

lsgunth commented Mar 23, 2022

Hmmm, the "Error enumerating crosslink partition" message is likely caused by the middle partition not being configured with the right type of BARs.

Crosslink is very tricky and needs a specific switch configuration. Microchip used to have an app note for that. You should probably get in touch with your vendor.

@Kendidi

Kendidi commented Mar 23, 2022

My card is a Dolphin MXH930. Does anything typically need to be done to it before the switchtec drivers can load, detect the device, and register it with the NTB core successfully? Thanks.

@lsgunth
Collaborator

lsgunth commented Mar 23, 2022

No idea. I have no clue what that card is or how it's setup. You should probably contact Dolphin for support.

@Kendidi

Kendidi commented Mar 25, 2022

I see.

Where can I learn more about how Microsemi/Switchtec crosslink works, its requirements, etc.?

Regarding the following in switchtec_ntb_init_crosslink():

	if (bar_cnt < sndev->nr_direct_mw + 1) {
		dev_err(&sndev->stdev->dev,
			"Error enumerating crosslink partition\n");
		return -EINVAL;
	}

My current environment reports:

  bar_cnt = 1
  nr_direct_mw = 3

What approach would you recommend to debug this? Why must bar_cnt be >= nr_direct_mw + 1 for it to proceed? Thanks!

@Kendidi

Kendidi commented Mar 25, 2022

It appears the crosslink configuration is looking for the following BAR addresses from the vEP; am I correct?

1.  0x00_0000_0000
2.  0x10_0000_0000
3.  0x20_0000_0000

@lsgunth
Collaborator

lsgunth commented Mar 29, 2022

As I recall, Microsemi had an app note on crosslink, but as far as I know the only way to get it is through their support. If you don't have support to understand how the switch needs to be configured and to reconfigure it, I'm not sure you are going to be able to make it work at all.

Yes, crosslink looks at the configuration of the BARs in the virtual partition. In your case it doesn't have enough BARs to map the BARs in the real partitions, so it just bails.

@Kendidi

Kendidi commented Apr 1, 2022

I see. Thank you, Logan, for the confirmation and info!

So if server A wants to share an NVMe SSD with server B, can the NTB adapters in server A and server B have the same configuration/firmware? Or should they be different since the servers' roles are different? Thanks.

@lsgunth
Collaborator

lsgunth commented Apr 1, 2022

I'm not aware of any solutions for NVMe sharing that are not complicated and proprietary. So you're pretty much on your own if you want to implement something like that.

The point of crosslink was to make both machines symmetric (i.e., setups where each machine has a switch and the two switches connect to each other). In these cases the configuration for each switch should be identical.

Just as it's possible to write a driver to share an NVMe drive between two partitions on a single switch, it should, at least theoretically, be possible to share an NVMe drive over a crosslink setup. Though I don't know if anyone has ever actually tried this.

@Kendidi

Kendidi commented Apr 5, 2022

Okie. Thanks Logan!

I tried to look up more info on Microsemi ChipLink, but I couldn't find anything about it. Has it been rebranded as another product or something?

@jborz27
Collaborator

jborz27 commented Apr 5, 2022 via email

@Kendidi

Kendidi commented Apr 6, 2022

Thanks @jborz27 ! I will see where it can be downloaded.

@Kendidi

Kendidi commented Apr 7, 2022

Any suggestions on where to and how to get Microsemi/Microchip Chiplink?
