PM8576: DMAR fault when trying ntb link up #110

Open
rhardik opened this issue Dec 22, 2021 · 7 comments

Comments

@rhardik

rhardik commented Dec 22, 2021

Hi,
I am trying to test data transfer between two Intel CPUs through a PM8576 PCIe switch.

The PCIe switch has 6 endpoint devices connected.

CPU0 is already bound to all 6 endpoint ports in switch partition 0, so CPU1 cannot bind to these devices and partition 1 is empty.

But when I use ntb_tool to test data transfer between the 2 CPUs, it faults during link up.

It worked in one case: when I unbound one PCIe switch endpoint device from CPU0 and bound it to CPU1, link up no longer gave any fault. I repeated the bind/unbind multiple times and the result was the same.

So I can say that:
If zero endpoint devices are bound to the CPU, NTB link up gives a fault.
Or, put another way:
The CPU must be bound to at least one endpoint device for NTB to work.

I get the error below on CPU1 (partition 1), which is not bound to any switch endpoint.

DMAR: DRHD: handling fault status reg 102 000: DMAR: [DMA Read] Request device [ed:01.1] PASID ffffffff fault addr fffd0000 [fault reason 02] Present bit in context entry is clear

Thanks,
Hardik
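
For reference, the fault line above can be broken into its fields with a short script. This is only a sketch: the regular expression is written against the one line quoted in this comment, other kernel versions may word the message differently, and `parse_dmar_fault` is a hypothetical helper name, not part of any tool. The `devfn` value uses the standard PCI encoding (device number shifted left by 3, OR'ed with the function number).

```python
import re

# Pattern written against the single fault line quoted in this issue.
# Assumption: other kernels may format the message differently.
FAULT_RE = re.compile(
    r"\[(?P<kind>DMA (?:Read|Write))\]\s+Request device\s+"
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\]"
    r".*?fault addr\s+(?P<addr>[0-9a-f]+)"
    r"\s+\[fault reason\s+(?P<reason>\d+)\]\s*(?P<text>.*)",
    re.IGNORECASE,
)


def parse_dmar_fault(line):
    """Return the fields of a DMAR fault line as a dict, or None if no match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    bus = int(m["bus"], 16)
    dev = int(m["dev"], 16)
    fn = int(m["fn"], 16)
    devfn = (dev << 3) | fn  # standard PCI devfn encoding
    return {
        "kind": m["kind"],
        "bus": bus,
        "dev": dev,
        "fn": fn,
        "devfn": devfn,
        "requester_id": (bus << 8) | devfn,  # 16-bit bus/devfn pair
        "addr": int(m["addr"], 16),
        "reason": int(m["reason"]),  # assumption: printed as decimal
        "text": m["text"].strip(),
    }


line = (
    "DMAR: [DMA Read] Request device [ed:01.1] PASID ffffffff "
    "fault addr fffd0000 [fault reason 02] "
    "Present bit in context entry is clear"
)
fault = parse_dmar_fault(line)
print(hex(fault["requester_id"]), fault["devfn"], fault["text"])
```

The requester here is function `01.1` on bus `ed`, so the IOMMU would need a context entry (or a DMA alias) covering that bus/devfn pair for the read to be allowed.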

@lsgunth
Collaborator

lsgunth commented Dec 22, 2021

The problem likely has nothing to do with the endpoints. The kernel needs quirks to make NTB work correctly with the IOMMU and prevent errors like that.

What kernel are you running?

@rhardik
Author

rhardik commented Dec 23, 2021

I'm using the 5.4.115 kernel.

Switchtec kernel module:
`commit dcda8e5
Author: Kelvin Cao [email protected]
Date: Mon Mar 22 12:36:16 2021 +0000

Update version to 1.7`

Linux kernel:
root@alm-64-abl-cpu:~# uname -a
Linux alm-64-abl 5.4.115-rt57-alm-64-abl #1 SMP PREEMPT_RT Fri May 14 02:55:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@lsgunth
Collaborator

lsgunth commented Dec 23, 2021

Hmm, not sure. I think the quirks for the IOMMU should be in that kernel. Does it work with a newer kernel version? Does it work if you disable the iommu?

@rhardik
Author

rhardik commented Dec 24, 2021

Yes, it works after disabling the IOMMU.
I have not tried a newer Linux kernel yet, but I can see the quirks in the present kernel.

Attaching quirks.c
quirks.zip
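
One way to confirm how the IOMMU is configured on a given boot is to look at the kernel command line. The sketch below assumes the IOMMU was controlled through the `intel_iommu=` or `iommu=` kernel parameters; the thread does not say how it was disabled, and it could equally have been turned off in firmware. The function name and the example command line are hypothetical.

```python
def iommu_cmdline_settings(cmdline):
    """Return the IOMMU-related parameters found in a kernel command line."""
    settings = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only key=value tokens for the two IOMMU parameters are recorded.
        if sep and key in ("intel_iommu", "iommu"):
            settings[key] = value
    return settings


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=off quiet"
print(iommu_cmdline_settings(example))
```

On a live system the input would come from `/proc/cmdline`, for example `iommu_cmdline_settings(open("/proc/cmdline").read())`.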

@lsgunth
Collaborator

lsgunth commented Jan 4, 2022

Anything in dmesg about the quirk? Maybe it's failing to create the iommu aliases?

@rhardik
Author

rhardik commented Jan 5, 2022

Hi,

dmesg logs:
[    0.715981] pci 0000:ed:00.1: Setting Switchtec proxy ID aliases

Attached dmesg logs for Switchtec ($ dmesg | grep Switchtec)
dmesg-switchtec.txt

It shows all partitions as invalid.

@lsgunth
Collaborator

lsgunth commented Jan 5, 2022

Hmm, sounds like the requester ID table sizes are not set in a way that the quirk can pick them up. I'm not sure if they are -1 or a value greater than the quirk supports. Check your config and try to ensure the tables are no greater than 512 in size and are enabled.
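
The size check described above can be sketched as follows. This is an illustration, not the kernel code: the 512 limit comes from this comment, while the 16-bit register width, the handling of a zero size, and the names `MAX_TABLE_SIZE` and `classify_table_size` are assumptions made for the example. It shows why a register that reads back as -1 (all ones) lands in the same "invalid" bucket as a table that is simply too large.

```python
MAX_TABLE_SIZE = 512  # limit mentioned in the comment above


def classify_table_size(raw):
    """Classify a requester ID table size value as read from the switch."""
    size = raw & 0xFFFF  # assumption: the size register is 16 bits wide
    if size == 0:
        return "empty"  # assumption: a zero-size table is treated as unusable
    if size > MAX_TABLE_SIZE:
        return "invalid"  # includes the all-ones (-1) case
    return "ok"


for value in (0, 16, 512, 513, -1):
    print(value, classify_table_size(value))
```

If every partition reports "invalid", the sizes configured on the switch are the first thing to check.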
