Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: ptp_clock_info in ena 2.13 is missing .adjfine #322

Open
3 tasks done
gemorin opened this issue Sep 25, 2024 · 7 comments
Open
3 tasks done

[Bug]: ptp_clock_info in ena 2.13 is missing .adjfine #322

gemorin opened this issue Sep 25, 2024 · 7 comments
Labels
bug Report errors or unexpected behavior Linux ENA driver triage Determine the priority and severity

Comments

@gemorin
Copy link

gemorin commented Sep 25, 2024

Preliminary Actions

Driver Type

Linux kernel driver for Elastic Network Adapter (ENA)

Driver Tag/Commit

ena 2.13.0

Custom Code

No

OS Platform and Distribution

Linux 6.2+ with ena.phc_enable=1

Bug description

Starting from Linux 6.2 (commit 75ab70ec5cef), .adjfine (introduced in 4.10) is required in struct ptp_clock_info and will trigger a NULL pointer dereference in (at least) ptp_clock_adjtime()

testing this change:

--- a/kernel/linux/ena/ena_phc.c
+++ b/kernel/linux/ena/ena_phc.c
@@ -15,6 +15,15 @@ static int ena_phc_adjfreq(struct ptp_cl
 }
 
 #endif /* ENA_PHC_SUPPORT_ADJFREQ */
+
+#ifdef ENA_PHC_SUPPORT_ADJFINE
+static int ena_phc_adjfine(struct ptp_clock_info *clock_info, long delta)
+{
+       return -EOPNOTSUPP;
+}
+
+#endif /* ENA_PHC_SUPPORT_ADJFINE */
+
 static int ena_phc_adjtime(struct ptp_clock_info *clock_info, s64 delta)
 {
        return -EOPNOTSUPP;
@@ -113,6 +122,9 @@ static struct ptp_clock_info ena_ptp_clo
 #ifdef ENA_PHC_SUPPORT_ADJFREQ
        .adjfreq        = ena_phc_adjfreq,
 #endif /* ENA_PHC_SUPPORT_ADJFREQ */
+#ifdef ENA_PHC_SUPPORT_ADJFINE
+       .adjfine        = ena_phc_adjfine,
+#endif /* ENA_PHC_SUPPORT_ADJFINE */
        .adjtime        = ena_phc_adjtime,
 #ifdef ENA_PHC_SUPPORT_GETTIME64
 #ifdef ENA_PHC_SUPPORT_GETTIME64_EXTENDED
--- a/kernel/linux/ena/kcompat.h
+++ b/kernel/linux/ena/kcompat.h
@@ -995,6 +995,10 @@ static inline bool ktime_after(const kti
 #define ptp_clock_register(info, parent) ptp_clock_register(info)
 #endif
 
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0)
+#define ENA_PHC_SUPPORT_ADJFINE
+#endif /* ENA_PHC_SUPPORT_ADJFINE */
+
 #endif /* CONFIG_PTP_1588_CLOCK */
 
 #if (LINUX_VERSION_CODE < KERNEL_VERSION(3, 19, 0)) && \

Reproduction steps

1. boot a Linux 6.2+ kernel

Expected Behavior

not crash

Actual Behavior

crash :-)

Additional Data

No response

Relevant log output

abbreviated kernel log:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 149927067 P4D 149927067 PUD 143ccf067 PMD 0 
Oops: 0010 [#1] SMP NOPTI
CPU: 0 PID: 9217 Comm: t Tainted: G           O        #1
Hardware name: Amazon EC2/Not Specified, BIOS 1.0 10/16/2017
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
Call Trace:
 <TASK>
 ptp_clock_adjtime+0xd1/0x1d0
 pc_clock_adjtime+0x5b/0xa0
 __do_sys_clock_adjtime+0x8a/0x110
 __x64_sys_clock_adjtime+0x16/0x20
 x64_sys_call+0x1321/0x2020
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x78/0xe2

Contact Details

[email protected]

@gemorin gemorin added bug Report errors or unexpected behavior triage Determine the priority and severity labels Sep 25, 2024
@davidarinzon
Copy link
Contributor

davidarinzon commented Sep 26, 2024

Thank you for raising this issue, @gemorin

We will resolve it in the next driver release.

davidarinzon added a commit that referenced this issue Nov 25, 2024
Commit [1] introduced the adjfine() hook for PTP clocks.
Patchset [2] represents a transition from the adjfreq()
hook to the adjfine() hook, which included a conversion
of all the drivers adjfreq() to adjfine().
As a result, in pathset [2], there's an implicit assumption
that all drivers implement adjfine(), without a check
of whether the hook has been added.
However, our driver doesn't implement adjfine(), and PHC wasn't
part of the kernel during the above mentioned effort, leaving
this support missing.
This results in an assertion error, which was reported in [3].

This commit adds the adjfine() hook.

[1]: torvalds/linux@d8d2635
[2]: https://lore.kernel.org/netdev/[email protected]/
[3]: #322

Signed-off-by: David Arinzon <[email protected]>
@davidarinzon
Copy link
Contributor

Hi @gemorin

The issue has been resolved in https://github.com/amzn/amzn-drivers/releases/tag/ena_linux_2.13.1
Please let us know if it works for you

@Ichimikichiki
Copy link

Hi @gemorin

The issue has been resolved in https://github.com/amzn/amzn-drivers/releases/tag/ena_linux_2.13.1 Please let us know if it works for you

/usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_phc.c:121:10: error: ‘struct ptp_clock_info’ has no member named ‘adjfreq’
  121 |         .adjfreq        = ena_phc_adjfreq,
      |          ^~~~~~~
compilation terminated due to -Wfatal-errors.

@davidarinzon
Copy link
Contributor

Hi @gemorin
The issue has been resolved in https://github.com/amzn/amzn-drivers/releases/tag/ena_linux_2.13.1 Please let us know if it works for you

/usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_phc.c:121:10: error: ‘struct ptp_clock_info’ has no member named ‘adjfreq’
  121 |         .adjfreq        = ena_phc_adjfreq,
      |          ^~~~~~~
compilation terminated due to -Wfatal-errors.

Hi @Ichimikichiki

Thanks for raising this, can you please share more information? Which distribution are you using?
I would assume that it didn't compile for you before the patch/new version as well.

@Ichimikichiki
Copy link

Ichimikichiki commented Nov 28, 2024

Hi @Ichimikichiki

Thanks for raising this, can you please share more information? Which distribution are you using? I would assume that it didn't compile for you before the patch/new version as well.

It compiled fine on 2.13.0 - err it appears the error exists in 2.13.0 also. It's currently running 2.12.0 which compiled fine.

I'm on Rocky Linux 5.14.0-503.15.1.el9_5.x86_64

make -C /lib/modules/5.14.0-503.15.1.el9_5.x86_64/build M=/usr/src/amzn-drivers-2.13.1/kernel/linux/ena modules
make[1]: Entering directory '/usr/src/kernels/5.14.0-503.15.1.el9_5.x86_64'
  CC [M]  /usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_netdev.o
  CC [M]  /usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_ethtool.o
  CC [M]  /usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_lpc.o
  CC [M]  /usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_phc.o
/usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_phc.c:121:10: error: ‘struct ptp_clock_info’ has no member named ‘adjfreq’
  121 |         .adjfreq        = ena_phc_adjfreq,
      |          ^~~~~~~
compilation terminated due to -Wfatal-errors.
make[2]: *** [scripts/Makefile.build:249: /usr/src/amzn-drivers-2.13.1/kernel/linux/ena/ena_phc.o] Error 1
make[1]: *** [Makefile:1944: /usr/src/amzn-drivers-2.13.1/kernel/linux/ena] Error 2
make[1]: Leaving directory '/usr/src/kernels/5.14.0-503.15.1.el9_5.x86_64'
make: *** [Makefile:98: ena.ko] Error 2
grep -ri adjfreq
ena_phc.c:#ifdef ENA_PHC_SUPPORT_ADJFREQ
ena_phc.c:static int ena_phc_adjfreq(struct ptp_clock_info *clock_info, s32 ppb)
ena_phc.c:#endif /* ENA_PHC_SUPPORT_ADJFREQ */
ena_phc.c:#ifdef ENA_PHC_SUPPORT_ADJFREQ
ena_phc.c:	.adjfreq	= ena_phc_adjfreq,
ena_phc.c:#endif /* ENA_PHC_SUPPORT_ADJFREQ */
kcompat.h:#define ENA_PHC_SUPPORT_ADJFREQ
kcompat.h:#endif /* ENA_PHC_SUPPORT_ADJFREQ */

@davidarinzon
Copy link
Contributor

Thank you for raising this @Ichimikichiki
Indeed, seems that there's a compilation issue with RHEL and RL after the backports of the function removal which is slightly different from the originally reported issue here.

Please accept this temporary patch and let us know if it resolves the issue for you.

0001-linux-ena-Patch-Resolve-compilation-issue-with-adjfr.patch

@Ichimikichiki
Copy link

Thank you for raising this @Ichimikichiki Indeed, seems that there's a compilation issue with RHEL and RL after the backports of the function removal which is slightly different from the originally reported issue here.

Please accept this temporary patch and let us know if it resolves the issue for you.

0001-linux-ena-Patch-Resolve-compilation-issue-with-adjfr.patch

Ah legend! Thanks yeah that's resolved it. Thanks for your fast response!
Sorry I would have made a new issue, but assumed the fix for the OP's bug, was the cause of this new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Report errors or unexpected behavior Linux ENA driver triage Determine the priority and severity
Projects
None yet
Development

No branches or pull requests

3 participants