Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atreus - "GET_DESCRIPTOR" calls are timing out with latest firmware (0.92.1+94) #40

Closed
NessDan opened this issue Jan 31, 2024 · 8 comments · Fixed by keyboardio/Kaleidoscope-Bundle-Keyboardio#53

Comments

@NessDan
Copy link

NessDan commented Jan 31, 2024

Hey @obra , this is Daniel from keyboard.gg!

Sorry to bug, but I got a report from a user in my Discord that upgrading his firmware of his Atreus leads to it no longer working with my Edgeguard adapter. I decided to do a deep dive to see what exactly the issue is and if I can update my code to support your changes.

image

After diving in, I can confirm that at least the latest firmware version (0.92.1+94) has changed it's behavior and my calls to GET_DESCRIPTOR are no longer returning as expected (appearing to be timeouts possibly?) Downgrading the firmware to 0.92.0+77 fixes this issue, as videod below.

(Make sure to unmute!)
Part 1 🎥
Part 2 🎥
Part 3 🎥
Part 4 🎥

I noticed a changelog update,

USB protocol Fixes for both AVR-based and GD32-based keyboards that
may eliminate "communications timeout" issues when talking to
Chrysalis from Taylor Yu [email protected]

This in my mind seems low-level enough that it could be the source of what I'm seeing. I'd be happy to share how my adapter is connecting with keyboards in case there's any wonder if it's the source of this issue, but this is also the only time I've seen this behavior pop up on any of the keyboards I've tested.

Going to ping @tlyu in this thread as well in hopes to hear what could be the problem. I think this could be related to this PR.


Update as well: When connecting to Windows, I've confirmed there is a timeout error during enumeration before the keyboard even begins to start working. Unsure if there's some deeper root cause here, but the older firmware didn't have this issue.

Older firmware (no timeouts, just responses.) Newer firmware (timeouts, with a response)
image 2024-01-31 02_34_08-atreus-edgeguard-0 92 1+94 - Total Phase Data Center v7 01 000

Attaching more videos & Beagle USB captures (each time, the "a" key was pressed repeatedly before disconnecting and stopping.)

🎥 Timeout when plugged directly into Windows
🎥 Timout when plugged into Edgeguard

atreus-0.92.0+77-vs-0.92.1+94-tdc-captures.zip

@tlyu
Copy link
Contributor

tlyu commented Jan 31, 2024

Thank you so much for the Beagle captures!

I'm pretty sure you've come across what's probably an ancient bug in the Arduino AVR USB core. The problem firmware release included an unrelated minor protocol conformance bugfix to the boot keyboard descriptor (correctly specifying a logical maximum of 255 instead of -1). That fix coincidentally increased the descriptor's length from 63 to 64 bytes.

The bug (if I'm right) is a failure by the AVR core to send a ZLP after sending a multiple of 64 bytes when a longer length has been requested for a device-to-host control transfer. The low-level fixes to the AVR core in the problem firmware were for correctly handling receiving ZLPs from the host, which are probably unrelated to the bug you're seeing.

The PR you linked is not yet in any published firmware snapshots, to the best of my knowledge. It will also coincidentally work around this problem by having optimized keyboard report descriptors that are shorter than 64 bytes.

As a workaround, you can always specify the correct length when requesting the report descriptor, which I believe is what Linux and macOS do. (That would be the report descriptor length given in the HID class-specific descriptor.)

@tlyu
Copy link
Contributor

tlyu commented Jan 31, 2024

After reviewing the AVR core code, I can summarize: the descriptor size change unmasked an ancient bug in the Arduino AVR USB core where it doesn't properly send ZLPs in control transfers.

@NessDan
Copy link
Author

NessDan commented Jan 31, 2024

Thank you so much for the responses and workarounds for the time being!

Would this be fixable in a future firmware update to the Atreus?

Also apologies for calling out the wrong PR, thanks for looking into this!

@tlyu
Copy link
Contributor

tlyu commented Jan 31, 2024

This could definitely be fixed in a future Atreus firmware update. I don't know yet what form it would take, or when it would happen. @obra could decide on one of the workaround options, and/or we fix the underlying bug in the AVR core in our fork.

@obra
Copy link
Member

obra commented Jan 31, 2024

@NessDan Can you test out this Atreus firmware build to see if it resolves the issue for you? It's a build of the 0.92.1 firmware with only @tlyu's USB ZLP fix applied.
Atreus-v1.99.8-661-ge6a0d4.hex.zip

@tlyu
Copy link
Contributor

tlyu commented Feb 1, 2024

With a Beagle, I confirm the observations of Windows timeouts, multiple resets, and eventual successful operation on Model 01 with 0.92.1+94. I also observe that Model 01 with 0.92.1+94 with my AVR core patch no longer has the long timeouts or multiple resets on Windows.

@NessDan
Copy link
Author

NessDan commented Feb 1, 2024

New firmware is working for me & my customer :) Thank you both! (FYI I also captured the keyboard connecting to Windows & Edgeguard with the latest firmware! May be worth double checking if everything looks right though. I noticed one error, unsure if it's important or if it's on my end but everything works still.)

image

atreus-edgeguard-and-windows-0.92.2+100-tdc-captures.zip

If all of that looks normal, I think this issue is resolved and can be closed 😄

@tlyu
Copy link
Contributor

tlyu commented Feb 1, 2024

Thanks for the feedback!

The OUT transaction seems to be sent by the host (the Edgeguard). It's a ZLP with a DATA0 PID, so most likely intended to be a handshake for an IN from a device-to-host control transfer. However, the most recent control transfer was host-to-device, with a correct ZLP IN handshake. What's even stranger is that the OUT comes 235ms after the last control transfer. It seems that the device correctly ignores the transfer without sending ACK, which I believe is the correct behavior, because it's not expecting such a transfer in its state machine.

That's about all I can conclude without a lot more details about your implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants