Zigbee ota updates #192
All relevant sections are behind a build toggle which can be activated in Kconfig.
Parameters are based on the Zigbee light switch example from the nRF Connect SDK. Formatted with clang-format in VS Code.
Thanks @oleo65, this looks great. A couple of minor things:
ZB_AF_SET_IDENTIFY_NOTIFICATION_HANDLER(PRST_ZIGBEE_ENDPOINT, identify_cb);

#ifdef CONFIG_ZIGBEE_FOTA
ZB_AF_SET_IDENTIFY_NOTIFICATION_HANDLER(CONFIG_ZIGBEE_FOTA_ENDPOINT, identify_cb);
Out of curiosity, what's the expected behavior here? We have two identify handlers in different endpoints. Where is the FOTA one used?
The CONFIG_ZIGBEE_FOTA_ENDPOINT comes from the library which is activated by the CONFIG_ZIGBEE_FOTA toggle. Basically, an additional Zigbee application (?) is registered on the device if OTA functionality is active. This is what I gathered from the docs and the code.
I simply took the lines above from the Nordic sample. I cannot tell whether they are really needed, but I wanted to rule out errors caused by leaving something out during development.
I would suggest we leave it as it is for now; once we can confirm that OTA is working, we can test whether it is really needed. 👼
I am perfectly happy with only one
I will come up with some changes to the docs once we have confirmed that OTA is working. At the very least, I think we can state that OTA is possible via Zigbee and can be activated with the build toggle. But I would not give individual advice on how to provide the OTA files to the Zigbee network, because in my opinion this is out of scope for this project and highly dependent on the individual setup.
Regarding E2E testing, I am still trying to figure out how to make the Zigbee coordinator aware of the update file. My setup is Home Assistant + ZHA + ConBee II USB stick. In theory this should be possible, and I have successfully updated other Zigbee devices in the past. OTA is undergoing heavy refactoring within ZHA / zigpy at the moment and might be broken in HA 2024.4.X, so I am waiting for the next update to see whether it resolves itself. On my end, I think I have provided everything as described in various places in the HA forums. 🤷
…o `n` for now. Removed obsolete prj_ota.conf file.
Awesome, thanks @oleo65. Code looks good to me and it's a great addition. I am away from my computer for the next couple of days, but I will help run some e2e tests when I'm back. I have a similar setup to yours, with ZHA + SkyConnect. I also need to figure out how to make OTA work there.
@oleo65 I started unraveling the mysteries of Zigbee OTA. Pushed a minor commit to this PR defining the Made some notes, but still more questions than answers 😬: zigpy OTA
Examples
Questions
@oleo65 did you manage to flash and run the firmware from this PR? I tried a few times but I was getting a hard crash a few seconds after boot. I compared the changes in the Kconfig with the defines in the So if you excuse me rolling back on my own suggestion, I took the liberty of creating a separate |
Yes, I have exactly the same issue and was not able to get the parasites running with the latest revision. With my initial commit I flashed two parasites successfully, and after some resetting they have worked flawlessly so far (apart from the OTA functionality). I did some research a couple of days ago but was not able to get a real lead on this. The stack trace seems to point to a bad pointer somewhere in the Zigbee library, but I suspect that this is a symptom rather than the real problem. What I discovered is that we might need to increase the stack size further, but I have not been able to figure this out yet and I think it is only a weak lead. Have you been able to successfully flash and run the firmware by splitting the config up again? I believe it should be possible to run it from a merged config file, however. 🤔
Yeah! It works with the split config. An inheritance-based config model would be cool, but I'm also fine with split + some duplication while we test this. If the goal is to enable OTA by default, we can then get rid of the old, non-OTA one anyway. Now I'm trying to see if I can get Home Assistant + zigpy to figure out about our
Aha. New way to specify OTA updates via ZHA, introduced in home-assistant/core#111159, onboarding changes from zigpy zigpy/zigpy#1340. Directly pointing to .zigbee files considered harmful. Let's do it! |
Great find @rbaron. This looks promising. 😊
I made some progress but no OTA yet on Home Assistant 2024.5.3 + Sky Connect. What I did:
Compiled and flashed a b-parasite with I went for the recommended way to tell ZHA about OTA firmwares, using the
zha:
  zigpy_config:
    ota:
      z2m_local_index: /config/zigpy_ota/index.json
To generate the
$ node scripts/add.js ~/dev/parasite/b-parasite/code/nrf-connect/samples/zigbee/build/zephyr/DB15-0141-01020000-b-parasite.zigbee
This gives me the
[
  {
    "fileVersion": 16908288,
    "fileSize": 375832,
    "manufacturerCode": 56085,
    "imageType": 321,
    "sha512": "bbce367fca8c9bc853e0d67c4e05a5479f1443997e6a946677f465df2c1963370c6d2c9f9a2444ee95e011af01253629caf27643c1afe4fdc27735a106f705a9",
    "url": "https://github.com/Koenkk/zigbee-OTA/raw/master/images/DIY/DB15-0141-01020000-b-parasite.zigbee",
    "path": "/config/zigpy_ota/DB15-0141-01020000-b-parasite.zigbee"
  }
]
The I then put the compiled OTA file from this PR in I now can trigger an And I see some promising messages in the log:
$ tail -f /config/home-assistant.log | egrep 'zigpy.ota|zigpy.util|zigpy.*OTA'
2024-05-12 08:43:13.165 DEBUG (MainThread) [zigpy.ota] Loaded 1 images from provider: <zigpy.ota.providers.LocalZ2MProvider object at 0x7ff56c52d280>
2024-05-12 08:43:13.168 DEBUG (MainThread) [zigpy.ota] Caching image OtaImageWithMetadata(metadata=LocalOtaImageMetadata(file_version=16908288, manufacturer_id=56085, image_type=321, checksum='sha512:bbce367fca8c9bc853e0d67c4e05a5479f1443997e6a946677f465df2c1963370c6d2c9f9a2444ee95e011af01253629caf27643c1afe4fdc27735a106f705a9', file_size=375832, manufacturer_names=(), model_names=(), changelog=None, release_notes=None, min_hardware_version=None, max_hardware_version=None, min_current_file_version=None, max_current_file_version=None, specificity=None, source='Local Z2M provider (/config/zigpy_ota/index.json)', path=PosixPath('/config/zigpy_ota/DB15-0141-01020000-b-parasite.zigbee')), firmware=OTAImage(header=OTAImageHeader(upgrade_file_id=200208670, header_version=256, header_length=56, field_control=<FieldControl: 0>, manufacturer_id=56085, image_type=321, file_version=16908288, stack_version=2, header_string=<'\nb-parasite'>, image_size=375832, *device_specific_file=False, *hardware_versions_present=False, *security_credential_version_present=False), subelements=[<SubElement(tag_id=<ElementTagId.UPGRADE_IMAGE: 0>, data=[375770:1500a163696d6781a2626964006473697a651a0005bbc33db8...649268ed52716e])>]))
2024-05-12 08:43:13.169 DEBUG (MainThread) [zigpy.ota] Picking firmware OtaImageWithMetadata(metadata=LocalOtaImageMetadata(file_version=16908288, manufacturer_id=56085, image_type=321, checksum='sha512:bbce367fca8c9bc853e0d67c4e05a5479f1443997e6a946677f465df2c1963370c6d2c9f9a2444ee95e011af01253629caf27643c1afe4fdc27735a106f705a9', file_size=375832, manufacturer_names=(), model_names=(), changelog=None, release_notes=None, min_hardware_version=None, max_hardware_version=None, min_current_file_version=None, max_current_file_version=None, specificity=None, source='Local Z2M provider (/config/zigpy_ota/index.json)', path=PosixPath('/config/zigpy_ota/DB15-0141-01020000-b-parasite.zigbee')), firmware=OTAImage(header=OTAImageHeader(upgrade_file_id=200208670, header_version=256, header_length=56, field_control=<FieldControl: 0>, manufacturer_id=56085, image_type=321, file_version=16908288, stack_version=2, header_string=<'\nb-parasite'>, image_size=375832, *device_specific_file=False, *hardware_versions_present=False, *security_credential_version_present=False), subelements=[<SubElement(tag_id=<ElementTagId.UPGRADE_IMAGE: 0>, data=[375770:1500a163696d6781a2626964006473697a651a0005bbc33db8...649268ed52716e])>]))
The But then nothing happens, neither on HA nor b-parasite AFAICT. The full HA log also doesn't tell me more. It's possible I'm missing some details in the OTA file header. The mystery remains.
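For anyone checking the OTA file header by hand: the fields in the index.json entry and in the zigpy log above line up with the fixed 56-byte header at the start of a Zigbee OTA file. The following is a minimal Python sketch, not the zigbee-OTA `add.js` script and not zigpy's parser. It assumes no optional header fields are present (which matches `field_control=<FieldControl: 0>` in the log), it omits the `url` field, and the `index_entry` helper and the demo path are made up for illustration.

```python
import hashlib
import struct

# Fixed part of a Zigbee OTA file header: little-endian, 56 bytes in total.
# Assumption: no optional fields, i.e. field_control == 0 as in the log above.
OTA_HEADER = struct.Struct("<IHHHHHIH32sI")
OTA_FILE_ID = 0x0BEEF11E  # 200208670, the upgrade_file_id seen in the log


def index_entry(image: bytes, path: str) -> dict:
    """Build one index.json-style entry from the raw bytes of a .zigbee file.

    Hypothetical helper for illustration only.
    """
    (file_id, _hdr_version, _hdr_length, _field_control, manufacturer,
     image_type, file_version, _stack_version, _hdr_string,
     _image_size) = OTA_HEADER.unpack_from(image)
    if file_id != OTA_FILE_ID:
        raise ValueError("not a Zigbee OTA image")
    return {
        "fileVersion": file_version,
        "fileSize": len(image),
        "manufacturerCode": manufacturer,
        "imageType": image_type,
        "sha512": hashlib.sha512(image).hexdigest(),
        "path": path,
    }


# Synthetic header carrying the values from this thread (not the real firmware).
demo = OTA_HEADER.pack(OTA_FILE_ID, 0x0100, 56, 0, 0xDB15, 0x0141,
                       0x01020000, 2, b"\nb-parasite", 56)
entry = index_entry(demo, "/config/zigpy_ota/demo.zigbee")
print(entry["manufacturerCode"], entry["imageType"], entry["fileVersion"])
# prints: 56085 321 16908288
```

The three printed numbers are the decimal forms of the `DB15-0141-01020000` prefix in the file name, which is a quick way to check that the file name, the header and the index entry agree.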
Update: goddamn, it kinda works!! After the steps from the previous comment, I thought the OTA process would automatically kick in. But from the zigpy code I realized that the outcome of the process is just a "firmware update available" event trigger. It makes this available in the device page in Home Assistant:
Tapping the
Home Assistant logs:
b-parasite logs:
The b-parasite LED is also flashing, due to the
The only thing is that this is very slow. At this rate the full transfer should take... 2 hours! I remember trying an OTA upgrade for an IKEA shortcut button with zigbee2mqtt years ago and being surprised by this too, so maybe it's working as intended. I will try to measure the battery consumption during OTA, but I suspect it's not pretty. The flashing LED may be a little too much too, but let's see. Worst case, we just do a few flashes and stop, instead of flashing continuously for 2h.
I will report back once the transfer finishes. There's still room for things to go wrong after the transfer.
Two hours later, indeed it works. b-parasite logs as it finishes applying the upgrade and reboots (note the
HA also shows the device as up-to-date now, with firmware version I still want to see if changing the polling interval speeds up the OTA process. |
Something to keep in mind: in the latest nRF Connect SDK (2.6.99 as of writing) there's a known issue I tried enabling the "turbo poll" extension (which I think is enabled by default anyway), but there was no difference in upload time. I see roughly two messages per second, which makes me think we're hitting the |
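A rough back-of-the-envelope check of the numbers reported in this thread, as a Python sketch. It assumes that every observed message carries one block of image data and that the transfer took exactly two hours; both are simplifications, so the result is only an order-of-magnitude estimate.

```python
FILE_SIZE_BYTES = 375_832        # fileSize from the index.json entry
TRANSFER_SECONDS = 2 * 60 * 60   # "two hours", as reported above
MESSAGES_PER_SECOND = 2          # "roughly two messages per second"

bytes_per_second = FILE_SIZE_BYTES / TRANSFER_SECONDS
# Assumption: each message carries exactly one block of image data.
bytes_per_message = bytes_per_second / MESSAGES_PER_SECOND
total_messages = TRANSFER_SECONDS * MESSAGES_PER_SECOND

print(f"{bytes_per_second:.0f} B/s, ~{bytes_per_message:.0f} B per message, "
      f"{total_messages} messages")
# prints: 52 B/s, ~26 B per message, 14400 messages
```

Under these assumptions the message rate, rather than the size of the image, dominates the total time, which would be consistent with the turbo poll setting making no measurable difference.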
I believe I read somewhere else that long update times are very common with zigbee end devices. It is not something which looks too concerning to me. Very glad to see that you were able to get it working. I might be able to check this with my setup tonight, so we have at least a second E2E Test. 😊 |
@rbaron Have you been able to successfully flash a parasite with a firmware build from the latest commit using the The firmware builds fine but the parasite errors out before zephyr os is loaded. Flashing an older already built firmware I had lying around worked fine on the same parasite. I had a similar error a while back but cannot remember how I fixed it in the end. (I am building for a parasite v1.2.0, but this seems like a minor detail to me.)
@oleo65 that's odd. I just tried it again and it works for me at the current commit: On a nRF52840 2.0.0 board:
On a nRF52840 1.2.0 board:
I see the log formats are a bit different (they look like the I would try in this order, and see if it works:
I'm curious. Let me know how it goes. |
I have tried a ton of things now and am both more confused and a little wiser. 👼 In the end I did a clean checkout of the For building the I did not have any more time for debugging, since I am away for the week. For some more context, I build against the latest v2.5.3 release of the Nordic SDK and flash with my CMSIS-DAP probe via Python. This might not be the same as with the J-Link, but it has worked for me over the last 2 years. For logging I usually use the SEGGER RTT protocol, which should be essentially the same as with the J-Link, I believe. 🤔 Thanks for your ideas already; I believe we are getting close to the solution.
Tks @oleo65. The plot thickens... I just tried the v2.5.3 SDK & toolchain, and it still works fine for me when building and flashing from VSCode with a J-Link. I also tried: Building with docker:
And flashing with
Getting logs directly with
With the introduction of the OTA stuff, there are also significant changes in the app's memory address layout (we now have MCUBoot plus 2 app slots, as opposed to a single, bootloaderless app). I wonder if this is confusing your RTT logger. What's the full command you're using for checking out the logger? Maybe I can reproduce with that. Did you double check if your logger works with your early commit in this PR (e.g.: I'm also gonna be away from my computer for the next couple of days, but I'm looking forward to us nailing this down 👌
Well, I am a little bit smarter now and believe that the failing logging might be related to my debug probe. Using the latest commit from this branch, I am able to build both variants with and without OTA and flash them to a Interestingly the parasite works with both variants and gets paired with HA. The logging with my probe still fails for the OTA firmware variant but works for the other. Maybe I need to provide a custom memory address to the Furthermore, I was also able to get a successful OTA notification in HA, using the (sadly quite tedious) method described above to provide the firmware via a custom ZHA JSON file. I did not end up flashing, because I had not changed anything, but I am confident that it would have worked. I might be able to test this sometime this week. Then we will have another full humanoid E2E test. 😉
Tks @oleo65. What's the actual command/software/probe you're using to check the logs? I will try to use OpenOCD/pyOCD too. If you're feeling brave, you could try seeing if you can run a different OTA application (like the light switch), and checking if logging works with that. |
I am using pyocd with For a probe, I am using a nanoDAP probe from MuseLab, in SWD programming mode. |
Okay @oleo65, I managed to reproduce (the same? a different?) crash with With
With With
With the default
With It's possible that we're doing something wrong, but |
@oleo65 I managed to also connect to RTT directly with In terminal 1:
In terminal 2:
In terminal 3:
I guess this repo is famous now. 😎 I hope I will have the time this week to give it a try. Thanks for sorting this out. |
Some updates from my E2E tests. I was finally able to get the OTA notification for the boards within HA. 🥳 Over the course of about one week 5 of my devices popped up with updates available. They are all "old" v1.2.0 devices. However I was not able to successfully flash any of them. The update process starts successfully but keeps on canceling randomly after 20-40 minutes at varying percentages. Still, I consider this a big step forward. I increased the parent poll interval to 300 seconds for my parasites to try to conserve more battery power. Spoiler: does not seem to have any effect. This might add more instability to the OTA process, but is only a wild guess. 😊 I suspect that maybe my network is too weak or busy and the OTA connection not stable enough. I plan to do some more testing as soon as there is more time to spare. As a side note: Currently I don't exactly know how to provide different firmwares for the different hardware revisions at the same time to the zigbee coordinator. This might be challenging. |
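On the side note about serving different hardware revisions at the same time: the zigpy log earlier in this thread shows `min_hardware_version` and `max_hardware_version` metadata fields, both `None` for the current image. The Python sketch below only illustrates how such a range could separate images per revision. It is not zigpy's actual selection logic, the `ImageMeta` class and `pick_image` function are hypothetical, the hardware version numbers are invented, and whether ZHA honours these fields for a local index has not been confirmed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageMeta:
    """Hypothetical subset of the metadata fields seen in the zigpy log."""
    path: str
    manufacturer_id: int
    image_type: int
    file_version: int
    min_hardware_version: Optional[int] = None
    max_hardware_version: Optional[int] = None


def pick_image(images, manufacturer_id, image_type, hw_version, current_version):
    """Return the newest matching image for this hardware revision, or None."""
    def matches(img):
        if (img.manufacturer_id, img.image_type) != (manufacturer_id, image_type):
            return False
        if img.min_hardware_version is not None and hw_version < img.min_hardware_version:
            return False
        if img.max_hardware_version is not None and hw_version > img.max_hardware_version:
            return False
        # Only offer images that are newer than what the device already runs.
        return img.file_version > current_version

    candidates = [img for img in images if matches(img)]
    return max(candidates, key=lambda img: img.file_version, default=None)


# Invented hardware version numbers for the two board revisions.
images = [
    ImageMeta("v1.2.0.zigbee", 0xDB15, 0x0141, 0x01020000, 0x0120, 0x0120),
    ImageMeta("v2.0.0.zigbee", 0xDB15, 0x0141, 0x01020000, 0x0200, 0x0200),
]
chosen = pick_image(images, 0xDB15, 0x0141, hw_version=0x0200,
                    current_version=0x01010000)
print(chosen.path)  # prints: v2.0.0.zigbee
```

For this to work end to end, the device would also have to report its hardware version when it asks for the next image, which the firmware in this PR is not known to do.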
Hi - love all your work! Thank you a lot! While starting the project, I still have the "old" v1.2.0 devices. I finally switched many of my battery-powered devices to Zigbee and enabled HA to grab all the data. |
I'd say it depends 😉: on how easily your devices could be updated via cable in case you get stuck, and on how adventurous you are. From a feature perspective, the main branch and the OTA branch firmware are identical. Having said that, I have been running both v1.2.0 and v2.0.0 parasites with the OTA variant from this branch just fine for a few weeks now. A successful OTA update is still pending for me, but mostly due to lack of time for experiments and a somewhat flaky implementation on the Home Assistant / ZHA side. I would love to see someone else validate the approach, though.
This is a first attempt at activating the Zigbee OTA functionality. All changes are toggled via a Kconfig parameter.
Heavily inspired by the Zigbee light switch sample from the nRF Connect SDK.
So far this works for adding an additional OTA cluster to the device and publishing the flashed firmware version.
I was not able to get a firmware update onto the device, but this is most probably due to the quite complicated process of providing the updates to the Zigbee controller.
This addresses #128