modules: memfault: Add automatic sending of coredumps over LTE #20040

Open
simensrostad wants to merge 1 commit into main from memfault-slm-app

@simensrostad simensrostad commented Jan 22, 2025

Add automatic sending of coredumps over LTE:

  • Add new file memfault_lte_coredump.c that uses LTE cereg and PDN to determine if the device is connected to the network.

    If connected, the layer will trigger sending of a stored coredump. Library is implemented with retry logic and backoff.

  • Add overlay file to SLM that enables Memfault features

  • Update Memfault sample to use the new coredump feature.

  • Add missing features to Memfault sample

  • Enable assertions by default in SLM; there is no reason to disable them as long as there is space.

Implemented according to state diagram found here:
https://miro.com/app/board/uXjVLrI-A64=/
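
For readers without access to the state diagram, the connectivity gate and retry-with-backoff behavior described above can be sketched roughly as follows. This is a minimal illustration only, not the code in memfault_lte_coredump.c: the struct, the function names, the doubling backoff, and the attempt limit are all hypothetical, and no nRF Connect SDK or Memfault API is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for a coredump uploader; all names are illustrative. */
struct coredump_uploader {
	bool registered;       /* network registration reported (for example via CEREG) */
	bool pdn_active;       /* default PDN context is active */
	uint32_t attempt;      /* failed upload attempts so far */
	uint32_t max_attempts; /* total attempts allowed before giving up */
	uint32_t base_delay_s; /* delay before the first retry */
	uint32_t max_delay_s;  /* upper bound for any retry delay */
};

/* A send is only triggered when both conditions hold. */
static bool uploader_is_connected(const struct coredump_uploader *u)
{
	assert(u != NULL);
	return u->registered && u->pdn_active;
}

/* Assumed backoff policy: base * 2^attempt, capped at max_delay_s. */
static uint32_t uploader_next_delay_s(const struct coredump_uploader *u)
{
	assert(u != NULL);
	uint32_t delay = u->base_delay_s;

	for (uint32_t i = 0; i < u->attempt && delay < u->max_delay_s; i++) {
		/* Saturate instead of overflowing when doubling. */
		delay = (delay > u->max_delay_s / 2) ? u->max_delay_s : delay * 2;
	}
	return delay < u->max_delay_s ? delay : u->max_delay_s;
}

/* Call after a failed upload. Returns true and sets *delay_s when another
 * attempt should be scheduled, false when the retry budget is exhausted.
 */
static bool uploader_on_failure(struct coredump_uploader *u, uint32_t *delay_s)
{
	assert(u != NULL && delay_s != NULL);
	if (u->attempt + 1 >= u->max_attempts) {
		return false;
	}
	*delay_s = uploader_next_delay_s(u);
	u->attempt++;
	return true;
}
```

In a real integration the returned delay would presumably feed a delayable work item, and the two flags would be updated from LTE and PDN event handlers; both of those parts are omitted here.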

@simensrostad simensrostad requested review from a team as code owners January 22, 2025 14:47
@github-actions github-actions bot added doc-required PR must not be merged without tech writer approval. changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. labels Jan 22, 2025
@NordicBuilder NordicBuilder commented Jan 22, 2025

CI Information

To view the history of this post, click the 'edited' button above.
Build number: 17

Inputs:

Sources:

sdk-nrf:

PR head: 3c1590774010143b5267b85f6de4227576c7bf44
merge base: 261cfb7ea05bd77c1c5c7c9446217f62699f272c
target head (main): 261cfb7ea05bd77c1c5c7c9446217f62699f272c

GitHub labels

  • ci-disabled: Disable the CI execution
  • ci-all-test: Run all of CI, no test spec filtering will be done
  • ci-force-downstream: Force execution of downstream even if twister fails
  • ci-run-twister: Force run twister
  • ci-run-zephyr-twister: Force run Zephyr twister
List of changed files detected by CI (18)
applications
│  ├── serial_lte_modem
│  │  ├── doc
│  │  │  │ slm_description.rst
│  │  ├── overlay-memfault.conf
│  │  ├── prj.conf
│  │  ├── sample.yaml
│  │  ├── src
│  │  │  ├── lwm2m_carrier
│  │  │  │  │ slm_at_carrier.c
│  │  │  │ slm_at_commands.c
doc
│  ├── nrf
│  │  ├── libraries
│  │  │  ├── debug
│  │  │  │  │ memfault_ncs.rst
│  │  ├── releases_and_maturity
│  │  │  ├── releases
│  │  │  │  │ release-notes-changelog.rst
modules
│  ├── memfault-firmware-sdk
│  │  ├── CMakeLists.txt
│  │  ├── Kconfig
│  │  │ memfault_lte_coredump.c
samples
│  ├── debug
│  │  ├── memfault
│  │  │  ├── boards
│  │  │  │  ├── nrf9151dk_nrf9151_ns.conf
│  │  │  │  ├── nrf9160dk_nrf9160_ns.conf
│  │  │  │  ├── nrf9161dk_nrf9161_ns.conf
│  │  │  │  ├── thingy91_nrf9160_ns.conf
│  │  │  │  │ thingy91x_nrf9151_ns.conf
│  │  │  ├── prj.conf
│  │  │  ├── src
│  │  │  │  │ main.c

Outputs:

Toolchain

Version: 342151af73
Build docker image: docker-dtr.nordicsemi.no/sw-production/ncs-build:342151af73_912848a074

Test Spec & Results: ✅ Success; ❌ Failure; 🟠 Queued; 🟡 Progress; ◻️ Skipped; ⚠️ Quarantine

  • ◻️ Toolchain - Skipped: existing toolchain is used
  • ❌ Build twister
    • sdk-nrf test count: 69
  • ❌ Integration tests
    • ❌ test-fw-nrfconnect-nrf-iot_serial_lte_modem
Disabled integration tests
    • desktop52_verification
    • doc-internal
    • test_ble_nrf_config
    • test-fw-nrfconnect-apps
    • test-fw-nrfconnect-ble_mesh
    • test-fw-nrfconnect-ble_samples
    • test-fw-nrfconnect-boot
    • test-fw-nrfconnect-chip
    • test-fw-nrfconnect-fem
    • test-fw-nrfconnect-nfc
    • test-fw-nrfconnect-nrf-iot_lwm2m
    • test-fw-nrfconnect-nrf-iot_mosh
    • test-fw-nrfconnect-nrf-iot_positioning
    • test-fw-nrfconnect-nrf-iot_samples
    • test-fw-nrfconnect-nrf-iot_thingy91
    • test-fw-nrfconnect-nrf-iot_zephyr_lwm2m
    • test-fw-nrfconnect-nrf_crypto
    • test-fw-nrfconnect-ps
    • test-fw-nrfconnect-rpc
    • test-fw-nrfconnect-rs
    • test-fw-nrfconnect-tfm
    • test-fw-nrfconnect-thread
    • test-fw-nrfconnect-zigbee
    • test-low-level
    • test-sdk-audio
    • test-sdk-dfu
    • test-sdk-find-my
    • test-sdk-mcuboot
    • test-sdk-pmic-samples
    • test-sdk-sidewalk
    • test-sdk-wifi
    • test-secdom-samples-public

Note: This message is automatically posted and updated by the CI

@NordicBuilder

You can find the documentation preview for this PR at this link.

Note: This comment is automatically posted by the Documentation Publish GitHub Action.

@peknis peknis commented Jan 23, 2025

Worth a changelog entry?

@MarkusLassila MarkusLassila left a comment

https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/libraries/debug/memfault_ncs.html states:

When building applications with Trusted Firmware-M (TF-M), the faults resulting from memory access in secure regions are not caught by Memfault’s fault handler. Instead, they are handled by TF-M. This means that those faults are not reported to the Memfault platform.

Has this changed? If yes, then the document should be updated. If not, then the use-cases for this would be quite limited.

@jtguggedal

I think it's still true that faults originating in secure domain will not be forwarded to Memfault. CONFIG_TFM_ALLOW_NON_SECURE_FAULT_HANDLING allows handling of non-secure faults only, as far as I understand, but you know a lot more about this topic than me, Markus 😬

@MarkusLassila MarkusLassila commented Jan 23, 2025

Ah, you are correct. It mentions the secure regions. I was looking at this through the lens of having attempted to collect a crash dump with Memfault under TF-M (for non-secure), in which I was ultimately not successful, but I take your word that it should work for NS.

Edit: I got the dump as text, but was not able to get it working.

@simensrostad simensrostad force-pushed the memfault-slm-app branch 2 times, most recently from 0c758c4 to 1b839b2 Compare January 23, 2025 11:24
@simensrostad simensrostad requested a review from a team as a code owner January 23, 2025 11:24
@github-actions github-actions bot removed the changelog-entry-required Update changelog before merge. Remove label if entry is not needed or already added. label Jan 23, 2025
@simensrostad

@MarkusLassila I've added more information about the new automatic coredump send feature. However, I have not included details on how Memfault stores and handles coredumps, as that falls outside the scope of this PR.

I've updated the changelog for SLM and the Memfault integration.
@peknis Could you please have a look?

@MarkusLassila MarkusLassila left a comment

Approving SLM parts

@nordicjm nordicjm left a comment

Commits need squashing

@simensrostad simensrostad force-pushed the memfault-slm-app branch 2 times, most recently from 52ab5ff to 21783e0 Compare January 27, 2025 14:51
@simensrostad simensrostad force-pushed the memfault-slm-app branch 2 times, most recently from fb524eb to f1596fe Compare January 30, 2025 13:22
Add automatic sending of coredumps over LTE:
 - Add new file memfault_lte_coredump.c that uses LTE cereg and PDN
   to determine if the device is connected to the network.

   If connected, the layer will trigger sending of a stored coredump.
   Library is implemented with retry logic and backoff.

 - Add overlay file to SLM that enables Memfault features
 - Update Memfault sample to use the new coredump feature.
 - Add missing features to Memfault sample
 - Enable assertions by default in SLM; there is no reason to disable
    them as long as there is space.

Signed-off-by: Simen S. Røstad <[email protected]>
@simensrostad simensrostad requested a review from a team as a code owner January 31, 2025 11:55
Delay coredump posting to Memfault to avoid conflicts with the application's
TLS connection setup, which typically occurs when the LTE network is obtained.
The delay duration is defined by CONFIG_MEMFAULT_NCS_POST_COREDUMP_RETRY_INTERVAL_SECONDS.

?

Labels
doc-required PR must not be merged without tech writer approval.
10 participants