Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ICD] Re-activate subscription when receiving check-in message if subscription timeouts and schedule a subscription #37481

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

yunhanw-google
Copy link
Contributor

@yunhanw-google yunhanw-google commented Feb 8, 2025

Assume read client fails to create subscription during case session establishement stage or subscription priming stage via SendAutoResubscribeRequest API with ScopedNodeId and goes to resubscribe with back-off algorithm because lit icd has powered off, once the device becomes active, controller receives the corresponding check-in message, client's scheduled subscription cannot be re-activated immediately. Intuitively, it is better that a resubscription can be triggered immeidately, right?

Another scenario is regarding liveness timeout for LIT, currently we block resubscription when the liveness timeout happens with CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT, then schedule a resubscription when check-in message is received, but, what if the home network stuck/disrupt for a period, check-in message cannot be sent to client during that period, this blocking behavior may exacerbate subscription recovery delays. The removal of this blocking constraint is hypothesized to mitigate such delays.

Proposed Resolution:

A unified approach is proposed to optimize subscription recovery:
-- The blocking of resubscription logic during auto-subscription timeouts shall be eliminated.
-- Upon reception of a check-in message, immediate subscription activation shall be implemented.
-- Specifically, for LIT ICD devices experiencing timeouts:

  1. The existing subscription shall be terminated with the CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT error code, thereby clearing the subscription.
  2. A resubscription process shall be scheduled.
  3. The client state shall be transitioned to InactiveICDSubscription.
    -- Upon receipt of a check-in message, the OnActiveModeNotification event shall be triggered.
  4. The client shall evaluate its current state against InactiveICDSubscription.
  5. If the client is in the InactiveICDSubscription state, the case session shall be cleared, and an immediate resubscription shall be initiated.

To accommodate dynamic transitions between SIT and LIT operating modes:
-- A dedicated attribute, mPeerOperatingMode, shall be introduced to accurately track the peer device's operating mode.
-- The existing mIsPeerLIT attribute shall not be modified during LIT to SIT transitions. This attribute shall solely indicate the device's registration with an ICD token and check-in capability.

Testing

Drafting

@yunhanw-google yunhanw-google requested a review from a team as a code owner February 8, 2025 04:39
Copy link

semanticdiff-com bot commented Feb 8, 2025

Review changes with  SemanticDiff

@yunhanw-google yunhanw-google force-pushed the feature/fix_icd_readclient branch 2 times, most recently from 5423fec to 84eeb84 Compare February 8, 2025 06:35
@yunhanw-google yunhanw-google changed the title [ICD] Close with CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT if OnResponseTimeout happens during subscription priming stage [ICD] Close with CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT if OnResponseTimeout happens after sending first subscription request Feb 8, 2025
@yunhanw-google yunhanw-google force-pushed the feature/fix_icd_readclient branch from 84eeb84 to 0a45f2b Compare February 8, 2025 07:49
@yunhanw-google yunhanw-google changed the title [ICD] Close with CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT if OnResponseTimeout happens after sending first subscription request [ICD] Activate subscription when receiving check-in message if subscription client stuck in initialized state. Feb 8, 2025
@yunhanw-google yunhanw-google force-pushed the feature/fix_icd_readclient branch from 0a45f2b to 64037cc Compare February 8, 2025 08:07
@yunhanw-google yunhanw-google changed the title [ICD] Activate subscription when receiving check-in message if subscription client stuck in initialized state. [ICD] Re-activate subscription when receiving check-in message if subscription client stuck in initialized state. Feb 8, 2025
@yunhanw-google yunhanw-google force-pushed the feature/fix_icd_readclient branch 2 times, most recently from 17bb029 to 82d1fda Compare February 8, 2025 08:29
Copy link

github-actions bot commented Feb 8, 2025

PR #37481: Size comparison from ed1babf to 82d1fda

Full report (1 build for stm32)
platform target config section ed1babf 82d1fda change % change
stm32 light STM32WB5MM-DK FLASH 483024 483040 16 0.0
RAM 144688 144688 0 0.0

// If we are in Idle state, it means previously device cannot be reached
// If we are not in the above, simply do nothing.
// If we are in the above, we should trigger immediate resubscription.
VerifyOrReturn(IsInactiveICDSubscription() || IsIdle());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is trying to handle the case where:

  1. SendAutoResubscribeRequest was sent
  2. It was the version that takes a ScopedNodeId, not an existing session.
  3. Our attempt to establish a session has not succeeded yet.

Right?

If that's accurate, it seems to me that we should have a state other than "idle" that describes this situation, and we should test for that state here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the 1/2/3 is correct description. I just think through multiple situations as below.

Assume read client fails to create subscription during case session establishement stage or subscription priming stage via SendAutoResubscribeRequest API with ScopedNodeId and goes to resubscribe with back-off algorithm because lit icd has powered off, once the device becomes active, controller receives the corresponding check-in message, client's scheduled subscription cannot be re-activated immediately. Intuitively, it is better that a resubscription can be triggered immeidately, right?

Another scenario is regarding liveness timeout for LIT, currently we block resubscription when the liveness timeout happens with CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT, then schedule a resubscription when check-in message is received, but, what if the home network stuck/disrupt for a period, check-in message cannot be sent to client during that period, this blocking behavior may exacerbate subscription recovery delays. The removal of this blocking constraint is hypothesized to mitigate such delays.

Proposed Resolution:

A unified approach is proposed to optimize subscription recovery:
-- The blocking of resubscription logic during auto-subscription timeouts shall be eliminated.
-- Upon reception of a check-in message, immediate subscription activation shall be implemented.
-- Specifically, for LIT ICD devices experiencing timeouts:

The existing subscription shall be terminated with the CHIP_ERROR_LIT_SUBSCRIBE_INACTIVE_TIMEOUT error code, thereby clearing the subscription.
A resubscription process shall be scheduled.
The client state shall be transitioned to InactiveICDSubscription.
-- Upon receipt of a check-in message, the OnActiveModeNotification event shall be triggered.
The client shall evaluate its current state against InactiveICDSubscription.
If the client is in the InactiveICDSubscription state, the case session shall be cleared, and an immediate resubscription shall be initiated.
To accommodate dynamic transitions between SIT and LIT operating modes:
-- A dedicated attribute, mPeerOperatingMode, shall be introduced to accurately track the peer device's operating mode.
-- The existing mIsPeerLIT attribute shall not be modified during LIT to SIT transitions. This attribute shall solely indicate the device's registration with an ICD token and check-in capability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point of the "blocking resubscription" behavior was that for a LIT ICD the chance of a resubscribe attempt not triggered by a checkin succeeding is 0, because the sleepy is asleep. So it's just a waste of network packets....

@yunhanw-google yunhanw-google force-pushed the feature/fix_icd_readclient branch from 82d1fda to c770997 Compare February 13, 2025 02:17
Copy link

github-actions bot commented Feb 13, 2025

PR #37481: Size comparison from 92f9f0b to c770997

Full report (10 builds for cc32xx, nrfconnect, qpg, stm32, tizen)
platform target config section 92f9f0b c770997 change % change
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 538894 538966 72 0.0
RAM 205208 205208 0 0.0
lock CC3235SF_LAUNCHXL FLASH 572766 572838 72 0.0
RAM 205360 205360 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 FLASH 907464 907424 -40 -0.0
RAM 142411 142411 0 0.0
nrf7002dk_nrf5340_cpuapp FLASH 901920 901940 20 0.0
RAM 124755 124755 0 0.0
all-clusters-minimal-app nrf52840dk_nrf52840 FLASH 846084 846084 0 0.0
RAM 141339 141339 0 0.0
qpg lighting-app qpg6105+debug FLASH 662340 662388 48 0.0
RAM 105220 105220 0 0.0
lock-app qpg6105+debug FLASH 620136 620176 40 0.0
RAM 99664 99664 0 0.0
stm32 light STM32WB5MM-DK FLASH 459736 459776 40 0.0
RAM 141568 141568 0 0.0
tizen all-clusters-app arm unknown 5104 5104 0 0.0
FLASH 1751708 1751692 -16 -0.0
RAM 93508 93508 0 0.0
chip-tool-ubsan arm unknown 11396 11396 0 0.0
FLASH 18683110 18683214 104 0.0
RAM 8181084 8181140 56 0.0

@yunhanw-google yunhanw-google changed the title [ICD] Re-activate subscription when receiving check-in message if subscription client stuck in initialized state. [ICD] Re-activate subscription when receiving check-in message if subscription timeouts and schedule a subscription Feb 13, 2025
Copy link

github-actions bot commented Feb 13, 2025

PR #37481: Size comparison from 92f9f0b to a103e4f

Full report (72 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
platform target config section 92f9f0b a103e4f change % change
bl602 lighting-app bl602+mfd+littlefs+rpc FLASH 1095960 1096004 44 0.0
RAM 94906 94906 0 0.0
bl702 lighting-app bl702+eth FLASH 652080 652124 44 0.0
RAM 33633 33633 0 0.0
bl702+wifi FLASH 828572 828616 44 0.0
RAM 22341 22341 0 0.0
bl706+mfd+rpc+littlefs FLASH 1061718 1061762 44 0.0
RAM 32285 32285 0 0.0
bl702l contact-sensor-app bl702l+mfd+littlefs FLASH 892592 892636 44 0.0
RAM 26912 26912 0 0.0
lighting-app bl702l+mfd+littlefs FLASH 975962 976006 44 0.0
RAM 24752 24752 0 0.0
cc13x4_26x4 lighting-app LP_EM_CC1354P10_6 FLASH 815204 815212 8 0.0
RAM 120352 120352 0 0.0
lock-ftd LP_EM_CC1354P10_6 FLASH 823832 823840 8 0.0
RAM 125360 125360 0 0.0
pump-app LP_EM_CC1354P10_6 FLASH 770992 771064 72 0.0
RAM 113820 113820 0 0.0
pump-controller-app LP_EM_CC1354P10_6 FLASH 755260 755332 72 0.0
RAM 114028 114028 0 0.0
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 538894 538966 72 0.0
RAM 205208 205208 0 0.0
lock CC3235SF_LAUNCHXL FLASH 572766 572838 72 0.0
RAM 205360 205360 0 0.0
cyw30739 light CYW30739B2-P5-EVK-01 unknown 2040 2040 0 0.0
FLASH 656501 656565 64 0.0
RAM 75420 75420 0 0.0
CYW30739B2-P5-EVK-02 unknown 2040 2040 0 0.0
FLASH 676361 676425 64 0.0
RAM 78060 78060 0 0.0
CYW30739B2-P5-EVK-03 unknown 2040 2040 0 0.0
FLASH 676361 676425 64 0.0
RAM 78060 78060 0 0.0
CYW930739M2EVB-02 unknown 2040 2040 0 0.0
FLASH 633285 633349 64 0.0
RAM 70488 70488 0 0.0
light-switch CYW30739B2-P5-EVK-01 unknown 2040 2040 0 0.0
FLASH 616117 616093 -24 -0.0
RAM 71532 71532 0 0.0
CYW30739B2-P5-EVK-02 unknown 2040 2040 0 0.0
FLASH 635753 635729 -24 -0.0
RAM 74076 74076 0 0.0
CYW30739B2-P5-EVK-03 unknown 2040 2040 0 0.0
FLASH 635753 635729 -24 -0.0
RAM 74076 74076 0 0.0
lock CYW30739B2-P5-EVK-01 unknown 2040 2040 0 0.0
FLASH 635621 635693 72 0.0
RAM 74540 74540 0 0.0
CYW30739B2-P5-EVK-02 unknown 2040 2040 0 0.0
FLASH 655337 655409 72 0.0
RAM 77084 77084 0 0.0
CYW30739B2-P5-EVK-03 unknown 2040 2040 0 0.0
FLASH 655337 655409 72 0.0
RAM 77084 77084 0 0.0
thermostat CYW30739B2-P5-EVK-01 unknown 2040 2040 0 0.0
FLASH 611969 612033 64 0.0
RAM 68628 68628 0 0.0
CYW30739B2-P5-EVK-02 unknown 2040 2040 0 0.0
FLASH 631829 631893 64 0.0
RAM 71268 71268 0 0.0
CYW30739B2-P5-EVK-03 unknown 2040 2040 0 0.0
FLASH 631829 631893 64 0.0
RAM 71268 71268 0 0.0
efr32 lock-app BRD4187C FLASH 937264 937304 40 0.0
RAM 159908 159908 0 0.0
BRD4338a FLASH 730592 730648 56 0.0
RAM 234720 234720 0 0.0
window-app BRD4187C FLASH 1029856 1029928 72 0.0
RAM 128012 128012 0 0.0
esp32 all-clusters-app c3devkit DRAM 97312 97312 0 0.0
FLASH 1581680 1581632 -48 -0.0
IRAM 83820 83820 0 0.0
m5stack DRAM 116100 116100 0 0.0
FLASH 1549606 1549554 -52 -0.0
IRAM 117039 117039 0 0.0
linux air-purifier-app debug unknown 4760 4760 0 0.0
FLASH 2708747 2708451 -296 -0.0
RAM 132784 132784 0 0.0
all-clusters-app debug unknown 5568 5568 0 0.0
FLASH 5975210 5975054 -156 -0.0
RAM 531600 531600 0 0.0
all-clusters-minimal-app debug unknown 5464 5464 0 0.0
FLASH 5322754 5322458 -296 -0.0
RAM 242712 242712 0 0.0
bridge-app debug unknown 5480 5480 0 0.0
FLASH 4681398 4681102 -296 -0.0
RAM 221448 221448 0 0.0
chip-tool debug unknown 6120 6120 0 0.0
FLASH 13098960 13098804 -156 -0.0
RAM 596578 596578 0 0.0
chip-tool-ipv6only arm64 unknown 21816 21816 0 0.0
FLASH 11162224 11162064 -160 -0.0
RAM 648256 648256 0 0.0
fabric-admin debug unknown 5808 5808 0 0.0
FLASH 11387977 11387821 -156 -0.0
RAM 596362 596362 0 0.0
fabric-bridge-app debug unknown 4736 4736 0 0.0
FLASH 4506720 4506424 -296 -0.0
RAM 208632 208632 0 0.0
fabric-sync debug unknown 4976 4976 0 0.0
FLASH 5612965 5612821 -144 -0.0
RAM 483504 483504 0 0.0
lighting-app debug+rpc+ui unknown 6152 6152 0 0.0
FLASH 5484513 5484513 0 0.0
RAM 225392 225392 0 0.0
lock-app debug unknown 5416 5416 0 0.0
FLASH 4730794 4730498 -296 -0.0
RAM 207696 207696 0 0.0
ota-provider-app debug unknown 4776 4776 0 0.0
FLASH 4359828 4359532 -296 -0.0
RAM 201336 201336 0 0.0
ota-requestor-app debug unknown 4728 4728 0 0.0
FLASH 4497204 4496908 -296 -0.0
RAM 205920 205920 0 0.0
shell debug unknown 4256 4256 0 0.0
FLASH 3005468 3005276 -192 -0.0
RAM 160552 160552 0 0.0
thermostat-no-ble arm64 unknown 9512 9512 0 0.0
FLASH 4096488 4096312 -176 -0.0
RAM 246024 246024 0 0.0
tv-app debug unknown 5744 5744 0 0.0
FLASH 5951893 5951733 -160 -0.0
RAM 606904 606904 0 0.0
tv-casting-app debug unknown 5320 5320 0 0.0
FLASH 11271725 11271565 -160 -0.0
RAM 710864 710864 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 FLASH 907464 907424 -40 -0.0
RAM 142411 142411 0 0.0
nrf7002dk_nrf5340_cpuapp FLASH 901920 901940 20 0.0
RAM 124755 124755 0 0.0
all-clusters-minimal-app nrf52840dk_nrf52840 FLASH 846084 846084 0 0.0
RAM 141339 141339 0 0.0
nxp contact k32w0+release FLASH 584752 584752 0 0.0
RAM 70876 70876 0 0.0
mcxw71+release FLASH 600208 600280 72 0.0
RAM 63096 63096 0 0.0
light k32w0+release FLASH 611076 611076 0 0.0
RAM 70268 70268 0 0.0
k32w1+release FLASH 685512 685576 64 0.0
RAM 48680 48680 0 0.0
lock mcxw71+release FLASH 749024 749096 72 0.0
RAM 67500 67500 0 0.0
psoc6 all-clusters cy8ckit_062s2_43012 FLASH 1646756 1646748 -8 -0.0
RAM 211576 211576 0 0.0
all-clusters-minimal cy8ckit_062s2_43012 FLASH 1553524 1553596 72 0.0
RAM 208392 208392 0 0.0
light cy8ckit_062s2_43012 FLASH 1439196 1439268 72 0.0
RAM 197144 197144 0 0.0
lock cy8ckit_062s2_43012 FLASH 1467412 1467484 72 0.0
RAM 224704 224704 0 0.0
qpg lighting-app qpg6105+debug FLASH 662340 662388 48 0.0
RAM 105220 105220 0 0.0
lock-app qpg6105+debug FLASH 620136 620176 40 0.0
RAM 99664 99664 0 0.0
stm32 light STM32WB5MM-DK FLASH 459736 459776 40 0.0
RAM 141568 141568 0 0.0
telink bridge-app tl7218x FLASH 665226 665232 6 0.0
RAM 90828 90828 0 0.0
contact-sensor-app tlsr9528a_retention FLASH 621908 621914 6 0.0
RAM 31488 31488 0 0.0
light-app-ota-shell-factory-data tl3218x FLASH 745418 745420 2 0.0
RAM 40496 40496 0 0.0
tl7218x FLASH 753974 753980 6 0.0
RAM 97632 97632 0 0.0
light-switch-app-ota-compress-lzma-factory-data tl7218x_retention FLASH 680678 680688 10 0.0
RAM 52192 52192 0 0.0
light-switch-app-ota-compress-lzma-shell-factory-data tlsr9528a FLASH 709240 709250 10 0.0
RAM 73400 73400 0 0.0
lighting-app-ota-factory-data tlsr9118bdk40d FLASH 600796 600802 6 0.0
RAM 138912 138912 0 0.0
lighting-app-ota-rpc-factory-data-4mb tlsr9518adk80d FLASH 788928 788934 6 0.0
RAM 96488 96488 0 0.0
tizen all-clusters-app arm unknown 5104 5104 0 0.0
FLASH 1751708 1751692 -16 -0.0
RAM 93508 93508 0 0.0
chip-tool-ubsan arm unknown 11396 11396 0 0.0
FLASH 18683110 18683214 104 0.0
RAM 8181084 8181140 56 0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants