Add OTA reflashing support on TX2 #113

Merged


atharvanan1
Contributor

@atharvanan1 atharvanan1 commented Jun 1, 2021

Hey,

I tried to integrate the OTA setup highlighted here - https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process

Setup Information:
Pre-update Software built from - https://github.com/madisongh/test-distro/tree/warrior
sysinstall-upgrader-initramfs with modified tegra-sysinstall-tools as per https://github.com/BoulderAI/tegra-sysinstall/tree/add-UDA-support
Update Software built from - this PR


sysinstall-partition-layout pipes out partition_table file as

1,APP,25376768,,,,
2,mts-bootpack,8192,,,,
3,mts-bootpack_b,8192,,,,
4,cpu-bootloader,1024,,,,
5,cpu-bootloader_b,1024,,,,
6,bootloader-dtb,1024,,,,
7,bootloader-dtb_b,1024,,,,
8,secure-os,6144,,,,
9,secure-os_b,6144,,,,
10,eks,4096,,,,
11,adsp-fw,8192,,,,
12,adsp-fw_b,8192,,,,
13,bpmp-fw,1208,,,,
14,bpmp-fw_b,1208,,,,
15,bpmp-fw-dtb,2048,,,,
16,bpmp-fw-dtb_b,2048,,,,
17,sce-fw,4096,,,,
18,sce-fw_b,4096,,,,
19,sc7,12288,,,,
20,sc7_b,12288,,,,
21,fusebypass,4096,,,,
22,BMP,262144,,,,
23,BMP_b,262144,,,,
24,recovery,129024,,,,
25,recovery-dtb,1024,,,,
26,kernel-bootctrl,512,,,,
27,kernel-bootctrl_b,512,,,,
28,kernel,163840,,,,
29,kernel_b,163840,,,,
30,kernel-dtb,1024,,,,
31,kernel-dtb_b,1024,,,,
32,RECROOTFS,614400,,,,
33,APP_b,25376768,,,,
34,UDA,REMAIN,,,,

Below is the log from after partitioning gpt disk

Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): BE2DDE25-23BF-49A3-BA2F-6807D1016D24
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 61071326
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
1              34        25376801   12.1 GiB    0700  APP
2        25376802        25384993   4.0 MiB     0700  mts-bootpack
3        25384994        25393185   4.0 MiB     0700  mts-bootpack_b
4        25393186        25394209   512.0 KiB   0700  cpu-bootloader
5        25394210        25395233   512.0 KiB   0700  cpu-bootloader_b
6        25395234        25396257   512.0 KiB   0700  bootloader-dtb
7        25396258        25397281   512.0 KiB   0700  bootloader-dtb_b
8        25397282        25403425   3.0 MiB     0700  secure-os
9        25403426        25409569   3.0 MiB     0700  secure-os_b
10        25409570        25413665   2.0 MiB     0700  eks
11        25413666        25421857   4.0 MiB     0700  adsp-fw
12        25421858        25430049   4.0 MiB     0700  adsp-fw_b
13        25430050        25431257   604.0 KiB   0700  bpmp-fw
14        25431258        25432465   604.0 KiB   0700  bpmp-fw_b
15        25432466        25434513   1024.0 KiB  0700  bpmp-fw-dtb
16        25434514        25436561   1024.0 KiB  0700  bpmp-fw-dtb_b
17        25436562        25440657   2.0 MiB     0700  sce-fw
18        25440658        25444753   2.0 MiB     0700  sce-fw_b
19        25444754        25457041   6.0 MiB     0700  sc7
20        25457042        25469329   6.0 MiB     0700  sc7_b
21        25469330        25473425   2.0 MiB     0700  fusebypass
22        25473426        25735569   128.0 MiB   0700  BMP
23        25735570        25997713   128.0 MiB   0700  BMP_b
24        25997714        26126737   63.0 MiB    0700  recovery
25        26126738        26127761   512.0 KiB   0700  recovery-dtb
26        26127762        26128273   256.0 KiB   0700  kernel-bootctrl
27        26128274        26128785   256.0 KiB   0700  kernel-bootctrl_b
28        26128786        26292625   80.0 MiB    0700  kernel
29        26292626        26456465   80.0 MiB    0700  kernel_b
30        26456466        26457489   512.0 KiB   0700  kernel-dtb
31        26457490        26458513   512.0 KiB   0700  kernel-dtb_b
32        26458514        27072913   300.0 MiB   0700  RECROOTFS
33        27072914        52449681   12.1 GiB    0700  APP_b
34        52449682        61071326   4.1 GiB     0700  UDA
[  244.096703]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 p26 p27 p28 p29 p30 p31 p32 p33 p34
[  244.172446]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 p26 p27 p28 p29 p30 p31 p32 p33 p34
NOTE: secure boot not enabled, skipping partition encryption
UDA (/dev/mmcblk0p34): formatting (-i 4096)...[OK]
[  249.299200] EXT4-fs (mmcblk0p34): mounted filesystem with ordered data mode. Opts: (null)
Secure boot not enabled; skipping machine ID setting
Installing rootfs to /dev/mmcblk0p1...
[  253.464004] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[  267.262992] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
APP: clean, 3835/3172288 files, 320570/3172096 blocks
Installing rootfs to /dev/mmcblk0p33...
[  271.268307] EXT4-fs (mmcblk0p33): mounted filesystem with ordered data mode. Opts: (null)
[  283.591877] EXT4-fs (mmcblk0p33): mounted filesystem with ordered data mode. Opts: (null)
APP_b: clean, 3835/3172288 files, 320569/3172096 blocks
[  283.718564] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
Updating bootloader...
Clearing boot device /dev/mmcblk0boot0... /dev/mmcblk0boot1... [OK]
Processing bpmp-fw... [OK]
Processing bpmp-fw_b... [OK]
Processing sce-fw... [OK]
Processing sce-fw_b... [OK]
Processing cpu-bootloader... [OK]
Processing cpu-bootloader_b... [OK]
Processing spe-fw... [OK]
Processing spe-fw_b... [OK]
Processing secure-os... [OK]
Processing secure-os_b... [OK]
Processing adsp-fw... [OK]
Processing adsp-fw_b... [OK]
Processing badpage-fw... [OK]
Processing badpage-fw_b... [OK]
Processing secure-os... [OK]
Processing secure-os_b... [OK]
Processing BMP... [OK]
Processing BMP_b... [OK]
Processing mts-bootpack... [OK]
Processing mts-bootpack_b... [OK]
Processing mts-preboot... [OK]
Processing mts-preboot_b... [OK]
Processing sc7... [OK]
Processing sc7_b... [OK]
Processing bpmp-fw-dtb... [OK]
Processing bpmp-fw-dtb_b... [OK]
Processing bootloader-dtb... [OK]
Processing bootloader-dtb_b... [OK]
Processing VER... [OK]
Processing VER_b... [OK]
Processing MB1_BCT... [OK]
Processing MB1_BCT_b... [OK]
Processing kernel... [OK]
Processing kernel_b... [OK]
Processing kernel-dtb... [OK]
Processing kernel-dtb_b... [OK]
Processing mb2... [OK]
Processing mb2_b... [OK]
Processing BCT... [offset=3584]...[offset=16384]...[offset=0]...[OK]
Processing mb1... [OK]
Processing mb1_b... [OK]
Processing eks... [OK]
Processing dram-ecc-fw... [OK]
Slot 0 marked as active for next boot
PASS: successful installation of demo-image-base_3.1-9eadaec-atharva

Upon reboot the bootloader fails with this:

Aux Info = 0x0, Reason = 0xd
[0000.162] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.162] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.151] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.154] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.151] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.140] C> ERROR: Highest Layer Module = 0x40, Lowest Layer Module = 0x40,
Aux Info = 0x0, Reason = 0xd
[0000.062] C> Failed to update boot chain
[0000.066] C> ERROR: Highest Layer Module = 0x54, Lowest Layer Module = 0x54,
Aux Info = 0x1, Reason = 0x2

@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch from 9eadaec to 8720d27 Compare June 1, 2021 21:56
Member

@ichergui ichergui left a comment


Hey @atharvanan1
Thanks for sharing this PR. I guess that you tested this only with the Jetson TX2.
Do you need support/help to validate this with other Jetson platforms?

I'm seeing some errors in the last traces. Do you need help to fix these errors?

@madisongh
Member

@atharvanan1 Looks like this is due to a difference in the NVIDIA flashing tools between L4T R32.4.3 and R32.5.x. If I directly flash your R32.4.3 image onto my TX2, this is what I see for the eMMC partitioning:

Number  Start (sector)    End (sector)  Size       Code  Name
   1              40        25376807   12.1 GiB    0700  APP
   2        25376808        25384999   4.0 MiB     0700  mts-bootpack
   3        25385000        25393191   4.0 MiB     0700  mts-bootpack_b
   4        25393192        25394215   512.0 KiB   0700  cpu-bootloader
   5        25394216        25395239   512.0 KiB   0700  cpu-bootloader_b
   6        25395240        25396263   512.0 KiB   0700  bootloader-dtb
   7        25396264        25397287   512.0 KiB   0700  bootloader-dtb_b
   8        25397288        25403431   3.0 MiB     0700  secure-os
   9        25403432        25409575   3.0 MiB     0700  secure-os_b
  10        25409576        25413671   2.0 MiB     0700  eks
  11        25413672        25421863   4.0 MiB     0700  adsp-fw
  12        25421864        25430055   4.0 MiB     0700  adsp-fw_b
  13        25430056        25431263   604.0 KiB   0700  bpmp-fw
  14        25431264        25432471   604.0 KiB   0700  bpmp-fw_b
  15        25432472        25434519   1024.0 KiB  0700  bpmp-fw-dtb
  16        25434520        25436567   1024.0 KiB  0700  bpmp-fw-dtb_b
  17        25436568        25440663   2.0 MiB     0700  sce-fw
  18        25440664        25444759   2.0 MiB     0700  sce-fw_b
  19        25444760        25457047   6.0 MiB     0700  sc7
  20        25457048        25469335   6.0 MiB     0700  sc7_b
  21        25469336        25473431   2.0 MiB     0700  fusebypass
  22        25473432        25735575   128.0 MiB   0700  BMP
  23        25735576        25997719   128.0 MiB   0700  BMP_b
  24        25997720        26126743   63.0 MiB    0700  recovery
  25        26126744        26127767   512.0 KiB   0700  recovery-dtb
  26        26127768        26128279   256.0 KiB   0700  kernel-bootctrl
  27        26128280        26128791   256.0 KiB   0700  kernel-bootctrl_b
  28        26128792        26292631   80.0 MiB    0700  kernel
  29        26292632        26456471   80.0 MiB    0700  kernel_b
  30        26456472        26457495   512.0 KiB   0700  kernel-dtb
  31        26457496        26458519   512.0 KiB   0700  kernel-dtb_b
  32        26458520        27072919   300.0 MiB   0700  RECROOTFS
  33        27072920        52449687   12.1 GiB    0700  APP_b
  34        52449688        61071326   4.1 GiB     0700  UDA

Note that the starting sector for APP is 40 rather than 34. It looks like the NVIDIA flashing tools are forcing a sector alignment of 8 sectors (4KiB) for partitions in R32.4.3, which isn't there in R32.5.x.

I'll update tegra-sysinstall-tools to add the UDA partition and to set the partition alignment default to 8.
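
The sector arithmetic behind the two listings can be checked with a short sketch. It assumes, purely for illustration, that partitions are packed back to back from the first usable sector (34) and that each start is rounded up to the chosen alignment. `align_up` and `layout` are hypothetical helpers, not part of tegra-sysinstall-tools.

```python
def align_up(sector, alignment):
    """Round a sector number up to the next multiple of alignment."""
    return (sector + alignment - 1) // alignment * alignment


def layout(partitions, first_usable, alignment):
    """Pack (name, size_in_sectors) pairs back to back, aligning each start."""
    result = []
    start = first_usable
    for name, size in partitions:
        start = align_up(start, alignment)
        end = start + size - 1  # end sector is inclusive
        result.append((name, start, end))
        start = end + 1
    return result


# First three entries of the partition_table quoted earlier (sizes in sectors).
PARTS = [("APP", 25376768), ("mts-bootpack", 8192), ("mts-bootpack_b", 8192)]

two_sector = layout(PARTS, first_usable=34, alignment=2)
eight_sector = layout(PARTS, first_usable=34, alignment=8)

for row in two_sector + eight_sector:
    print(row)
```

With a 2-sector alignment the computed rows match the listing from the failed OTA run (APP starts at 34); with an 8-sector alignment they match the directly flashed listing (APP starts at 40). Every later partition is shifted by the same six sectors.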

@madisongh
Member

@atharvanan1 I've updated tegra-sysinstall-tools, and have updated my test distros with a recipe for the new version (1.6.0).

@atharvanan1
Copy link
Contributor Author

@madisongh Great! Thanks for your input! I'll try it out on my end, and report back with my findings.

@ichergui Thanks for the offer. I created this PR so that it's easier for Matt to review my testing with OTA flashing update.

@madisongh
Member

I was wrong about the root cause being a difference in the L4T tools.... The actual root cause is due to the <align_boundary> element for the APP partition in the XML file, with that partition appearing first in your flash layout but not in the ones I was using (where I reordered the physical layout to move the APP partition to after all the boot-related partitions).

But since APP-first is more typically found, I'll leave things as they are for now, then make sure that when I generate the partition_table file I correctly account for the <align_boundary> setting.

@atharvanan1
Contributor Author

Hey @atharvanan1
Thanks for sharing this PR. I guess that you tested this only with the Jetson TX2.
Do you need support/help to validate this with other Jetson platforms?

I'm seeing some errors in the last traces. Do you need help to fix these errors?

@ichergui Let's keep this PR for TX2 device. With Matt's setup, I'm testing on TX2 c-boot setup, and will be porting to TX2 uboot setup from L4T-28.2. I'll update the status of my testing and add some additional information on what I've tested so far on https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process

@ichergui
Member

ichergui commented Jun 2, 2021

Hey @atharvanan1
Thanks for sharing this PR. I guess that you tested this only with the Jetson TX2.
Do you need support/help to validate this with other Jetson platforms?
I'm seeing some errors in the last traces. Do you need help to fix these errors?

@ichergui Let's keep this PR for TX2 device. With Matt's setup, I'm testing on TX2 c-boot setup, and will be porting to TX2 uboot setup from L4T-28.2. I'll update the status of my testing and add some additional information on what I've tested so far on https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process

Perfect.

Thanks @atharvanan1

Let me know if you need help

@atharvanan1
Contributor Author

@madisongh Thanks for that fix! The update process works with tegra-demo-distro:dunfell-l4t-r32.4.3 branch. I'll update the commit to add any changes that are needed and document it well.

I was wondering if we could remove the dependency on PPTSIZE being 16896. Could we use the partition_table file to get the PPTSIZE that was used by the build, and flash according to that? Are there any other dependencies we would want to watch out for?

@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch from 8720d27 to 85ac2da Compare June 2, 2021 17:58
@madisongh
Member

I was wondering if we could remove the dependency on PPTSIZE being 16896. Could we use the partition_table file to get the PPTSIZE that was used by the build, and flash according to that? Are there any other dependencies we would want to watch out for?

That should be doable, as long as the actual size is 16896 or greater (if it's less, normal Linux GPT tools won't work). There's already a field in the partition_table file for specifying the starting sector number, and that could be calculated from the info in the XML file. I'll see if I can work that in along with handling the alignment tags.
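
For context, 16896 bytes is the space a standard GPT with 128 entries of 128 bytes occupies at the end of a disk with 512-byte sectors (32 sectors of entries plus a one-sector header), which may be why smaller values break ordinary Linux GPT tools. The sketch below only restates that arithmetic. Reading PPTSIZE as the size of the backup GPT is an assumption, not something stated in this thread.

```python
SECTOR_SIZE = 512     # bytes, as reported in the gdisk output above
GPT_ENTRIES = 128     # "Partition table holds up to 128 entries"
GPT_ENTRY_SIZE = 128  # bytes per entry in a standard GPT

# Sectors needed for the partition entry array.
entry_sectors = GPT_ENTRIES * GPT_ENTRY_SIZE // SECTOR_SIZE

# Primary GPT: protective MBR + header + entry array.
first_usable_sector = 1 + 1 + entry_sectors

# Backup GPT at the end of the disk: entry array + header.
backup_gpt_bytes = (entry_sectors + 1) * SECTOR_SIZE

print(first_usable_sector, backup_gpt_bytes)
```

The first value agrees with the "First usable sector is 34" line in the partitioning log.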

@atharvanan1
Contributor Author

as long as the actual size is 16896 or greater

Sounds good! I was thinking about PPTSIZE set up by image_types_tegra.bbclass when I said that. Let me know once the changes are done, I'll help test that on my end.

@atharvanan1 atharvanan1 changed the title Add OTA reflashing support Add OTA reflashing support on TX2 Jun 2, 2021
@madisongh
Member

Let me know once the changes are done, I'll help test that on my end.

OK, the update to nvflashxmlparse is in meta-tegra in all current branches, and I've updated tegra-sysinstall-tools to v1.6.1 to revert the default alignment change which is no longer required. You should be able to drop your image_types_tegra.bbclass replacement.

@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch from 85ac2da to faec342 Compare June 5, 2021 23:39
@atharvanan1
Contributor Author

atharvanan1 commented Jun 5, 2021

Tested OTA update:
custom-sumo-l4t-r28.2 to dunfell-l4t-r32.4.3

Issues:
/data partition set up by previous distro gets mounted while executing tegra-sysinstall. This creates a weird setup where you have a partition mounted, while we are trying to repartition the mmcblk0 device.

EDIT: Nevermind, this was the data.mount service which was mounting the /data partition.

Update process worked perfectly fine!

Additional note: You can also use a USB stick to deliver the files, provided it has an ext4 filesystem.

@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch 3 times, most recently from 91477de to c8d33cb Compare June 9, 2021 17:03
@atharvanan1
Contributor Author

I'll be running mender-torture-tests on jetson-tx2, jetson-xavier-nx-emmc, jetson-nano-emmc devices. I'll keep you posted with the results.

@atharvanan1
Contributor Author

atharvanan1 commented Jun 11, 2021

Errors while running on jetson-tx2:

RuntimeError: machine ID changed during mender update

While running the script, we get

INFO[0087] Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Leave_80_bl-update
lockfile directory: Read-only file system
Could not locate device info
---------- end of script output 

Where does it come from? Here

This is caused by mounting the new rootfs read-only. I think I've seen a comment that we need to do this to support delta updates. Since we are trying to update the machine_id here, we probably need to remount the new rootfs read-write and change the variable, like so.


Errors while running on jetson-tx2-uboot:

RuntimeError: machine ID changed during mender update

Errors while running on jetson-xavier-nx-devkit-emmc:

RuntimeError: machine ID changed during mender update

Errors while running on jetson-nano-emmc:

time="2021-06-11T17:30:19Z" level=info msg="Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Leave_80_bl-update\nERR: cannot perform bootloader update\n\n---------- end of script output"
time="2021-06-11T17:30:19Z" level=error msg="ArtifactInstall_Leave script failed: statescript: error executing 'ArtifactInstall_Leave_80_bl-update': 1 : exit status 1"
Rolling back Artifact...
time="2021-06-11T17:30:19Z" level=info msg="Rolling back to the active partition: (1)."
time="2021-06-11T17:30:19Z" level=error msg="statescript: error executing 'ArtifactInstall_Leave_80_bl-update': 1 : exit status 1"

Will try to see what's causing this, and report back

Concluded that the errors were due to me building on a branch that didn't have the latest commit as mentioned here - #113 (comment)

@atharvanan1
Contributor Author

From my testing,

  1. Jetson TX2 (cboot) is successfully updated several times
  2. Jetson Xavier NX is successfully updated several times

Apparently, jetson-nano-emmc device fails to update the bootloader. Probably, this applies.

@ichergui
Member

From my testing,

  1. Jetson TX2 (cboot) is successfully updated several times
  2. Jetson Xavier NX is successfully updated several times

Apparently, jetson-nano-emmc device fails to update the bootloader. Probably, this applies.

Could you please share the logs ?

@atharvanan1
Contributor Author

atharvanan1 commented Jun 14, 2021

Here's the log for
TX2 mender-torture which is currently at 90th iteration or so -
jetson-tx2-mender-torture-90.log

Xavier-nx mender-torture where I tested with 20 mender updates and then a reboot torture -
xavier-mender-torture.log
xavier-nx-reboot-torture.log

I don't know whether this is sufficient, but I feel it is good enough to call successful. Let me know if you want me to test more.

Errors with nano:
jetson-nano-mender-torture.log

When I ran the jetson-nano-emmc's ArtifactInstall_Leave_80_bl_update script and did a bash -x, I found that it fails with "grep -r version partitions are corrupted" (currently away from the device, so can't gather more)

EDIT: Here's the bash -x on the artifact install script.
bash-x_on_artifactinstallscript.txt

UPDATE:

Jetson TX2 u-boot logs -
jetson-tx2-mender-torture.log

@atharvanan1
Contributor Author

@ichergui @madisongh I have a couple of questions and might need help on this.

It seems that how we handle VER partitions for jetson-nano-emmc has changed between gatesgarth (and what tegra-bootloader-update expects) and dunfell. How do we address this? Should we not include the change from nv_update_engine to the tegra-bootloader-update tool for the jetson-nano-emmc device?

@atharvanan1
Contributor Author

Based on further investigations,

I found that the errors with jetson-nano-emmc are being caused by tegra-bootloader-update thinking something went wrong with VER and VER_b partitions - https://github.com/OE4T/tegra-boot-tools/blob/5a3e698f4f8158a14979df067f29cc26bbd47f75/tegra-bootloader-update.c#L930

Here's a hexdump of the VER and VER_b partitions from the /dev/mmcblk0boot1 device on an image built from dunfell-l4t-r32.4.3 tegrademo-mender (the VER partitions are written with zeros!):

03bb000 6b46 d969 6102 746c 6f62 746f 6d63 3d64
03bb010 7572 206e 656d 646e 7265 615f 746c 6f62
03bb020 746f 6d63 3b64 7220 6e75 6220 6f6f 6374
03bb030 646d 6100 6372 3d68 7261 006d 6162 6475
*
03bcb60 6564 5f76 6f66 5f72 6f62 746f 705f 7261
03bcb70 3b74 6620 0069 6576 646e 726f 6e3d 6976
03bcb80 6964 0061 0000 0000 0000 0000 0000 0000
03bcb90 0000 0000 0000 0000 0000 0000 0000 0000
*
03db000 9153 ce0c 6101 746c 6f62 746f 6d63 3d64
03db010 7572 206e 656d 646e 7265 615f 746c 6f62
03db020 746f 6d63 3b64 7220 6e75 6220 6f6f 6374
03db030 646d 6100 6372 3d68 7261 006d 6162 647
*
03dcb20 6373 6e61 645f 7665 665f 726f 625f 6f6f
03dcb30 5f74 6170 7472 203b 6966 7600 6e65 6f64
03dcb40 3d72 766e 6469 6169 0000 0000 0000 0000
03dcb50 0000 0000 0000 0000 0000 0000 0000 0000
*
0400000

Here's a hexdump from VER and VER_b partitions from /dev/mmcblk0boot1 device on an image built off dunfell-l4t-r32.4.3 branch: (VER partitions are intact)

*
03e0000 564e 0a33 2023 3352 2032 202c 4552 4956
03e0010 4953 4e4f 203a 2e34 0a33 4f42 5241 4944
03e0020 3d44 3433 3834 4220 414f 4452 4b53 3d55
03e0030 3030 3230 4620 4241 343d 3030 320a 3230
03e0040 3031 3136 3137 3030 3539 0a31 5942 4554
03e0050 3a53 3637 4320 4352 3233 323a 3430 3933
03e0060 3531 3536 0a35 0000 0000 0000 0000 0000
03e0070 0000 0000 0000 0000 0000 0000 0000 0000
*
03f0000 564e 0a33 2023 3352 2032 202c 4552 4956
03f0010 4953 4e4f 203a 2e34 0a33 4f42 5241 4944
03f0020 3d44 3433 3834 4220 414f 4452 4b53 3d55
03f0030 3030 3230 4620 4241 343d 3030 320a 3230
03f0040 3031 3136 3137 3030 3539 0a31 5942 4554
03f0050 3a53 3637 4320 4352 3233 323a 3430 3933
03f0060 3531 3536 0a35 0000 0000 0000 0000 0000
03f0070 0000 0000 0000 0000 0000 0000 0000 0000
*
0400000
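
These dumps are easier to read once decoded. Plain `hexdump` prints 16-bit words in host byte order, so on a little-endian machine each pair of bytes appears swapped. The sketch below undoes that swap. `decode_words` is a hypothetical helper, and it assumes the dumps were taken on a little-endian host.

```python
def decode_words(words):
    """Turn a row of little-endian 16-bit hexdump words back into bytes."""
    out = bytearray()
    for word in words.split():
        value = int(word, 16)
        out.append(value & 0xFF)         # low byte comes first in the file
        out.append((value >> 8) & 0xFF)  # high byte comes second
    return bytes(out)


# Row at 0x3e0000 from the intact image.
ver_row = decode_words("564e 0a33 2023 3352 2032 202c 4552 4956")

# Row at 0x3bb000 from the tegrademo-mender image.
env_row = decode_words("6b46 d969 6102 746c 6f62 746f 6d63 3d64")

print(ver_row)
print(env_row[5:])
```

The first row decodes to the start of a version text block (`NV3`, then `# R32 , REVI`). The second row, after five leading bytes, contains `altbootcmd=`, which looks like the beginning of a U-Boot environment rather than version data.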

Based on Mender Tegra Partition Layout, we see that the UBENV is located around 0x3bb000 on /dev/mmcblk0boot1 device.

fw_env.config

/dev/mmcblk0boot1 0x3bb000 0x20000
/dev/mmcblk0boot1 0x3db000 0x20000

0x3bb000 + 0x20000 = 0x3db000
0x3db000 + 0x20000 = 0x3fb000

Since VER_b is located at 0x3e0000 and VER at 0x3f0000, both fall inside the second environment region (0x3db000 to 0x3fb000) and get overwritten.

It looks like we have already handled this here - OE4T/meta-mender-community#5, which exists in meta-mender-community dunfell branch - https://github.com/mendersoftware/meta-mender-community/blob/dunfell/meta-mender-tegra/recipes-bsp/u-boot/patches/0013-Reduce-env-size-on-p3450-0000-to-64KiB.patch (Only patches SPI/SD machines)

Looks like the nv_update_engine doesn't care about VER partitions being wiped - because mender updates work fine!


I think we need to apply the above-mentioned patch to p3450_0002_defconfig as well. I added that just now. I tested it and the mender update works fine!
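
The overlap can be checked mechanically. The sketch below compares the 0x20000 environment size from fw_env.config with the 64 KiB size named in the patch title. It assumes two redundant copies placed back to back, and `env_regions` and `clobbered` are hypothetical helpers written for this illustration.

```python
VER_B_OFFSET = 0x3E0000
VER_OFFSET = 0x3F0000
ENV_BASE = 0x3BB000


def env_regions(base, size):
    """Two back-to-back redundant environment copies as (start, end) pairs."""
    return [(base, base + size), (base + size, base + 2 * size)]


def clobbered(regions, offsets):
    """Return the offsets that fall inside any half-open [start, end) region."""
    return [off for off in offsets
            if any(start <= off < end for start, end in regions)]


ver_offsets = [VER_B_OFFSET, VER_OFFSET]
before = clobbered(env_regions(ENV_BASE, 0x20000), ver_offsets)
after = clobbered(env_regions(ENV_BASE, 0x10000), ver_offsets)

print([hex(o) for o in before], [hex(o) for o in after])
```

With 0x20000-byte copies both VER offsets land inside the second copy; with 0x10000-byte copies the environment ends at 0x3db000 and neither offset is touched.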

Here's the layout after the fix:

$ hexdump mmcblk0boot1_fixed_mender -s 3911680 -n 64
03bb000 a8f7 5e3a 6102 746c 6f62 746f 6d63 3d64
03bb010 7572 206e 656d 646e 7265 615f 746c 6f62
03bb020 746f 6d63 3b64 7220 6e75 6220 6f6f 6374
03bb030 646d 6100 6372 3d68 7261 006d 6162 6475
03bb040
$ hexdump mmcblk0boot1_fixed_mender -s 3977216 -n 64
03cb000 7a4a 2852 6101 746c 6f62 746f 6d63 3d64
03cb010 7572 206e 656d 646e 7265 615f 746c 6f62
03cb020 746f 6d63 3b64 7220 6e75 6220 6f6f 6374
03cb030 646d 6100 6372 3d68 7261 006d 6162 6475
03cb040
$ hexdump mmcblk0boot1_fixed_mender -s 4063232 -n 64
03e0000 564e 0a33 2023 3352 2032 202c 4552 4956
03e0010 4953 4e4f 203a 2e34 0a33 4f42 5241 4944
03e0020 3d44 3433 3834 4220 414f 4452 4b53 3d55
03e0030 3030 3230 4620 4241 343d 3030 320a 3230
03e0040
$ hexdump mmcblk0boot1_fixed_mender -s 4128768 -n 64
03f0000 564e 0a33 2023 3352 2032 202c 4552 4956
03f0010 4953 4e4f 203a 2e34 0a33 4f42 5241 4944
03f0020 3d44 3433 3834 4220 414f 4452 4b53 3d55
03f0030 3030 3230 4620 4241 343d 3030 320a 3230
03f0040
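
The `-s` arguments above are decimal byte offsets. A quick conversion, shown here as a small sketch, ties them back to the hexadecimal addresses used elsewhere in this thread. The 0x10000 environment size used for the gap calculation is taken from the 64 KiB patch title and is otherwise an assumption.

```python
offsets = [3911680, 3977216, 4063232, 4128768]

# Same offsets in the hexadecimal form that hexdump prints in its first column.
as_hex = [hex(o) for o in offsets]

# End of the second environment copy, assuming each copy is 0x10000 bytes.
env_end = offsets[1] + 0x10000

# Bytes left between the end of the environment and the 0x3e0000 offset.
gap = offsets[2] - env_end

print(as_hex, hex(env_end), hex(gap))
```

Under that assumption the environment ends 0x5000 bytes short of 0x3e0000.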

@madisongh
Member

Looks like the nv_update_engine doesn't care about VER partitions being wiped - because mender updates work fine!

nv_update_engine isn't used on the t210 machines (TX1 and Nano). There's a Python script that NVIDIA provides for bootloader updates on those platforms. The logic I have in tegra-boot-tools is more or less the same as what's implemented in that script.

The reason Mender updates work fine with TX1/Nano is that Mender updates don't do bootloader updates on the t210 platforms. So you'd never notice this.

For an OTA reflash, we should be erasing the boot device if it needs to be reformatted - which it would if you've got the Uboot-env-overwriting-VER problem. I added some improvements to tegra-boot-tools and the tegra-sysinstall scripts to better handle the needs-erasing case, but they'll also need an updated nvflashxmlparse script as well. I've got the t-b-t and script updates in master currently, and the updated tegra-sysinstall in the master branch of my test distro.

@atharvanan1
Contributor Author

NOTE: this change doesn't work when you move from dunfell-l4t-r32.4.3 to anything built from this branch on jetson-nano-emmc device.

It breaks because the VER partitions are overwritten and tegra-bootloader-update complains about it.

My solution was to pack the VER partition contents into the new image and write them. However, @dwalkes mentioned the UBENV location being changed, which might break things. @dwalkes, do you want to jump in and give your input?

@atharvanan1
Contributor Author

I have a change in place to remove the partition layout change, and updates to the script to add verbose messaging in a local branch. I'll run some tests and update you on that!

@atharvanan1
Contributor Author

Log showing the warning for jetson-nano-emmc device:

INFO[0000] Loaded configuration file: /var/lib/mender/mender.conf 
INFO[0000] Loaded configuration file: /etc/mender/mender.conf 
INFO[0000] Mender running on partition: /dev/mmcblk0p1  
INFO[0000] Performing remote update from: [http://10.1.10.120:8000/demo-image-base-jetson-nano-emmc.mender]. 
Installing Artifact of size 196520960...
INFO[0000] No public key was provided for authenticating the artifact 
INFO[0000] Update Module path "/usr/share/mender/modules/v3" could not be opened (open /usr/share/mender/modules/v3: no such file or directory). Update modules will not be available 
INFO[0000] Opening device "/dev/mmcblk0p18" for writing 
INFO[0000] Native sector size of block device /dev/mmcblk0p18 is 512 bytes. Mender will write in chunks of 1048576 bytes 
.............................................................. - 100 %
INFO[0036] All bytes were successfully written to the new partition 
INFO[0036] The optimized block-device writer wrote a total of 4743 frames, where 0 frames did need to be rewritten (i.e., skipped) 
INFO[0037] Wrote 4972347392/4972347392 bytes to the inactive partition 
INFO[0037] Enabling partition with new image installed to be a boot candidate: 18 
INFO[0038] Executing script: ArtifactInstall_Leave_80_bl-update 
INFO[0038] Collected output (stderr) while running script /var/lib/mender/scripts/ArtifactInstall_Leave_80_bl-update
WARN: VER partitions are corrupted
WARN: Please refer to https://github.com/OE4T/tegra-demo-distro/pull/113 for additional context
WARN: Attempting update anyway

---------- end of script output 
Use -commit to update, or -rollback to roll back the update.
At least one payload requested a reboot of the device it updated.

Will run some mender torture on this device. Pushing cleaned commits.

@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch 2 times, most recently from cac6ed4 to c9e46b3 Compare June 18, 2021 22:42
@atharvanan1
Contributor Author

For an OTA reflash, we should be erasing the boot device if it needs to be reformatted - which it would if you've got the Uboot-env-overwriting-VER problem. I added some improvements to tegra-boot-tools and the tegra-sysinstall scripts to better handle the needs-erasing case, but they'll also need an updated nvflashxmlparse script as well. I've got the t-b-t and script updates in master currently, and the updated tegra-sysinstall in the master branch of my test distro.

I have something in the works to add these changes to the dunfell-l4t-r32.4.3 branch of meta-tegra here


Tested mender updates on jetson-nano-emmc, and they work fine. Couldn't do full mender-torture, as it didn't work for some reason. What did I check?

  • Mender update changes the rootfsA to rootfsB.
  • Rebooting without mender -commit boots back to rootfsA. It stays like that for at least 3 reboots (that's what I tested)
  • Rebooting with mender -commit boots into rootfsB. It stays like that for at least 3 reboots

@dwalkes
Member

dwalkes commented Jun 19, 2021

Couldn't do full mender-torture, as it didn't work for some reason.

I don't think we support mender-torture on u-boot environments yet, see https://github.com/mendersoftware/meta-mender-community/tree/dunfell/meta-mender-tegra/scripts/test#tegra-mender-torture-tests

atharvanan1 and others added 14 commits July 10, 2021 18:16
- These changes add OTA reflashing support as per https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process

Signed-off-by: Atharva Nandanwar <[email protected]>
to include the full_init_payload for OTA reflashing
purposes.

Signed-off-by: Matt Madison <[email protected]>
This is done to match Matt's setup here
madisongh/tegra-test-distro@ebec504

Signed-off-by: Atharva Nandanwar <[email protected]>
…tools

replacing the NVIDIA bootloader update tools.

Note that for cboot platforms, the InstallEnter script may still need
to use nvbootctrl for getting the current boot slot, since the script
would run in the context of an older rootfs that only has the NVIDIA
tools installed.

Signed-off-by: Matt Madison <[email protected]>
to provide a modified platform-preboot script for cboot platforms
that performs a bootcount check.

Signed-off-by: Matt Madison <[email protected]>
* tegra-boot-tools-nvbootctrl added to provide the nvbootctrl compatibility
  script for handling downgrades to images that assume that the NVIDIA
  bootloader tools are in use.

* tegra-boot-tools-lateboot added for marking boots successful.

Signed-off-by: Matt Madison <[email protected]>
which uses tegra-boot-tools for getting the current
boot slot.

Signed-off-by: Matt Madison <[email protected]>
to install an extra copy of the fw_env.config file so we
can detect an environment location change during an update.

Signed-off-by: Matt Madison <[email protected]>
…boot

to handle U-Boot environment location changes and allow for automatic
bootloader updates for Nano-eMMC targets from older versions where
the U-Boot environment clashed with the VER partitions.

Signed-off-by: Matt Madison <[email protected]>
to mount the new rootfs read-only, as is now done with
the scripts in meta-mender-tegra.

Signed-off-by: Matt Madison <[email protected]>
for cboot platforms, to make /run available in the chroot.

Signed-off-by: Matt Madison <[email protected]>
- Add verbose messaging about VER partitions being corrupted
- Continues the update without exiting, added a message about that

Signed-off-by: Atharva Nandanwar <[email protected]>
@atharvanan1 atharvanan1 force-pushed the dunfell-l4t-r32.4.3+ota-components branch from c9e46b3 to e46fe08 Compare July 11, 2021 00:18
@atharvanan1
Contributor Author

Tested this today with latest tegra-boot-tools changes. Updated the README with a notice for change in boot-tools.

@madisongh
Member

@atharvanan1 LGTM now.... is there anything else that needs to be checked before I merge?

@atharvanan1
Contributor Author

Let's sit on it for a day, I'll think if I'm missing anything.

@atharvanan1
Contributor Author

Here's an overview of what's changed and tested:

  1. tegra-redundant-tools are removed and replaced with tegra-boot-tools-updater. These are necessary for OTA reflashing update, given that full_init_bl_payload is present which is added by this commit. Updated the README for that.
  2. Mender scripts have been changed for all devices to use tegra-boot-tools-updater. For this, mender torture has been tested on jetson-tx2 and jetson-xavier-nx-devkit-emmc devices. For the jetson-nano-emmc device, we give a verbose message if VER partitions are corrupted, but continue with the update anyway. Tested on jetson-nano-emmc as per this comment.
  3. Reflashing process has been tested on Jetson TX2 as per https://github.com/OE4T/meta-tegra/wiki/Over-the-air-reflashing-process.

Let me know if I should test any other devices (SD Card based devices), however as I understand they should work as their emmc counterparts work.

@madisongh madisongh merged commit b1367b4 into OE4T:dunfell-l4t-r32.4.3 Jul 12, 2021
@madisongh
Member

Thanks for all the work on this, @atharvanan1 !

@atharvanan1
Contributor Author

Thanks for helping me on this! @madisongh
