
efibootmgr exits with an error code 13 #968

Open
markohrastovec opened this issue Sep 20, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@markohrastovec

markohrastovec commented Sep 20, 2022

Actual behavior
The upgrade procedure exited with an error reporting that efibootmgr failed with exit code 13 or 14 (I am not sure which). Unfortunately, I do not have the log files: I struggled to make it work and eventually overwrote the logs with the successful run.

To Reproduce
I cannot reproduce the error, because the upgrade eventually succeeded and I do not have a computer to reproduce it on. However, I can describe the behaviour in detail.

The upgrade procedure exited with an error reporting that efibootmgr failed when setting BootNext with '-n 000F'. I checked the EFI entries and 000F was the currently active Oracle Linux entry. There were 15 EFI entries: 000E was missing, but 000F was present.

Then I entered the BIOS and deleted all EFI entries. 14 were added back automatically, numbered sequentially 0000..000D, and I added the one to boot Oracle Linux (000E). After that the upgrade went through and did not complain about running efibootmgr any more.
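The entry layout described above (15 entries, with 000E missing from an otherwise sequential range) can be checked from the output of plain `efibootmgr`. A minimal sketch, assuming the usual `BootXXXX` entry lines that `efibootmgr` prints; `summarize_boot_entries` is a hypothetical helper, not part of leapp:

```python
import re

# Entry lines look like "Boot000F* Oracle Linux"; the optional "*" marks an
# active entry. "BootCurrent:", "BootNext:" and "BootOrder:" lines do not match
# because the characters after "Boot" are not four hexadecimal digits.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s")


def summarize_boot_entries(efibootmgr_output):
    """Return (sorted entry numbers, numbers missing from the range 0..max)."""
    numbers = sorted(
        int(m.group(1), 16)
        for m in (ENTRY_RE.match(line) for line in efibootmgr_output.splitlines())
        if m
    )
    if not numbers:
        return [], []
    present = set(numbers)
    gaps = [n for n in range(numbers[-1] + 1) if n not in present]
    return numbers, gaps


if __name__ == "__main__":
    # Hypothetical listing: 0000..000D plus 000F, so 000E is missing.
    sample = "BootCurrent: 000F\nBootOrder: 000F,0000\n"
    sample += "".join("Boot%04X  Entry %d\n" % (n, n) for n in range(14))
    sample += "Boot000F* Oracle Linux\n"
    entries, missing = summarize_boot_entries(sample)
    print(len(entries), ["%04X" % n for n in missing])  # 15 ['000E']
```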

Expected behavior
I did not expect the upgrade to do anything with EFI. Adding a GRUB entry would be sufficient in my opinion.

System information (please complete the following information):

  • OS and version: fully updated Oracle Linux 7.9, upgrading to Oracle Linux 8.x
  • upgrading to the UEK kernel
@markohrastovec markohrastovec added the bug Something isn't working label Sep 20, 2022
@pirat89
Member

pirat89 commented Sep 21, 2022

@markohrastovec Thank you for the report. Could you provide the /var/lib/leapp/leapp.db file? As the file could be quite large and you may not want to share all the data inside, you can instead use leapp-inspector to extract the data we need (if you still have the leapp.db file):

# leapp-inspector messages --type StorageInfo --recursive-expand
# leapp-inspector actors --actor efi_interim_fix
# leapp-inspector actors --actor efi_finalization_fix

I did not expect the upgrade to do anything with EFI. Adding a GRUB entry would be sufficient in my opinion.

Actually, no: we use this command to prevent some known issues.
The command only sets the EFI boot entry for the next boot. Some systems set a different EFI boot entry as the default and expect a particular entry to be set before the reboot. All we do here is set the same EFI boot entry that we detect the system is currently booted from (and we do not expect it to change), so the command should always work and we do not expect problems. The exception is when something is really wrong: to date, every case I have investigated in which this command failed has been connected to an invalid or unsupported bootloader/system configuration. I am not saying this is your case (I do not remember a system with so many EFI boot entries); I am just sharing this experience in case it helps, as in my experience with similar issues the chance is high. Most systems with a broken configuration were installed from images that were already broken, yet somehow the system is still able to boot.
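A minimal sketch of the idea described above: read the entry the system is currently booted from and build the command that sets BootNext to that same entry. It assumes the `BootCurrent: XXXX` line that `efibootmgr` prints when run without arguments, and it only builds the command instead of running it. The helper names are hypothetical; how leapp itself discovers the current entry is not shown in this thread (the traceback later on only shows it running `['/sbin/efibootmgr', '-n', current_boot]`).

```python
import re

# efibootmgr (run without arguments) prints a line such as "BootCurrent: 000F".
BOOT_CURRENT_RE = re.compile(r"^BootCurrent:[ \t]*([0-9A-Fa-f]{4})[ \t]*$", re.MULTILINE)


def current_boot_entry(efibootmgr_output):
    """Return the entry the system is booted from (e.g. "000F"), or None."""
    match = BOOT_CURRENT_RE.search(efibootmgr_output)
    return match.group(1) if match else None


def bootnext_command(efibootmgr_output, efibootmgr="/sbin/efibootmgr"):
    """Build the command that sets BootNext to the currently booted entry."""
    current = current_boot_entry(efibootmgr_output)
    if current is None:
        raise ValueError("BootCurrent not found in efibootmgr output")
    return [efibootmgr, "-n", current]


if __name__ == "__main__":
    sample = "BootCurrent: 000F\nTimeout: 1 seconds\nBootOrder: 000F,0000\n"
    print(bootnext_command(sample))  # ['/sbin/efibootmgr', '-n', '000F']
```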

@markohrastovec
Author

markohrastovec commented Sep 21, 2022

Here are the outputs of the commands. I guess these are from the last attempt, when the upgrade finally went through.
messages.txt
actors_efi_interim.txt
actors_efi_finalization.txt

@pirat89
Member

pirat89 commented Sep 21, 2022

@markohrastovec Thank you. Based on that, I have another idea about the possible root cause, but I will need more data to be sure. Could you produce the same set of data for the previous executions of leapp (they are still part of the leapp.db file)? I need to see this triplet for an execution where the error occurred. To do this, you can use the --context <execution-id> parameter (it is probably positional: it must be specified before the leapp-inspector subcommand). To simplify things, you can just share the output of this command:

for context in $(leapp-inspector executions | grep "^[a-f0-9]" | cut -d " " -f1); do
  echo "##################################################################"
  echo "# CONTEXT: $context"
  echo "##################################################################"
  leapp-inspector --context "$context" messages --type StorageInfo --recursive-expand
  leapp-inspector --context "$context" actors --actor efi_interim_fix
  leapp-inspector --context "$context" actors --actor efi_finalization_fix
done

@markohrastovec
Author

Here is the output of the proposed for loop. There are a lot of executions in there: I had other issues before the EFI one, and I also had connectivity problems, so the upgrade was sometimes canceled because it could not fetch some data over the network. Nevertheless, the output also contains both failed and successful calls to efibootmgr.
executions.txt

@pirat89
Member

pirat89 commented Sep 22, 2022

@markohrastovec Hi, so the issue is a little different from what I was thinking:

Could not set BootNext: No space left on device

According to Google, this error can have several causes:

  • not enough space on /boot/efi
  • a write protection (e.g. locked boot order in bios)
  • ....

But in this case I think there really was not enough space, given the number of EFI boot entries, and that reducing the number of entries is what actually fixed the problem.

I am not sure how this could best be fixed in leapp-repository. Some ideas:

  1. Introduce an option, e.g. LEAPP_IGNORE_EFI=1, which leaves the boot configuration up to the user.
  2. Check the amount of free space on /boot/efi (I consider this an unreliable solution).
  3. Improve the error message so people can fix their system before the upgrade.

From my point of view, the last one is the best way, possibly combined with (1).
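Ideas (1) and (3) could look roughly like the sketch below. It is an illustration under stated assumptions, not leapp code: `LEAPP_IGNORE_EFI` is only the name proposed above (it does not exist in leapp), `set_bootnext` and `BootNextError` are hypothetical, and the hint text lists only the causes discussed in this thread. The command runner is injectable so the logic can be exercised without touching the firmware.

```python
import os
import subprocess


class BootNextError(Exception):
    """Raised with a human-readable hint when setting BootNext fails."""


def set_bootnext(entry, runner=subprocess.run, env=None):
    """Set BootNext to `entry` unless the hypothetical LEAPP_IGNORE_EFI=1 is set.

    Returns True if efibootmgr was run, False if the step was skipped.
    """
    env = os.environ if env is None else env
    if env.get("LEAPP_IGNORE_EFI") == "1":
        return False  # idea (1): leave the boot configuration to the user
    cmd = ["/sbin/efibootmgr", "-n", entry]
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # idea (3): explain what to check instead of only re-raising the failure
        raise BootNextError(
            "Command %s failed with exit code %d: %s\n"
            "Possible causes: too many EFI boot entries, a boot order locked "
            "in the firmware setup, or a firmware bug. Remove unused boot "
            "entries or unlock the boot order, then retry, or set "
            "LEAPP_IGNORE_EFI=1 to skip this step."
            % (cmd, result.returncode, result.stderr.strip())
        )
    return True
```

Keeping the hint next to the failing call means users see the remediation steps in the upgrade report instead of a bare `CalledProcessError`.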

@markohrastovec
Author

@pirat89 hi,

/boot/efi is a 500M partition that is only 2% occupied, so I guess lack of space on /boot/efi was not the issue. Maybe "No space left" refers to the boot order having too many entries. During the upgrade attempts I once had a "no space left" error on the /boot partition, but that is a completely different issue.
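The usage figure above, and idea (2) from the earlier comment, can be checked with Python's standard `shutil.disk_usage`. A minimal sketch; `usage_percent` is a hypothetical helper. As this thread shows, a mostly empty /boot/efi does not rule out the "No space left on device" error from efibootmgr, which is why such a check was called unreliable.

```python
import os
import shutil


def usage_percent(path):
    """Return how much of the filesystem holding `path` is used, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Only report the mount points that exist on this machine.
    for mount in ("/boot/efi", "/boot"):
        if os.path.exists(mount):
            print("%s: %.1f%% used" % (mount, usage_percent(mount)))
```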

While searching for a solution, I deleted most of the EFI boot entries, leaving 000F untouched, and they were re-added as 0010, 0011, 0012, ... I did not try to upgrade in that state. In the end I deleted all of them, after finding out that I can add the correct entry manually at any time. Then they were re-added as 0000..000D, and I added 000E manually, as seen in the report.

I do not understand the message "No space left on device", because nothing was actually added. 000F just had to be set as the first boot option, and it already was. I would not rule out a buggy BIOS causing this error.

I agree with your proposal to give a more descriptive error message and to provide an option to skip this efibootmgr command. On our computer, I do not believe the command did anything other than produce an error; the boot order should have remained unchanged.

@pirat89
Copy link
Member

pirat89 commented Sep 22, 2022

@markohrastovec I wanted to rule out a buggy BIOS, as most references to similar issues date from around 2014, but it could be. It is also possible to prevent efibootmgr from managing the boot order by locking the boot order in the BIOS, which results in the same error message about space from efibootmgr. So let's stick with solutions 1 and 3.

@bbbjames

Hello, thank you both for posting this. After running the preupgrade reports and solving the issues one by one, I got everything passing, but the actual upgrade failed ::sadface::

Looking at the console, everything had completed except for the error on the boot changes. Is that the final step of the script?

I will assume it was disk space then, but I would like to note that migrating from CentOS 7 to OL7 with centos2ol.sh had a similar issue with boot changes not being applied, which I had to make manually, so maybe it is a permissions issue? oracle/centos2ol#70 (comment)

Anyway, I have decided to start from scratch with OL8 instead. Here is the error from the report. Thank you again. The free disk space on the boot partition was ~400MB, in case it needed more 🤷

Risk Factor: high (error)
Title: Actor efi_interim_fix unexpectedly terminated with exit code: 1
Summary: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/leapp/repository/actor_definition.py", line 74, in _do_run
    actor_instance.run(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/leapp/actors/__init__.py", line 289, in run
    self.process(*args)
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/actors/efibootorderfix/interim/actor.py", line 17, in process
    efi_reboot_fix.maybe_emit_updated_boot_entry()
  File "/usr/share/leapp-repository/repositories/system_upgrade/common/libraries/efi_reboot_fix.py", line 47, in maybe_emit_updated_boot_entry
    run(['/sbin/efibootmgr', '-n', current_boot])
  File "/usr/lib/python2.7/site-packages/leapp/libraries/stdlib/__init__.py", line 192, in run
    result=result
CalledProcessError: Command ['/sbin/efibootmgr', '-n', u'0001'] failed with exit code 13.


3 participants