This repository has been archived by the owner on Oct 27, 2022. It is now read-only.

script and udev rule for deterministic ephemeral devices #9

Open: wants to merge 4 commits into base: master



@missingcharacter missingcharacter commented Apr 22, 2020

This was tested on systemd-udev

https://www.freedesktop.org/software/systemd/man/udev.html

Worked on this with @thecubed

@oogali
Owner

oogali commented Sep 3, 2020

I'm not intimately familiar with the sequence of operations, but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?

i.e. two EBS volumes are attached, udevd executes next ephemeral for each, and there are two instances trying to grab /dev/ephemeral1...

You could use a lock file to protect against this, but then you end up in a scenario where one invocation runs and succeeds while the other one fails. How would udevd handle that failure (or, if it is not handled, how is that failure communicated)?

@missingcharacter
Author

missingcharacter commented Sep 3, 2020

Hi @oogali, thanks for getting back to me.

but isn't there a potential for a conflict if two instances of the next ephemeral script are executed simultaneously?

Yes, you are right, this could happen. I think the code below could solve this concern:

#!/usr/bin/env bash
# To be used with the udev rule: /etc/udev/rules.d/999-aws-ebs-nvme.rules

# check if lock file exists
script_name="$(basename "$0")"
pid_file="/tmp/${script_name}.lock"
counter=0
until [ $counter -eq 5 ] || [[ ! -e "${pid_file}" ]] ; do
  sleep $(( counter++ ))
done

# create lock file if it does not exist
if [[ -e "${pid_file}" ]]; then
  echo "Lock file ${pid_file} still exists after counter ended" >&2 
  exit 1
else
  touch "${pid_file}"
fi

kern_name=${1}
incr=0
while [[ -e "/dev/ephemeral${incr}" ]] && [[ $(readlink "/dev/ephemeral${incr}") != "${kern_name}" ]]; do
  # advance to the next candidate name (the original read an unset $i here)
  incr=$(( incr + 1 ))
done
# remove lock file
rm "${pid_file}"
echo "ephemeral${incr}"
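
One caveat with the script above: checking for the lock file and then creating it with touch are two separate steps, so two invocations could both pass the check before either creates the file. A minimal sketch of one way to close that gap is to use mkdir, which either creates the directory or fails in a single step. This is not the code from the PR: lock_dir, acquire_lock, release_lock, and next_slot are hypothetical names, and the directory argument to next_slot exists only so the search can be tried outside /dev.

```shell
#!/usr/bin/env bash
# Sketch only: lock_dir and all function names are hypothetical, not from the PR.
lock_dir="${TMPDIR:-/tmp}/nextephemeraldevice.lock.d"

# Try to take the lock, backing off 0, 1, 2, ... seconds between attempts.
# mkdir fails if the directory already exists, so only one caller can win.
acquire_lock() {
  local attempts="${1:-5}" attempt
  for (( attempt = 0; attempt < attempts; attempt++ )); do
    sleep "${attempt}"
    mkdir "${lock_dir}" 2>/dev/null && return 0
  done
  echo "Lock ${lock_dir} still held after ${attempts} attempts" >&2
  return 1
}

# Drop the lock; ignore the error if it is already gone.
release_lock() {
  rmdir "${lock_dir}" 2>/dev/null || true
}

# Print the first ephemeralN name under dev_dir that is either unused
# or already points at kern_name.
next_slot() {
  local dev_dir="$1" kern_name="$2" incr=0
  while [[ -e "${dev_dir}/ephemeral${incr}" ]] &&
        [[ "$(readlink "${dev_dir}/ephemeral${incr}")" != "${kern_name}" ]]; do
    incr=$(( incr + 1 ))
  done
  echo "ephemeral${incr}"
}
```

A caller would run acquire_lock, install `trap release_lock EXIT` so the lock is dropped even on an early exit, and then call `next_slot /dev "$1"`. Like the script above, this only serializes the search itself; it does not hold the lock until the symlink exists.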

How would udevd handle that failure (or if not handled, how is that failure communicated)?

The failure is communicated by writing the message to stderr (echo "Lock file ${pid_file} still exists after counter ended" >&2) and exiting with status 1. It should look like this in the systemd logs:

$ journalctl -u systemd-udevd
Sep 3 21:57:43 test-vm systemd-udevd[2998]: failed to execute '/usr/local/bin/nextephemeraldevice.sh nvme1n1': Lock file /tmp/nextephemeraldevice.sh.lock still exists after counter ended

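Also, according to the freedesktop.org documentation, the default udev event timeout is 180 seconds, so the script's worst-case wait of about 10 seconds (0+1+2+3+4) is well within it.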
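
For reference, the wait loop in the script sleeps for the current counter value and then increments it, stopping once the counter reaches 5. A small sketch of that arithmetic follows; worst_case_wait is a hypothetical helper used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: worst_case_wait is a hypothetical helper, not part of the PR.
# Sums the sleeps 0, 1, ..., retries-1 that the wait loop performs when the
# lock file never goes away.
worst_case_wait() {
  local retries="$1" total=0 i
  for (( i = 0; i < retries; i++ )); do
    total=$(( total + i ))
  done
  echo "${total}"
}

worst_case_wait 5   # prints 10
```

Ten seconds of sleeping is far below the 180-second default, so the lock wait on its own should not trip the timeout.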

One other thing: the next ephemeral script should only run for EC2 Instance Store volumes.
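
A rough sketch of how the script could guard against non-instance-store devices is below. It rests on assumptions that are not stated in this PR: that the NVMe model string is readable at /sys/block/<name>/device/model, and that it reads "Amazon EC2 NVMe Instance Storage" for instance store volumes (as opposed to "Amazon Elastic Block Store" for EBS). is_instance_store and its sysfs_root parameter are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: is_instance_store and sysfs_root are hypothetical names.
# Assumption: instance store volumes report the NVMe model string
# "Amazon EC2 NVMe Instance Storage"; EBS volumes report a different one.
is_instance_store() {
  local kern_name="$1" sysfs_root="${2:-/sys}"
  local model_file="${sysfs_root}/block/${kern_name}/device/model"
  local model=""
  [[ -r "${model_file}" ]] || return 1
  # read strips the trailing padding that sysfs adds to the model string
  read -r model < "${model_file}" || true
  [[ "${model}" == "Amazon EC2 NVMe Instance Storage" ]]
}
```

The script could then start with something like `is_instance_store "$1" || exit 1`. Matching on the model attribute in the udev rule itself may be a cleaner place for the same check.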


2 participants