NICs in wrong order unless memory banks meet specific criteria #534

Open
brandt opened this issue Mar 23, 2017 · 7 comments

brandt commented Mar 23, 2017

Here's a weird bug.

When processing uploaded lshw data, Collins lists the NICs in reverse order unless there are at least 5 memory banks, of which the last three must be occupied.

Reproducing

I've included some usable example lshw files to reproduce the behavior. (They're missing a lot of details, but Collins will accept them.)

I used the tumblr/collins:latest Docker image.

To verify order, go to the #hardware-details tab of the asset page. The correct order should be:

  1. 52:54:00:00:00:01
  2. 52:54:00:00:00:02

Example Failure

If a host (VM for example) has only one memory bank, its NICs will be listed in reverse order in Collins:

<!-- NICs in reverse order -->
<node class="system">
  <node id="core">
    <node class="processor" handle="DMI:1400" id="cpu">
      <product>Foo</product>
    </node>
    <node class="bridge">
      <node class="network" id="network:0">
        <serial>52:54:00:00:00:01</serial>
        <capacity>1</capacity>
      </node>
      <node class="network" id="network:1">
        <serial>52:54:00:00:00:02</serial>
        <capacity>1</capacity>
      </node>
    </node>
    <node class="memory" id="memory">
      <node class="memory" id="bank">
        <size>1</size>
      </node>
    </node>
  </node>
</node>
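
For reference, the expected order can be read straight out of the uploaded file. The following is a minimal sketch, assuming the order of the `network` nodes in the XML is the order you want Collins to show. It uses only Python's standard library, does not touch Collins itself, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed copy of the single-bank example above (processor node omitted).
LSHW_XML = """
<node class="system">
  <node id="core">
    <node class="bridge">
      <node class="network" id="network:0">
        <serial>52:54:00:00:00:01</serial>
        <capacity>1</capacity>
      </node>
      <node class="network" id="network:1">
        <serial>52:54:00:00:00:02</serial>
        <capacity>1</capacity>
      </node>
    </node>
    <node class="memory" id="memory">
      <node class="memory" id="bank">
        <size>1</size>
      </node>
    </node>
  </node>
</node>
"""


def nic_macs_in_document_order(lshw_xml: str) -> List[str]:
    """Hypothetical helper: MAC addresses in the order they appear in the file."""
    root = ET.fromstring(lshw_xml.strip())
    # Element.iter() walks the tree in document order (depth-first, pre-order),
    # so the result preserves the order of the <node class="network"> elements.
    return [
        node.findtext("serial")
        for node in root.iter("node")
        if node.get("class") == "network"
    ]


print(nic_macs_in_document_order(LSHW_XML))
```

Comparing this list with what the #hardware-details tab shows makes the reversal easy to spot.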

Example Success

However, if there are at least 5 memory banks (of which the last 3 must be occupied), it will be in the correct order:

<!-- NICs in correct order -->
<node class="system">
  <node id="core">
    <node class="processor" handle="DMI:1400" id="cpu">
      <product>Foo</product>
    </node>
    <node class="bridge">
      <node class="network" id="network:0">
        <serial>52:54:00:00:00:01</serial>
        <capacity>1</capacity>
      </node>
      <node class="network" id="network:1">
        <serial>52:54:00:00:00:02</serial>
        <capacity>1</capacity>
      </node>
    </node>
    <node class="memory" id="memory">
      <node class="memory" id="bank:0">
        <size>0</size>
      </node>
      <node class="memory" id="bank:1">
        <size>0</size>
      </node>
      <node class="memory" id="bank:2">
        <size>1</size>
      </node>
      <node class="memory" id="bank:3">
        <size>1</size>
      </node>
      <node class="memory" id="bank:4">
        <size>1</size>
      </node>
    </node>
  </node>
</node>

With any fewer banks, the NICs are in the wrong order. If any of the last 3 banks are empty, the NICs will also be in the wrong order.
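
The condition observed above can be written down as a small check. This is only a sketch of the reported behavior (at least 5 banks, last 3 occupied), not of Collins's actual parsing logic, which this report does not identify. The function names are hypothetical, and "occupied" is assumed to mean a `size` greater than zero.

```python
import xml.etree.ElementTree as ET
from typing import List

# Memory section only, shaped like the examples above.
MEMORY_XML = """
<node class="memory" id="memory">
  <node class="memory" id="bank:0"><size>0</size></node>
  <node class="memory" id="bank:1"><size>0</size></node>
  <node class="memory" id="bank:2"><size>1</size></node>
  <node class="memory" id="bank:3"><size>1</size></node>
  <node class="memory" id="bank:4"><size>1</size></node>
</node>
"""


def memory_bank_sizes(lshw_xml: str) -> List[int]:
    """Hypothetical helper: sizes of the memory banks, in document order."""
    root = ET.fromstring(lshw_xml.strip())
    sizes = []
    for node in root.iter("node"):
        # The parent has id="memory"; the banks have id="bank" or id="bank:N".
        if node.get("class") == "memory" and node.get("id", "").startswith("bank"):
            sizes.append(int(node.findtext("size", default="0")))
    return sizes


def matches_observed_criteria(sizes: List[int]) -> bool:
    """True when the file has at least 5 banks and the last 3 are occupied."""
    return len(sizes) >= 5 and all(size > 0 for size in sizes[-3:])


sizes = memory_bank_sizes(MEMORY_XML)
print(sizes, matches_observed_criteria(sizes))
```

A file for which `matches_observed_criteria` returns `False` is one where, per the observations above, the NICs would be expected to come back reversed.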

@michaeljs1990
Contributor

I tested this out locally with the data you provided above and everything is returned in the correct order. I still need to test with some real lshw outputs that cover both cases, though, to make sure nothing else changes unexpectedly.

@michaeljs1990
Contributor

Out of curiosity, was this causing any issues, or was it simply a UI bug for you?

@brandt
Author

brandt commented Mar 27, 2017

Our provisioning system assumes that the order in Collins corresponds with the order of the NICs as they appear on the running machine. It uses that info to create an ephemeral system entry in Cobbler.

It's always worked properly on metal, but when we dual-homed a VM it would configure the NICs backwards.

@michaeljs1990
Contributor

Well, if you are feeling adventurous, you can patch in the PR above. It doesn't modify how data is saved, only how it is returned after being fetched from the DB, so there's no risk of messing anything up permanently.

@brandt
Author

brandt commented Mar 27, 2017

Awesome, thank you!

@byxorna
Contributor

byxorna commented Mar 28, 2017

@brandt I'd be careful with that... Relying on the ordering in Collins can cause biosdevname ordering issues if, for example, your NIC has 2 ports but the 2 MAC addresses enumerate backwards, i.e. interface index in kernel => MAC: 0 => AA:BB:CC:DD:EE:F1, 1 => AA:BB:CC:DD:EE:F0. Some tooling naively sorts interfaces by MAC, while other tooling uses ifindex like the kernel does.
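
To make the mismatch concrete, here is a small sketch using the two addresses from the example above. The dictionary layout is invented for illustration and is not taken from Collins, biosdevname, or any particular tool.

```python
# Interface data from the example: kernel index 0 carries the higher MAC.
interfaces = [
    {"ifindex": 0, "mac": "AA:BB:CC:DD:EE:F1"},
    {"ifindex": 1, "mac": "AA:BB:CC:DD:EE:F0"},
]

# Tooling that sorts by MAC address (plain string sort on same-case MACs).
by_mac = [i["mac"] for i in sorted(interfaces, key=lambda i: i["mac"])]

# Tooling that follows the kernel's interface index.
by_ifindex = [i["mac"] for i in sorted(interfaces, key=lambda i: i["ifindex"])]

print("sorted by MAC:    ", by_mac)
print("sorted by ifindex:", by_ifindex)
print("orders agree:     ", by_mac == by_ifindex)
```

The two sorts disagree for this NIC, so anything that assumes "first in the list" means the same port in both views will configure the wrong interface.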

@brandt
Author

brandt commented Mar 28, 2017

@byxorna yeah, the assumptions about NIC order have been fragile.

Collins is our "source of truth," so the design goal behind this was to eliminate an unnecessary duplication of information. We'll be able to get by with this fix for now, but I have some ideas on how to do it more safely using the LLDP data.

byxorna added the bug label May 20, 2017