Netbox: fill network topology
Closed, Resolved, Public

Description

As part of the parent goal, track additional categories of infrastructure topology information (e.g. VLANs, IP space, network circuits, etc.)

This table is to track the information documented in Netbox.
Note that this quarterly goal does not aim to complete everything, as some data has no value in Netbox yet and adding it would increase the workload of some teams (e.g. DCops).

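| Data | ulsfo | eqiad | codfw | esams | eqsin | Notes |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| Public v4 subnets | | | | | | |
| Public v6 subnets | | | | | | |
| Private subnets | | | | | | |
| Vlans | | | | | | |
| Circuits | | | | | | |
| Network devices IPs | | | | | | |
| Network Inventory | | | | | | T221506 |
| Management/OOB | | | | | | Explicitly ignoring in core DCs |
| Virtual Chassis | | | | | | |
| VC Links | | | | | | |
| Console links | | | | | | |
| Core links | | | | | | Links between network devices. Cable IDs in interface descriptions |
| Power links | | | | | | Need manual work from DCops |
| Servers interfaces | | | | | | T244153 |
| Servers links | | | | | | T262899 |
| Server IPs | | | | | | T244153 |
| Virtualization clusters | | | | | | T215229 T239123 |
| VMs | | | | | | T215229 T239123 |
| VM interfaces | | | | | | T244153 |
| VM IPs | | | | | | T244153 |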

Event Timeline

Volans triaged this task as Medium priority. Oct 1 2018, 5:02 PM
Volans created this task.
Volans removed Volans as the assignee of this task. Oct 1 2018, 5:05 PM

@ayounsi do you want to keep this open for tracking things to integrate further although the goal part is done?

ayounsi closed this task as Resolved. Dec 17 2018, 4:17 PM

It's fine to close it and open a new one when we know how to proceed to import new data.

This task is great, and the table at the top is a very useful summary! The Q2 goal part of it has been completed indeed, so I can see the argument for the task being resolved.

However, we have plans to do more of this kind of work in Q3, and I think it would be useful to not have to copy this table into another task but rather keep updating it -- so I'm going to be bold and reopen this :)

Netbox is now at 2.5 \o/ which allows us to import cable IDs, type, color etc. Let's start with importing eqsin's, with the data that we have in the spreadsheet, so that we can deprecate that? @RobH @ayounsi any takers?

I looked into that and imported all the cables related to scs-eqsin to test the new feature:

  • The cable list can't be filtered or sorted by endpoint, which makes mass editing a pain: we can't easily do "all cables connected to ps1 are black and power". Should be reported upstream.
  • Adding a cable ID via the UI requires a lot of clicks.
  • "Power" is the only option in the dropdown; we can't specify the connector type. Upstream task.

One option would be to mass export the list, manually edit the csv and re-import it, but it would need to be tested first to not risk losing the data.
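The export/edit/re-import idea could be sketched roughly as below. This is only an illustration: the column names (`side_a_device`, `side_b_device`, `type`, `color`) and the sample rows are assumptions, not the actual Netbox export format, so they would need checking against a real export on a test instance first.

```python
import csv
import io


def recolor_cables(csv_text, device, cable_type, color):
    """Return CSV text where every cable touching `device` gets the given type and color.

    Column names are hypothetical; adjust them to match the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # A cable matches if either endpoint is the device we care about.
        if device in (row["side_a_device"], row["side_b_device"]):
            row["type"] = cable_type
            row["color"] = color
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data standing in for an exported cable list.
sample = (
    "side_a_device,side_b_device,type,color\n"
    "ps1,server1,,\n"
    "switch1,server1,,\n"
)
updated = recolor_cables(sample, "ps1", "power", "black")
```

Working on a copy of the export and diffing it against the original before re-importing would help limit the risk of losing data.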

@ayounsi you can test it on the WMCS instance that @crusnov has created to test the upgrade ;) (do not add sensitive data there)

I started to format the spreadsheet in a way that can be imported into Netbox:
https://docs.google.com/spreadsheets/d/1FKYVQJePjTQ7nVwYv4oDC6Gszk7RLrkq5ySN0fjvSoY/edit#gid=1665726692
The plan is to then delete the existing cables and import them again with all their attributes.

Some notes/questions:

  • How should we name server interfaces? The physical Port 1, Port 2, etc. or the Linux naming (enp5s0f0, enp5s0f1, etc)

My vote so far would go with #2. Even if it's harder to parse for a human, it should stay consistent.

  • How should we model patch panels? Especially for PP with 2 SC ports that merge into 1 LC connector on a router?

I did the following as example: https://af-netbox.wmflabs.org/dcim/devices/2067/
Created a "Generic" vendor, "1U patch panel" device type, "Cable management" role.
Name is "PP:0603:1087235" which is Equinix's naming of the patch panel.
I then used the "Front Ports"/"Rear Ports" backwards and created a single rear port named "13/14" SC type, that I connected to a device's interface (https://af-netbox.wmflabs.org/dcim/cables/270/)

Upstream issue is https://github.com/digitalocean/netbox/issues/2633
It's not a clean way of doing it, but I don't think it's an issue (and better than a spreadsheet), especially if we limit it to Singapore.

See https://netbox.readthedocs.io/en/stable/core-functionality/devices/#devices for the doc.

Edit: Side note, we should also document and enforce the different component names, otherwise we will end up with "PSU1" "PSU 1" "psu1" etc.

> The plan is to then delete the existing cables and import them again with all their attributes.

This one-time import sounds good to me.

> Some notes/questions:
>
>   • How should we name server interfaces? The physical Port 1, Port 2, etc. or the Linux naming (enp5s0f0, enp5s0f1, etc)
>
> My vote so far would go with #2. Even if it's harder to parse for a human, it should stay consistent.

My vote is for whatever is clearer/simpler for the people who have to physically interact with them, as long as the names uniquely identify the parts without ambiguity.

> Edit: Side note, we should also document and enforce the different component names, otherwise we will end up with "PSU1" "PSU 1" "psu1" etc.

This can be easily added to one of the reports, probably the coherence one. (cc @crusnov )
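A naming check of this kind might look roughly like the sketch below. The convention itself (an uppercase prefix immediately followed by a number) and the prefixes `PSU` and `PORT` are assumptions for illustration only, and the sketch works on plain strings rather than on the Netbox report API.

```python
import re

# Hypothetical convention: uppercase prefix immediately followed by a number.
CANONICAL = re.compile(r"^(PSU|PORT)(\d+)$")


def normalize_component_name(name):
    """Collapse case and separators, e.g. "psu 1" -> "PSU1"."""
    return re.sub(r"[\s_-]+", "", name).upper()


def check_names(names):
    """Return (name, suggestion) pairs for names that break the convention.

    The suggestion is None when no canonical form can be derived.
    """
    problems = []
    for name in names:
        if CANONICAL.match(name):
            continue
        candidate = normalize_component_name(name)
        suggestion = candidate if CANONICAL.match(candidate) else None
        problems.append((name, suggestion))
    return problems


problems = check_names(["PSU1", "PSU 1", "psu1", "power"])
```

Returning a suggestion next to each offending name keeps the check read-only: the report can flag inconsistencies without renaming anything by itself.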

>>   • How should we name server interfaces? The physical Port 1, Port 2, etc. or the Linux naming (enp5s0f0, enp5s0f1, etc)
>>
>> My vote so far would go with #2. Even if it's harder to parse for a human, it should stay consistent.
>
> My vote is for whatever is clearer/simpler for the people who have to physically interact with them, as long as the names uniquely identify the parts without ambiguity.

Well, it has to be clean on both sides. The whole point of the newer naming schemes (e.g. enp4s0f0p1) was to match up physical and logical naming. Probably we should use these names because that's what they're for, but I think separately we should look hard at whether that's actually working out on the hardware side of things (because I don't think in at least some cases it's obvious to dcops where enp4s0f0p1 is), and maybe fix that using some updated udev rules or whatever that work for the hardware we've got and actually make sense in the physical world.

The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powered up, and that data to be used by our tooling to configure and execute the provisioning itself (DHCP configuration, switchport, OS install etc.).

So, I don't think we can reasonably expect our on-site techs to look at a box and say "oh this port is enp4s0f0p1" and record it as such :)

More broadly, I think this ties back to a larger discussion we've had with regards to provisioning: whether we'd manually (or semi-manually, e.g. barcodes) enter MAC addresses into Netbox, or whether we'd rely on something like the serial number or asset tag, and inventorize at some intermediate step automatically.

Both approaches have their pros and cons, and I don't think we've made any decisions yet. Depending on how that goes, we could either generate names of our own using a custom MAC-address-based udev rule, or import systemd/udev consistent names during the inventorization step.

I think all that is a bit farther out, so in the meantime and for the purposes of this task I'd record something high-level for now, like "port 1".

> The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powered up, and that data to be used by our tooling to configure and execute the provisioning itself (DHCP configuration, switchport, OS install etc.).
>
> So, I don't think we can reasonably expect our on-site techs to look at a box and say "oh this port is enp4s0f0p1" and record it as such :)

Ah I didn't understand all the context here. Is it possible we can configure the software interface names to match during provisioning maybe? I'm just wondering how we get to an end state where there's no confusion between software and physical ports.

>> The medium-term plan is for this data to be entered into Netbox after a server is racked but before it's provisioned or even powered up, and that data to be used by our tooling to configure and execute the provisioning itself (DHCP configuration, switchport, OS install etc.).
>>
>> So, I don't think we can reasonably expect our on-site techs to look at a box and say "oh this port is enp4s0f0p1" and record it as such :)
>
> Ah I didn't understand all the context here. Is it possible we can configure the software interface names to match during provisioning maybe? I'm just wondering how we get to an end state where there's no confusion between software and physical ports.

I'm fairly certain that udev lets us rename devices, so we could set the device name based on the MAC. I'm not sure how that interacts with d-i or similar, though.
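To make the MAC-based renaming idea concrete, here is a minimal sketch that renders a udev rule line from a MAC address and a desired name. The name `port1` and the helper itself are hypothetical, and the sketch says nothing about how such a rule would behave inside d-i.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")
MAX_IFNAME_LEN = 15  # Linux interface names are limited to 15 characters


def udev_rename_rule(mac, name):
    """Render one udev rule that names the NIC with this MAC address."""
    mac = mac.lower()  # sysfs exposes the address in lowercase
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    if not name or len(name) > MAX_IFNAME_LEN:
        raise ValueError(f"invalid interface name: {name!r}")
    return (
        f'SUBSYSTEM=="net", ACTION=="add", '
        f'ATTR{{address}}=="{mac}", NAME="{name}"'
    )


rule = udev_rename_rule("AA:BB:CC:00:11:22", "port1")
```

Generating the rules from recorded MAC addresses, rather than writing them by hand, would keep the names on the host and the names in the inventory derived from the same data.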

> So, I don't think we can reasonably expect our on-site techs to look at a box and say "oh this port is enp4s0f0p1" and record it as such :)

Indeed, some parts of the interface names can be found, but others such as PCI bus number cannot.
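For illustration, the sketch below splits a PCI-based interface name into its parts. It assumes the simplified form `enp<bus>s<slot>[f<function>]` and deliberately does not handle longer variants such as enp4s0f0p1; the helper and type names are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class PciLocation(NamedTuple):
    bus: int
    slot: int
    function: Optional[int]  # None when the name does not encode a function


# Simplified pattern: other suffixes and naming variants are not handled.
_PCI_NAME = re.compile(r"^enp(\d+)s(\d+)(?:f(\d+))?$")


def parse_pci_name(ifname):
    """Return the PCI location encoded in `ifname`, or None if it doesn't match."""
    match = _PCI_NAME.match(ifname)
    if match is None:
        return None
    bus, slot, function = match.groups()
    return PciLocation(int(bus), int(slot), None if function is None else int(function))


location = parse_pci_name("enp5s0f0")
```

Someone standing in front of the server might be able to tell which port on a card they are looking at, but the bus number is not something they can read off the hardware.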

> Both approaches have their pros and cons, and I don't think we've made any decisions yet. Depending on how that goes, we could either generate names of our own using a custom MAC-address-based udev rule, or import systemd/udev consistent names during the inventorization step.

Not using the standard interface naming on the machines (custom rules, etc.) brings the risk of making our infrastructure more of a snowflake and triggering bugs/limitations down the road.
I'm not saying we shouldn't do it, but this should weigh in the decision.

Another option could be to rename the Netbox interface during the provisioning process.
For example:
DCops records "PCI port 1", MAC xx:xx, connected to switch1:xe-1/0/1.
Once the machine is ready, provisioning verifies via LLDP that the link is correct and renames the high-level "PCI port 1" to the machine's exact interface name.
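The rename-at-provisioning flow above could be sketched as follows. Everything here is an assumption for illustration: the record types, field names and sample values are invented, and the sketch only computes which renames would be safe rather than talking to Netbox or to an LLDP daemon.

```python
from dataclasses import dataclass


@dataclass
class PlannedInterface:
    """What DCops recorded at racking time (hypothetical fields)."""
    label: str        # high-level label, e.g. "PCI port 1"
    mac: str
    switch: str
    switch_port: str


@dataclass
class ObservedInterface:
    """What the provisioned host reports (hypothetical fields)."""
    name: str         # OS interface name, e.g. "enp5s0f0"
    mac: str
    lldp_switch: str
    lldp_port: str


def plan_renames(planned, observed):
    """Match interfaces by MAC and only rename when LLDP confirms the cabling."""
    by_mac = {o.mac.lower(): o for o in observed}
    renames, mismatches = {}, []
    for p in planned:
        o = by_mac.get(p.mac.lower())
        if o is None:
            mismatches.append((p.label, "MAC not seen on host"))
        elif (o.lldp_switch, o.lldp_port) != (p.switch, p.switch_port):
            mismatches.append((p.label, f"cabled to {o.lldp_switch}:{o.lldp_port}"))
        else:
            renames[p.label] = o.name
    return renames, mismatches


planned = [
    PlannedInterface("PCI port 1", "AA:BB:CC:00:11:22", "switch1", "xe-1/0/1"),
    PlannedInterface("PCI port 2", "AA:BB:CC:00:11:23", "switch1", "xe-1/0/2"),
]
observed = [
    ObservedInterface("enp5s0f0", "aa:bb:cc:00:11:22", "switch1", "xe-1/0/1"),
    ObservedInterface("enp5s0f1", "aa:bb:cc:00:11:23", "switch1", "xe-1/0/9"),
]
renames, mismatches = plan_renames(planned, observed)
```

Keeping mismatches separate from renames means a miscabled port is surfaced for a human to look at instead of being silently renamed.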

At the end of the day, I do think we should have the machine's interface name (renamed from standard or not) in Netbox, and not a higher level label.

> I think all that is a bit farther out, so in the meantime and for the purposes of this task I'd record something high-level for now like "port 1" or something like that.

As the servers are running, I'd suggest we use enp5s0f0, and rename them in Netbox if we decide to change the machine's interfaces names.

EDIT: Also looking for feedback on the patch panel point :)

I imported all the cables except the servers' uplinks, see:
https://netbox.wikimedia.org/dcim/cables/?page=6 (and previous pages)

I ended up using the circuits feature of Netbox instead of the previously mentioned hack to track patch panel ports, as it's possible to define an A/Z side patch panel and ports. Note that this is a free-text field, so no verification is done.

I've had a chat with @ayounsi about this.

While I agree with all that @faidon said in T205897#4953769, I actually came to a different conclusion.
The problem is that "port 1" doesn't mean anything reliable on either the physical or the OS side of the problem. To get a name/description that identifies an interface on the physical side, we would have to coin some sort of vocabulary that maps reality to a name. That vocabulary would have to take into account all the various corner cases, different physical formats, etc., and would most likely end up not covering all the cases we need.

So my suggestion would be to use the predictable interface names from the OS to backfill existing interfaces. For the longer term, we can probably just use a dummy name when racking the host and adding it to Netbox, so that the cabling can be recorded. Then, at provision time, we could query Netbox for all the interfaces and their cabling, map them to the OS ones based on that, and update their names.

Using the OS names should also simplify any future migration we'd want to do from this naming scheme to a new one, in case the time comes.

All eqsin links are now in Netbox.

faidon updated the task description.
ayounsi claimed this task.

Finally time to close this task.

We've added more things to Netbox since, but no need for a tracking task anymore.

Tracking power cables at the core sites can't be automated, and creating them manually is significant DCops work for too little benefit, as we have permanent onsite staff and servers often get added/removed.