Monday afternoon: One of our IT support techs reports a user with a D-Link switch, aka Etherkiller Jr. I contact the user and install a wall-mounted PoE switch after she leaves for the day.
Tuesday morning: I arrive to find "My printer isn't working" in my inbox. Ok, so I didn't print a test page after the install. It's just a switch, and the printer had a link light.
I trudge up to her office. IT support has beaten me there, and they've got ping output from the user's desktop to the printer showing packet loss, and "it's your baby" looks on their faces. Whatever. I do a sanity-check ping to 188.8.131.52 and see packet loss there too. Packet loss on both desktop-->printer and desktop-->Internet suggests the desktop<>switch link is the culprit. Way to troubleshoot, guys. Bad ports happen, but not often enough to justify verifying every port. I swap out the switch port for the desktop and confidently tell them to print something. No dice. Much less confidently, I tell them to restart the desktop and try again while I mentally run through a list of potential causes. Still unsuccessful.
I have the tech delete and re-add the shared printer on the user's desktop, and when that doesn't work either, I log into the server and verify the printer is configured properly. The IP checks out, so at this point I'm racking my brain for things to try. I remember an offhand comment that, during a VLAN migration some months back, some of the printers were kept on VLAN X. During yesterday's installation, I had noted that the user's office was on VLAN Y. I figure it can't hurt, so I change the VLAN membership on the upstream switch port. I switch the printer from static to DHCP to grab a valid IP, and have the support tech remap the printer on the server. The server is able to recognize the printer, but the user's desktop still can't print. Hmm.
I run through some more tests: the user can print via another printer, and a third party can print to the user's printer. I suspect connectivity issues, so I change the printer's switch port to the desktop's and start running ping tests. Here's what I found:
Server-->desktop: 3% packet loss
Server-->printer: 27% packet loss
Desktop-->local gateway: 2% packet loss
Desktop-->printer: 0% packet loss
Ok, this makes no sense. Heavy packet loss on server-->printer but very little on server-->desktop suggests the printer<>switch link is suspect. However, this is contradicted by the total lack of packet loss between the desktop and printer. Packet loss from the desktop out is higher than desired, but still within tolerance.
While trying to piece everything together, I notice the ping output for the printer is odd. It has several minutes of solid connectivity, then drops for a bit, then stays sporadic for a while. An interface dropping packets wouldn't behave like that, not for only one device on a link. Without a smoking gun, I temporarily rule out bad links and check out Layer 2. I log into the upstream switch, and sure enough, the switch is dropping the printer's MAC from the forwarding table. I configure a static MAC on the relevant interface, congratulate myself on a job well done, and put the VLAN membership back to the original setting. I go back up to the user's office, change the printer IP back, and clean up the half dozen or so printer test pages scattered around.
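The clue in that ping output was the shape of the loss, not the percentage. Here is a minimal Python sketch of that idea, assuming you have already turned a ping run into one True/False per probe. The function name and the two sample runs are made up for illustration; they are not the real captures from that day.

```python
from itertools import groupby


def loss_profile(replies):
    """Summarize a ping run given one boolean per probe (True = reply received)."""
    total = len(replies)
    lost = replies.count(False)
    # Collapse the sequence into runs of consecutive losses and record their lengths.
    loss_runs = [len(list(group)) for ok, group in groupby(replies) if not ok]
    return {
        "loss_pct": 100.0 * lost / total if total else 0.0,
        "loss_runs": len(loss_runs),
        "longest_loss_run": max(loss_runs, default=0),
    }


# Two hypothetical runs with the same overall loss but very different shapes.
scattered = [True, True, True, True, False] * 4  # one drop every fifth probe
bursty = [True] * 16 + [False] * 4               # solid, then a single outage

print(loss_profile(scattered))
print(loss_profile(bursty))
```

Both runs report the same loss percentage, but the second has a single long outage. Loss that arrives in long runs for one device points away from a marginal link and toward something stateful, like a forwarding-table entry coming and going.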
After eating lunch, I let IT support know to remap the printer on the server and to take care of the user. While I'm idly flipping through the printer test pages, I notice one of them has a duplicate IP message. Weak. I fire off a message to one of our admins to confirm the printer is entitled to that particular IP.
In the meantime, I start tracking down the other device. It's in the same subnet as my desktop, so I ping the IP and check ARP to see which MAC it resolves to. It isn't the printer's MAC, so I know a continuous ping will keep that MAC in the forwarding table of every switch between my desktop and the device. I log in and hop from switch to switch based on the incoming interface. I'm two switches out when I encounter a problem: the switch doesn't have a MAC entry for the device. The Dell switches we have at work are pretty lousy, so I try to get around the issue by skipping a link out. Nothing there either, so I chalk that up to Dell being Dell and check out the other half dozen switches serving our users. On many of our Dell switches, you can't do useful things like filtering output by MAC or dumping the output, so I've got to mash the space bar for a couple of minutes, copy and paste the output into Notepad, then Ctrl-F. Not a trace of the MAC.
That leaves our data center, which does have a direct link to the first switch. I don't have access to those devices, so I enlist the help of my manager. Of course, the switches in the data center are lousy Dell switches, so he winds up having to log into every switch. He finds the MAC on the 20th switch he checks. About 75% of the way through this process, our admin replies with "the printer has a reserved lease for that IP," so we can't even get around the issue by configuring the printer with another address. We track the MAC to a rack, and find the cable runs into a conduit leading into the raised floor. We pull floor tiles till we can identify the device, a thin client we use to set up tape backups.
One that was set to use DHCP. What? I reread the email from the admin and it says "the printer now has a reserved lease for that IP." Ugh. While getting my manager up to speed on the whole ordeal, I told him about the VLAN assignment, and he said the printer was supposed to be on VLAN X to begin with. Time spent troubleshooting: 5 hours.
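For what it's worth, the switch-to-switch hunt described above is mechanical enough to sketch in Python, assuming you can get each switch's forwarding table and your inter-switch links into plain dictionaries. Everything here is hypothetical: the switch names, port names, MAC address, and data layout are illustrative, not our actual gear or any vendor's output format.

```python
def normalize_mac(mac):
    """Reduce a MAC in any common notation to 12 lowercase hex digits."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def trace_mac(mac, start_switch, mac_tables, uplinks, max_hops=32):
    """Follow a MAC from switch to switch via forwarding-table entries.

    mac_tables: {switch: {mac: port}} entries learned on each switch
    uplinks:    {(switch, port): neighbor_switch} for switch-to-switch links
    Returns (path, port). port is the edge port on the last switch in path,
    or None if the trail went cold there (no entry, or a loop in the data).
    """
    target = normalize_mac(mac)
    path = []
    switch = start_switch
    while len(path) < max_hops:
        path.append(switch)
        table = {normalize_mac(m): p for m, p in mac_tables.get(switch, {}).items()}
        port = table.get(target)
        if port is None:
            return path, None      # this switch has no entry for the MAC
        neighbor = uplinks.get((switch, port))
        if neighbor is None:
            return path, port      # not an uplink, so the device hangs off this port
        if neighbor in path:
            return path, None      # inconsistent data, refuse to loop
        switch = neighbor
    return path, None


# Hypothetical three-switch path, each table using a different MAC notation.
mac_tables = {
    "access-1": {"00:11:22:33:44:55": "Gi1/0/24"},
    "dist-1": {"0011.2233.4455": "Te1/0/1"},
    "dc-20": {"00-11-22-33-44-55": "Gi1/0/7"},
}
uplinks = {
    ("access-1", "Gi1/0/24"): "dist-1",
    ("dist-1", "Te1/0/1"): "dc-20",
}

print(trace_mac("00:11:22:33:44:55", "access-1", mac_tables, uplinks))
```

When a switch in the middle has no entry, the function returns the path so far and None for the port, which is the point where I had to fall back to checking every switch by hand.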
This means the root causes were:
Printer with the wrong IP.
And on the wrong VLAN.
Switch dropping entries in the MAC forwarding table.
The problems were exacerbated by:
Incidental packet loss skewing initial troubleshooting.
Me not picking up on how IT support knew there was a D-Link switch in the user's office.
Dell switches being lousy.
Server admins being too clever.
Lessons learned:
Plan B is often viable. IT support would have connected the printer via USB and called it a day.
Thinking things out helps. You save a lot of heartache by realizing that if A says X, and B and C say otherwise, investigating X is a waste of time. Many people stop at A, because, most of the time, a problem has a single cause.
Networking theory: It works.