The BIG GLITCH manifested as a routing problem at the point in the district central office where all of the new switches link up with the "legacy" part of the network. After a couple of hours of troubleshooting it became apparent that we could have the North half of the upgraded network and the "legacy" network talking, or the North and South halves of the network talking, but not all three at once. (Lemme tell you, it is really demoralizing to see routes start vanishing from your route table; it's like looking at a map and seeing roads disappear off of it in front of your eyes.)
The fix was pretty direct, once I thought of it (and eerily similar to what we had to do to stabilize the ATM network back in the spring of 2000 when we made a big ATM routing change). We're running OSPF (Open Shortest Path First) as the routing protocol that all the switches and routers use to tell each other about the layout of the network. There was some kind of problem with how the routers at the joining point were sharing information. The solution was to create a new joining point between the upgraded network and the "legacy" network. The new join is a subnet with just two routers on it (picture a single bridge joining two islands) using the RIP v2 routing protocol. This effectively cleaved one big OSPF network into two smaller OSPF networks, with RIP v2 as the lingua franca the two routers use to pass routing information back and forth between them.
After some more fiddling to get traffic to go to the correct Internet connection, things have settled down. At the end of the day Pat and his team had finished upgrading the last site (Central Library), and Kevin and Gary had diagnosed the link problem between Sanchez and Park Library. The culprit was one of the plastic connectors (called a bulkhead) in the fiber-optic connection box where we plug into the fiber from the street. Something was just amiss with it. When they tested the fibers with the calibrated light source and optical power meter they found that one strand on the run from Park to Sanchez had a 4 dB loss, while the other strand had something around a 17 dB loss. Long-haul gigabit Ethernet adapters can tolerate an 8 dB loss, and as the decibel scale is logarithmic, the 9 dB difference between an 8 dB loss and a 17 dB loss means a signal that's, um, only about one-eighth as strong. Kevin scrounged a replacement bulkhead, and both strands of fiber then tested out at a 4 dB loss. That was more than good enough: the link has been running without errors all evening.
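For anyone who wants to check the decibel arithmetic, here is a minimal Python sketch. The 4, 8, and 17 dB figures come from the measurements above; the function and constant names are just mine for illustration, and it assumes the usual power-ratio definition of the decibel (every 10 dB is a factor of ten in power).

```python
def db_to_power_ratio(db_loss: float) -> float:
    """Fraction of optical power that remains after a loss of db_loss dB."""
    return 10 ** (-db_loss / 10)


def within_budget(db_loss: float, budget_db: float) -> bool:
    """True if the measured loss does not exceed what the adapter tolerates."""
    return db_loss <= budget_db


# Figures from the fiber test described above.
BUDGET_DB = 8.0       # loss the long-haul adapters can tolerate
BAD_STRAND_DB = 17.0  # strand with the faulty bulkhead
GOOD_STRAND_DB = 4.0  # healthy strand (and both strands after the swap)

# How much weaker the bad strand's signal is than a signal sitting
# right at the 8 dB limit: only the 9 dB difference matters.
bad_vs_budget = db_to_power_ratio(BAD_STRAND_DB - BUDGET_DB)

print(f"Bad strand vs. budget: {bad_vs_budget:.3f} of the power")
print(f"Good strand within budget? {within_budget(GOOD_STRAND_DB, BUDGET_DB)}")
print(f"Bad strand within budget?  {within_budget(BAD_STRAND_DB, BUDGET_DB)}")
```

The point of working in differences is that decibels add where power ratios multiply, so a strand that is 9 dB over budget delivers roughly an eighth of the minimum power the adapter wants to see, no matter what the absolute launch power is.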
Of course, the real test will come Tuesday morning when everyone is back at work and all the kids are back in school. It's been a grueling five months, with this project starting right on the heels of school construction (which is still ongoing). Perhaps that's why I'm not feeling elated. We did it. It works. It needs some tweaking for performance reasons, but it's working. Maybe I'm gun-shy about celebrating because I fear that there is another shoe left to drop.
Best line of the day (if I may be so self-congratulatory) -- me to Ron at about 12:30, shortly after we fixed the BIG GLITCH: "The cold knot of dread in my stomach has been replaced by a ravening hunger. We need lunch!"