On the school side of the network I have three routers that handle all of the data traffic for the 40+ schools. One routes for the north end, one for the middle, and one for the south. A little after 9:30 this morning the north end router ground to a halt. Its buffer pool was depleted by a storm of traffic, and the poor thing did the electronic equivalent of throwing up its hands.
On the upside, my two engineers and I were in the office at our desks when things went south. The downside is that we were all in the office because we were going to have a conference call at 10:00 with IT staff from the University of Arkansas about their campus-wide wireless deployment. We had to punt the call, but they graciously agreed to another call tomorrow afternoon.
We initially suspected a condition known as a bridge loop, where someone creates a loop in part of the network resulting in an ever-increasing maelstrom of traffic on the looped segment. This will drag a router down as it tries to make sense of the hail of repeated data. These are relatively easy to troubleshoot. For lack of finer tools, we shut the schools off and turn them back on one by one. When the router freaks, you know you've found the offender. You turn that school off again, turn everyone else back on, and then go on site to hunt down the offending connection.
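The one-by-one procedure above can be sketched in a few lines of Python. This is only an illustration of the logic, not a tool we actually run: `enable_site`, `disable_site`, and `router_overloaded` are hypothetical callbacks standing in for whatever you do by hand at the router console.

```python
def isolate_offender(sites, enable_site, disable_site, router_overloaded):
    """Find the site whose traffic overloads the router.

    All three callbacks are hypothetical placeholders:
      enable_site(site) / disable_site(site) turn a site's link on or off,
      router_overloaded() reports whether the router is currently choking.
    Returns the offending site, or None if no single site trips the router.
    """
    # Start from a quiet network: shut every site off.
    for site in sites:
        disable_site(site)

    # Bring sites back one at a time and watch the router after each.
    for site in sites:
        enable_site(site)
        if router_overloaded():
            # Found it: keep this site off and restore everyone else.
            disable_site(site)
            for other in sites:
                if other != site:
                    enable_site(other)
            return site

    return None
```

Note that this assumes a single clear offender, which is exactly the assumption that did not hold this time.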
This one wasn't so clear-cut. We'd bring a site back online, and the router's buffer utilization would jump, but it wouldn't exhaust the pool. Traffic would get jittery, and then it would settle. We finally ID'd Hartford Public High School (HPHS) as the worst offender and left them shut down. With everybody else turned back on, the north end router was laboring but staying afloat.
Next up was setting up routing for HPHS on another router (one with more resources, and a free Ethernet card, so if we crashed the card we wouldn't be taking out anyone else), and getting a laptop with Sniffer running. I ran a traffic capture for 18 seconds and got a good big sample. Looking at the capture, the problem jumped right out ... the fingerprints (and lots of them) of PCs infected with some kind of worm trying to find other PCs to infect (specifically, a rain of TCP SYN frames going out to semi-random IP addresses probing for new victims). I don't know which worm yet; the desktop guys will have to figure that out when they get on site. Based on the behavior I saw, HPHS was only the worst of several, so we'll be doing more Sniffer captures tomorrow looking at other schools.
The attack traffic created a denial-of-service (DoS) effect in the north end router. By spitting out SYN frames to semi-random addresses, the infected PCs caused the router to choke on trying to tell them all "you can't get there from here," as 99.9% of the frames were addressed to IP addresses that don't exist on our network. The ten or so PCs at HPHS that we ID'd from our 18-second capture were generating over a thousand SYNs a second.
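For anyone who wants to look for the same pattern in their own captures, here is a minimal Python sketch of the tally that makes it jump out: count bare SYNs per source and how many distinct destinations each source is probing. This is not how Sniffer works internally. The record format (a tuple of source IP, destination IP, SYN flag, ACK flag) and both thresholds are assumptions made up for illustration; you would need to export your capture into something like it.

```python
from collections import defaultdict


def find_syn_scanners(packets, min_syns=100, min_distinct_dsts=50):
    """Flag hosts that send many bare SYNs to many different addresses.

    packets: iterable of (src_ip, dst_ip, is_syn, is_ack) tuples.
             This layout is a hypothetical export format, not a real one.
    min_syns, min_distinct_dsts: illustrative thresholds, tune to taste.
    Returns {src_ip: (syn_count, distinct_dst_count)} for suspects only.
    """
    syn_counts = defaultdict(int)
    destinations = defaultdict(set)

    for src, dst, is_syn, is_ack in packets:
        # SYN without ACK is a new connection attempt; SYN+ACK is a reply.
        if is_syn and not is_ack:
            syn_counts[src] += 1
            destinations[src].add(dst)

    return {
        src: (count, len(destinations[src]))
        for src, count in syn_counts.items()
        if count >= min_syns and len(destinations[src]) >= min_distinct_dsts
    }
```

A normal PC opens a handful of connections to a handful of servers, so it falls well under both thresholds. An infected one stands out because both numbers are huge: at over a thousand SYNs a second, an 18-second capture holds upward of 18,000 of them.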
I can only hope that this worm is an older one that our automatically updated PCs are immune to (we have a large population of PCs out in the schools that are not getting updates, either automatically or manually). If this is a brand new worm, we could be in for a really rough day when everyone turns their classroom PCs back on tomorrow.