Monitoring for common network problems
Occasionally during a party, I check the switch's error counters to see whether we are running into any cabling problems. To do this, I open a Telnet or SSH session to the core switch and issue the "show interfaces" command:
hp2824# show interfaces
 Status and Counters - Port Counters
                                                                    Flow  Bcast
  Port  Total Bytes    Total Frames   Errors Rx      Drops Rx       Ctrl  Limit
  ----  -------------  -------------  -------------  -------------  ----  -----
  1     27,564         297            0              0              off   0
  2     0              0              0              0              off   0
  3     0              0              0              0              off   0
  [snip]
The Errors Rx column counts unrecoverable receive errors (in this case, none), and that is the number to pay attention to. A non-zero Drops Rx count, on the other hand, doesn't necessarily point to a cabling problem; it is more likely a capacity problem. Switches begin dropping packets when their queues back up, which happens when a link is running at full capacity. In that case, consider giving that port a faster connection.
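Because these counters are just columns of text, checking them can be partly scripted. The Python sketch below is one way to do it, not an HP tool: it assumes you have pasted the output of "show interfaces" into a string, that each port row is whitespace-separated in the column order shown above, and that ports are plain numbers as on the 2824. The names parse_port_counters and triage are hypothetical helpers.

```python
# Minimal sketch: triage a captured "show interfaces" table.
# Assumption: port rows are whitespace-separated in the order
# Port, Total Bytes, Total Frames, Errors Rx, Drops Rx, Flow Ctrl, Bcast Limit,
# and ports are plain numbers (as on the 2824).


def to_int(value):
    """Convert a counter such as '27,564' to an int."""
    return int(value.replace(",", ""))


def parse_port_counters(text):
    """Return {port: counters} for each port row of the table."""
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the prompt, title, header, separator and "[snip]" lines.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        port, total_bytes, total_frames, errors_rx, drops_rx, flow_ctrl, _ = fields
        ports[int(port)] = {
            "total_bytes": to_int(total_bytes),
            "total_frames": to_int(total_frames),
            "errors_rx": to_int(errors_rx),
            "drops_rx": to_int(drops_rx),
            "flow_ctrl": flow_ctrl,
        }
    return ports


def triage(ports):
    """List (port, hint) pairs for ports whose error or drop counters are non-zero."""
    hints = []
    for port, counters in sorted(ports.items()):
        if counters["errors_rx"] > 0:
            # Unrecoverable receive errors: look at the port in detail.
            hints.append((port, f"Errors Rx = {counters['errors_rx']}: run 'show interface {port}'"))
        if counters["drops_rx"] > 0:
            # Drops point more toward a saturated link than a bad cable.
            hints.append((port, f"Drops Rx = {counters['drops_rx']}: link may be at capacity"))
    return hints


SAMPLE = """\
hp2824# show interfaces
 Status and Counters - Port Counters
                                                                    Flow  Bcast
  Port  Total Bytes    Total Frames   Errors Rx      Drops Rx       Ctrl  Limit
  ----  -------------  -------------  -------------  -------------  ----  -----
  1     27,564         297            0              0              off   0
  2     0              0              0              0              off   0
  3     0              0              0              0              off   0
"""

if __name__ == "__main__":
    print(triage(parse_port_counters(SAMPLE)))
```

On the sample output above this prints an empty list. A port with a non-zero Errors Rx value gets a pointer to the detailed per-port view, while one with drops gets a capacity hint, mirroring the distinction drawn above.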
To find out what kind of errors a port is seeing, query the switch for that port's detailed counters:
hp2824# show interface 1
 Status and Counters - Port Counters for port 1
  Name            : Table 1
  Link Status     : Up
  Bytes Rx        : 15,812               Bytes Tx        : 25,66
  Unicast Rx      : 0                    Unicast Tx      : 0
  Bcast/Mcast Rx  : 111                  Bcast/Mcast Tx  : 372
  FCS Rx          : 0                    Drops Rx        : 0
  Alignment Rx    : 0                    Collisions Tx   : 0
  Runts Rx        : 0                    Late Colln Tx   : 0
  Giants Rx       : 0                    Excessive Colln : 0
  Total Rx Errors : 0                    Deferred Tx     : 0
The error counters in this output (FCS Rx, Alignment Rx, Runts Rx, Giants Rx and the others below them) show which types of errors have occurred on the port. For details on each type, consult the manual that came with your switch. Generally speaking, though, FCS and Alignment errors usually point to a cabling problem, while Runts and Giants are more often caused by the PC or switch connected to the port, which may be misconfigured or running outdated drivers.
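The rule of thumb above can also be written down as a small lookup. This Python sketch assumes the detailed output uses the two-column "Counter Name : value" layout shown above, with at least two spaces between a value and the next counter name; parse_port_detail and diagnose are hypothetical helpers, and the mapping encodes only the generalization in this article, not the switch manual's definitions.

```python
import re

# One "Counter Name : value" pair. A value may contain single spaces ("Table 1")
# but ends at two or more spaces or at the end of the line.
FIELD = re.compile(r"([A-Za-z][A-Za-z/ ]*?)\s*:\s*(\S+(?: \S+)*)")

# Rule of thumb from the text above, not the switch manual's definitions.
LIKELY_CABLING = ("FCS Rx", "Alignment Rx")
LIKELY_ATTACHED_DEVICE = ("Runts Rx", "Giants Rx")


def parse_port_detail(text):
    """Return {counter name: value} from captured 'show interface <port>' output."""
    counters = {}
    for line in text.splitlines():
        for name, value in FIELD.findall(line):
            digits = value.replace(",", "")
            # Numeric counters become ints; text fields such as Name stay strings.
            counters[name] = int(digits) if digits.isdigit() else value
    return counters


def diagnose(counters):
    """Turn non-zero error counters into hints about where to look first."""
    hints = []
    for name in LIKELY_CABLING:
        if counters.get(name, 0) > 0:
            hints.append(f"{name} = {counters[name]}: suspect the cable")
    for name in LIKELY_ATTACHED_DEVICE:
        if counters.get(name, 0) > 0:
            hints.append(f"{name} = {counters[name]}: check the attached PC or switch")
    return hints


SAMPLE = """\
hp2824# show interface 1
 Status and Counters - Port Counters for port 1
  Name            : Table 1
  Link Status     : Up
  Bytes Rx        : 15,812               Bytes Tx        : 25,66
  Unicast Rx      : 0                    Unicast Tx      : 0
  Bcast/Mcast Rx  : 111                  Bcast/Mcast Tx  : 372
  FCS Rx          : 0                    Drops Rx        : 0
  Alignment Rx    : 0                    Collisions Tx   : 0
  Runts Rx        : 0                    Late Colln Tx   : 0
  Giants Rx       : 0                    Excessive Colln : 0
  Total Rx Errors : 0                    Deferred Tx     : 0
"""

if __name__ == "__main__":
    counters = parse_port_detail(SAMPLE)
    print(diagnose(counters) or "no FCS, Alignment, Runt or Giant errors on this port")
```

Keeping the two groups separate means a port showing both kinds of errors gets both hints, which matches the advice to check the cable and the attached device independently.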
To confirm that the cable is causing the errors, run a new cable between the core switch and the table switch. When you're ready, warn the table that connectivity will drop briefly, then swap the old cable for the new one at both ends as close to simultaneously as possible to keep downtime to a minimum. Afterwards, reset the port's counters on the core switch so you can see whether the errors return:
hp2824# clear statistics 1
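Rather than eyeballing the counters after the swap, you can compare a reading taken before the swap with one taken some time after clearing the counters. The sketch below assumes each reading is a Python dict mapping the counter names from "show interface" to integers (typed in by hand or produced by a parser); new_errors is a hypothetical helper. Because "clear statistics" resets the counters to zero, it treats any counter that went down as having been cleared and counts its whole new value as fresh errors.

```python
# Counters treated as errors when checking a port after a cable swap.
# Names follow the "show interface <port>" output.
ERROR_COUNTERS = ("FCS Rx", "Alignment Rx", "Runts Rx", "Giants Rx", "Total Rx Errors")


def new_errors(before, after):
    """Return {counter: increase} for error counters that grew between two readings.

    A counter that went down is assumed to have been reset in between
    (for example by 'clear statistics'), so its whole new value counts as new.
    """
    grown = {}
    for name in ERROR_COUNTERS:
        old = before.get(name, 0)
        new = after.get(name, 0)
        delta = new - old if new >= old else new
        if delta > 0:
            grown[name] = delta
    return grown


if __name__ == "__main__":
    # Hypothetical readings: before the swap, and a while after 'clear statistics 1'.
    before = {"FCS Rx": 40, "Alignment Rx": 7, "Total Rx Errors": 47}
    after = {"FCS Rx": 0, "Alignment Rx": 0, "Total Rx Errors": 0}
    print(new_errors(before, after) or "no new errors since the counters were cleared")
```

The reset heuristic cannot tell a reset followed by a large burst of errors from ordinary growth. If you take the first reading right after clearing, the before dict is simply all zeros and the question never arises; an empty result after a good stretch of normal traffic supports the theory that the old cable was at fault.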






