If you’re a gamer and you use a Wi-Fi connection, you’ve probably run into ping spikes. That’s when your connection periodically freezes for no apparent reason and you miss your target or your character ends up injured or dead. The cause may not be obvious, but there are real reasons why this happens, and they have nothing to do with gremlins.
I asked Wi-Fi expert Dennis Bland [DB] of dB Performance about the causes of high latency spikes, and he shared his extensive knowledge of the subject. Dennis has been deep into the bowels of Wi-Fi drivers, so he is intimately familiar with what makes Wi-Fi tick.
Throughout this article, "AP" refers to both access points and wireless routers, and "STA" (station) refers to Wi-Fi client devices.
Let’s first look at the mechanisms in APs and routers that can cause response time to spike.
Cause: AP goes off-channel periodically to perform a neighbor AP scan
This is an 802.11k mechanism that is supposed to decrease or eliminate the time a STA takes to find roaming target candidates. The AP scans for other APs’ beacons and builds a neighbor AP list in case a STA sends it an 802.11k Neighbor Report Request. The AP normally does this periodically (Cisco does it every 120 seconds) and may defer it temporarily during periods of high activity, but eventually it will happen. A ping plot of such a periodic scan shows a latency spike each time the AP leaves the channel to scan.
STAs will typically see disruptions of 100 milliseconds in this scenario. It doesn’t matter whether a STA supports 802.11k, because the STA doesn’t participate in the scan. It simply can’t reach the AP while the AP is off-channel.
Countermeasure: To avoid this, turn off RRM (Radio Resource Measurement) or 802.11k support in the AP. However, few consumer APs expose this setting to users.
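If you log pings to your AP, a few lines of code can show whether the spikes repeat on a fixed schedule. The Python sketch below is an illustration and not a dB Performance tool. The helper names (`find_spikes`, `event_starts`, `estimate_period_s`) and the spike thresholds are assumptions. The code flags RTT samples that sit well above the median and merges back-to-back flagged samples into one event. It then reports the median gap between events. A period near 120 seconds would be consistent with the Cisco neighbor-scan interval mentioned above, but it doesn’t prove the cause. A 100 ms stall can also fall between pings sent once a second, so a shorter ping interval catches more events.

```python
from statistics import median

def find_spikes(rtts_ms, factor=3.0, floor_ms=30.0):
    """Indices of ping samples that look like spikes.

    A sample counts as a spike if its RTT exceeds both `factor` x the median
    RTT and the median plus `floor_ms`. Both thresholds are assumptions."""
    base = median(rtts_ms)
    limit = max(base * factor, base + floor_ms)
    return [i for i, rtt in enumerate(rtts_ms) if rtt > limit]

def event_starts(spike_indices):
    """Merge runs of consecutive spike samples; return the first index of each run."""
    starts = []
    prev = None
    for i in spike_indices:
        if prev is None or i != prev + 1:
            starts.append(i)
        prev = i
    return starts

def estimate_period_s(starts, interval_s=1.0):
    """Median gap between spike events in seconds, or None with fewer than 2 events."""
    if len(starts) < 2:
        return None
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return median(gaps) * interval_s

# Synthetic 10-minute log: one ping per second, 20 ms baseline,
# with a 120 ms spike every 120 seconds starting at t = 60 s.
rtts = [20.0] * 600
for t in range(60, 600, 120):
    rtts[t] = 120.0

starts = event_starts(find_spikes(rtts))
print(starts)                     # [60, 180, 300, 420, 540]
print(estimate_period_s(starts))  # 120.0
```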
Cause: AP rekeys the WPA pairwise or group key
This can happen as often as every 10 minutes on some APs, but is more likely every 30 minutes (Aruba/HP) or 60 minutes (Cisco). A rekey event typically takes 50 milliseconds or less, but sometimes it fails and requires the STA(s) to reassociate, which takes several seconds. Note that the AP does not go off-channel during the rekey process.
[SNB] Can this effect be avoided by using WPA2/AES or WPA3?
[DB] Yes, rekeying can be greatly reduced by using WPA2/AES and especially WPA3. But don’t run WPA2 in mixed mode (allowing both WPA2/AES and WPA/TKIP). If even one STA associated to the AP supports only a TKIP group key, then ALL of the other STAs associated to that AP must use a TKIP group key as well. This means the AP will typically rekey every 10 minutes.
WPA3 doesn’t have this problem, but it’s not going to be supported by that fridge or thermostat in your home for a while.
Countermeasure: Use WPA3 if possible, or WPA2 with AES-CCMP for both group and pairwise keys. If you must use WPA2/TKIP, set it up on a separate SSID. You could use 2.4 GHz for all legacy WPA/WPA2 IoT devices, and a second SSID at 5 GHz with WPA2/WPA3 security (and legacy rates disabled) for your serious throughput needs.
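Dennis’s point about mixed mode comes down to one fact: every STA on the AP shares the same group key. The Python sketch below is a simplified model and is not how any particular AP firmware is written. It chooses CCMP (AES) for the group key only when every associated STA supports it, and otherwise falls back to TKIP. The function names are hypothetical. The 10-minute TKIP interval is the typical value Dennis quotes. The 60-minute default for AES is an assumption based on the Cisco interval cited earlier.

```python
TKIP_REKEY_S = 600  # "typically every 10 minutes" with a TKIP group key (per the article)

def group_cipher(sta_ciphers):
    """Choose the group cipher for one SSID/AP (simplified model).

    `sta_ciphers` maps each associated STA to the set of ciphers it supports.
    The group key is shared by all STAs, so a single TKIP-only STA forces
    the whole BSS onto a TKIP group key."""
    if all("CCMP" in ciphers for ciphers in sta_ciphers.values()):
        return "CCMP"
    return "TKIP"

def group_rekeys_per_hour(cipher, ccmp_rekey_s=3600):
    """Group rekeys per hour; the 60-minute CCMP default is an assumption."""
    interval_s = TKIP_REKEY_S if cipher == "TKIP" else ccmp_rekey_s
    return 3600 / interval_s

main_ssid = {"laptop": {"CCMP", "TKIP"}, "phone": {"CCMP"}}
print(group_cipher(main_ssid))  # CCMP

# Add one legacy thermostat that only supports TKIP to the same SSID.
mixed = dict(main_ssid, thermostat={"TKIP"})
cipher = group_cipher(mixed)
print(cipher, group_rekeys_per_hour(cipher))  # TKIP 6.0
```

Putting the thermostat on its own SSID, as the countermeasure suggests, leaves `main_ssid` on a CCMP group key and its slower rekey schedule.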
Cause: AP moves the STA to another channel
The AP moves the STA to another channel using an 802.11v BSS Transition Management Request. Often the AP will give the STA two seconds to move, then forcefully move it. If the move is to another channel on the same band, the STA might use 802.11r (Fast Transition), which can cut the latency to less than 50 milliseconds. However, if the move is to another band, the STA performs a full reassociation/reauthentication, which can take 150 milliseconds or longer.
In most consumer APs, this event typically happens when the STA somehow associates to the 2.4 GHz band of a dual-band AP and the AP quickly steers it to the 5 GHz band.
[SNB] As Dennis said, APs can "forcefully" move STAs. This can happen as a fallback from an 802.11v BSS Transition Management Request, or if the AP detects high error rates, low signal levels or too much load on a radio. If a STA is cooperative, the move can take hundreds of milliseconds. Uncooperative or "sticky" STAs, however, can take multiple seconds to move or end up disconnected.
Countermeasure: Turn off 802.11v, band steering or "smart connect" features, if you can. If you can’t, assign different SSIDs for each band and delete old connection profiles from your devices. Check device wireless properties to see if you can restrict operation to a single band (unlikely). Make sure your devices are not operating at low signal levels. Turn off "Roaming Assistance" if your router has it. If it allows setting the "assistance" level, set it to -95 dBm or as low as you can, effectively disabling the feature.
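A very low setting effectively disables Roaming Assistance, and a simple threshold model shows why. In this model, the AP steers (or kicks) a STA whenever its signal drops below the configured level. The Python below is hypothetical, because vendors don’t publish their exact logic. The RSSI readings are also made up for illustration.

```python
def steering_triggers(rssi_trace_dbm, threshold_dbm):
    """Count RSSI readings that would trip a threshold-based
    Roaming Assistance check (hypothetical model of the feature)."""
    return sum(1 for rssi in rssi_trace_dbm if rssi < threshold_dbm)

# Hypothetical RSSI readings (dBm) for a device a couple of rooms from the AP.
trace = [-68, -72, -75, -71, -78, -74, -69, -80]

for threshold in (-70, -75, -95):
    print(threshold, steering_triggers(trace, threshold))
# -70 6
# -75 2
# -95 0
```

Each trigger is a potential forced move. As described above, a forced move can cost anywhere from hundreds of milliseconds to several seconds.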
Cause: AP has ACS (Automatic Channel Selection) enabled
This is actually a major peeve of mine, as many retail APs have this enabled by default and automatically select the "quietest" channel for the user. The problem is that when the AP detects a certain threshold of interference on the channel, it performs an off-channel scan and moves everybody to a quieter channel. This is extremely disruptive, and I’ve seen some (cheaper) wireless routers do this almost once a minute as they ping-pong between channels.
This is especially the case when the interference is spurious and broadband in nature (e.g. microwave oven). Most consumer APs will unceremoniously deauthenticate the STA from the channel, forcing the STA to perform a new channel scan and "find" the SSID again on another channel, which can take 2-5 seconds, or longer. Better APs will use an 802.11v BSS Transition Management Request to indicate the new channel, which reduces the delay to less than a second, assuming the STA supports 802.11v BTM.
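A naive ACS rule easily reproduces the ping-pong behavior. Under this rule, whenever interference on the current channel exceeds a threshold, the AP moves to whichever channel is quietest at that moment. The Python sketch below models that rule on two channels whose interference comes and goes. It is hypothetical, and the threshold, interference units and readings are all assumptions.

```python
def simulate_acs(readings, start_channel, threshold):
    """Naive automatic channel selection (hypothetical model).

    `readings` holds one dict per minute mapping channel -> interference level
    (arbitrary units). If the current channel is above `threshold`, the AP
    switches everyone to the quietest channel. Returns (minute, from, to)
    for each switch."""
    current = start_channel
    switches = []
    for minute, levels in enumerate(readings):
        if levels[current] > threshold:
            quietest = min(levels, key=levels.get)
            if quietest != current:
                switches.append((minute, current, quietest))
                current = quietest
    return switches

# Bursty interference that alternates between channels 1 and 6 each minute.
readings = [{1: 80, 6: 20}, {1: 20, 6: 80}] * 3
switches = simulate_acs(readings, start_channel=1, threshold=50)
print(len(switches))  # 6
print(switches[:2])   # [(0, 1, 6), (1, 6, 1)]
```

Consumer APs that deauthenticate their clients force a reconnect of 2-5 seconds. At that rate, those six switches would cost at least 12 seconds of dead air in six minutes.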
Countermeasure: Don’t use automatic channel selection. Set your AP/router channels yourself.
Now let’s look at the client side of things.
Cause: STA goes off-channel to perform an OBSS scan (2.4 GHz band only)
This scan is required to detect overlapping BSSs, i.e. APs on the same or nearby channels, and is usually performed by the STA every 5 minutes. This event usually takes 50 milliseconds or less.
Countermeasure: Since this happens only on 2.4 GHz, use a 5 GHz channel. There is generally no way to disable OBSS scanning on the STA without hacking the code (which I’ve done!).
Cause: STA goes off-channel to perform a background scan
This is usually a periodic scan (typically scheduled every 2-5 minutes) under normal signal levels to determine whether a better AP is available. If the STA’s signal drops below its roam threshold, the scan can happen as often as every 5 seconds! If your RSSI doesn’t have much margin above the roam threshold, movement or temporary obstacles may cause nuisance background scans.
This is a very disruptive event, as the WLAN driver often requires 2-5 seconds to perform a full scan!
Countermeasure: Make sure the STA is in a reliable coverage area with a signal level of -65 dBm or stronger. If possible, disable scheduled scans in the STA, so that it only scans when it’s about to lose the connection with the current AP. However, scheduled scans do have value, since a stronger AP can sometimes become available even when the current AP’s RSSI is at -65 dBm.
[SNB] While it’s unlikely you’ll be able to disable scans entirely, you may be able to reduce how often they are done. Check your Wi-Fi adapter’s advanced properties for a "Roam aggressiveness" or similar setting and set it to the least aggressive setting.
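A toy model shows how signal margin affects background scans. The Python below uses a roam threshold of -70 dBm, which is an assumption. Real roam thresholds vary by driver and aren’t given here. The scheduled interval is set to 120 seconds, the low end of the 2-5 minutes mentioned above. The low-signal interval is the 5 seconds Dennis mentions. The function and constant names are hypothetical.

```python
SCHEDULED_SCAN_S = 120  # assumed; the article says typically every 2-5 minutes
LOW_SIGNAL_SCAN_S = 5   # "as frequently as every 5 seconds" below the roam threshold

def background_scans(rssi_trace_dbm, roam_threshold_dbm=-70):
    """Times (s) at which a simplified STA would start a background scan.

    `rssi_trace_dbm` holds one RSSI reading per second. The STA is assumed
    to have just scanned at t = 0."""
    scans = []
    last_scan = 0
    for t, rssi in enumerate(rssi_trace_dbm):
        if rssi < roam_threshold_dbm:
            interval = LOW_SIGNAL_SCAN_S
        else:
            interval = SCHEDULED_SCAN_S
        if t - last_scan >= interval:
            scans.append(t)
            last_scan = t
    return scans

strong = [-60] * 300          # 5 minutes with plenty of margin
marginal = [-68, -72] * 150   # 5 minutes hovering around the threshold
print(len(background_scans(strong)))    # 2
print(len(background_scans(marginal)))  # 50
```

Dipping just 2 dB below the threshold now and then turns 2 scans into 50 over the same five minutes. Each of those scans is the kind of event Dennis says can take 2-5 seconds.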
Cause: STA is told by the AP to perform a Beacon Report scan
This is part of the 802.11k specification, so it can happen only if the STA driver supports it. The AP uses the results to decide whether it should move the STA to another AP in the ESS using an 802.11v BSS Transition Management Request. This event normally takes less than 200 milliseconds to complete, depending on the number of channels the STA is told to scan.
Countermeasure: Turn off RRM or 802.11k support in either the AP or STA if you can.
Cause: Association retry timeouts caused by channel congestion or spurious interference
These events usually last for only a few seconds, but can last much longer in the case of broadband interference (e.g. leaky microwave in the 2.4 GHz band).
Countermeasure: Find and remove interference sources. Try to find a quieter channel and/or change bands.
As you can imagine, these causes can be additive, so ping spikes can be quite frequent and maddeningly random. Many of the above scenarios historically applied to commercial APs in an ESS (Extended Service Set), i.e., a multi-AP system. However, with the trend toward Wi-Fi "mesh" networks, multi-AP systems are becoming more common in the home, along with 802.11k and 802.11v support.
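The Python sketch below lays several periodic events on one 30-minute timeline to show how the causes stack up. The periods and durations come from the figures quoted above. Where a range is given, the sketch uses the low end: 2-second background scans every 3 minutes, and 50 ms for rekeys and OBSS scans. The start offsets are arbitrary assumptions. Real events also drift and get deferred rather than firing like clockwork.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    period_s: float      # how often the event repeats
    duration_ms: float   # how long traffic stalls each time
    offset_s: float      # first occurrence (assumed)

def stall_timeline(sources, window_s):
    """All (time_s, name, duration_ms) stall events within the window, in time order."""
    events = []
    for src in sources:
        t = src.offset_s
        while t < window_s:
            events.append((t, src.name, src.duration_ms))
            t += src.period_s
    return sorted(events)

sources = [
    Source("AP neighbor scan", 120, 100, 30),
    Source("TKIP group rekey", 600, 50, 100),
    Source("OBSS scan (2.4 GHz)", 300, 50, 200),
    Source("STA background scan", 180, 2000, 60),
]
events = stall_timeline(sources, window_s=30 * 60)
total_ms = sum(duration for _, _, duration in events)
print(len(events), total_ms)  # 34 21950
```

In this model, the ten background scans account for 20 of the roughly 22 seconds of stalls. That is why keeping a healthy signal margin matters so much.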
Note that if you are using a single AP, then 802.11k causes won’t be a problem. But if you have a multi-AP system, including "mesh" systems of any form, i.e. mesh or AP + repeater, the 11k/v technologies that are designed to minimize roam time might be causing a more annoying problem. Finally, remember you can still have 802.11v moves between the bands of a single AP/router, whether it’s dual or "tri-band".
Dennis Bland is President of dB Performance, which specializes in providing high-performance Wi-Fi security solutions. He was a contributor to the IEEE 802.11e Working Group and Wi-Fi Alliance Hotspot 2.0 Task Group and has managed Wi-Fi security software/testing programs at Wind River, Devicescape, and now dB Performance.