People have different needs and expectations from Wi-Fi roaming. The toughest application is VoWiFi, i.e. voice callers using some form of VoIP over Wi-Fi. The rule of thumb for these users is that a roam must be completed in 50 - 100 ms to avoid call interruption.
Most of us, however, would be happy if our devices moved to a better connection at all instead of stubbornly hanging onto the first AP they see. And, oh yeah, it would be nice if, when the device moved, it didn't drop our video call or make us restart the video we were watching.
Since my script only polls the connected device after each attenuator step, the CSV roam log isn't going to capture sub-second events. But since packet capture runs continuously, the data is there if we need it.
The attenuators were stepped every second, using 2 dB increments to move things along. But due to overhead of the different data capture methods, log entries were about every 1.5 seconds for Android, 2.5 seconds for Pal and 3 seconds for Windows. From what I saw, even these relatively rough capture resolutions were sufficient to catch most devices in the act.
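To put those log intervals in signal terms, here is a rough sketch that converts each capture interval into the average attenuation change between log entries. The step size, step rate and intervals come from the paragraph above; the function name is made up for illustration, and the result is only an average because the attenuators move in discrete steps.

```python
# Average attenuation change between log entries, using the figures above.
STEP_DB = 2.0          # attenuator step size in dB
STEP_PERIOD_S = 1.0    # one step per second

# Approximate log intervals reported for each device type (seconds)
LOG_INTERVAL_S = {"Android": 1.5, "Pal": 2.5, "Windows": 3.0}

def db_per_log_entry(interval_s):
    """Average dB of attenuation change that accumulates between log entries."""
    return STEP_DB * interval_s / STEP_PERIOD_S

for device, interval in LOG_INTERVAL_S.items():
    print(f"{device}: ~{db_per_log_entry(interval):.0f} dB per log entry")
```

So each log entry spans roughly 3 to 6 dB of attenuation change, depending on the device.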
For the first example, the Pal's Roam Threshold was set to -70 dBm, Band Preference to Auto and Roam Target Threshold to -60 dBm. You generally want the target threshold set for a higher signal level than the roam threshold. For example, Apple says macOS uses a -75 dBm roam threshold and +12 dB target difference, i.e. -63 dBm.
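The relationship between the two thresholds can be sketched in a few lines. This is a simplified model for illustration only, not Pal's or Apple's actual roaming algorithm: the function names and the exact comparison rules (strictly below the roam threshold, at or above the target) are my assumptions.

```python
def target_threshold(roam_threshold_dbm, target_delta_db):
    """Target level = roam threshold plus the required improvement."""
    return roam_threshold_dbm + target_delta_db

def should_roam(current_rssi, candidate_rssi, roam_threshold, target):
    """Assumed rule: roam only when the current signal is below the roam
    threshold AND the candidate AP is at or above the target threshold."""
    return current_rssi < roam_threshold and candidate_rssi >= target

# macOS figures quoted above: -75 dBm roam threshold, +12 dB difference
print(target_threshold(-75, 12))

# Pal settings used in this test: roam at -70 dBm, target -60 dBm
print(should_roam(-72, -55, roam_threshold=-70, target=-60))  # weak link, good candidate
print(should_roam(-65, -40, roam_threshold=-70, target=-60))  # current link still fine
```

Keeping the target above the roam threshold builds in hysteresis, so a device doesn't bounce between two APs with similar signal levels.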
Setting the band preference to Auto means Pal will select the strongest signal, which will naturally come from Orbi's 2.4 GHz radio. Neither Orbi—nor any other consumer Wi-Fi system I know of—automatically adjusts transmit power level to optimize roaming.
The plot below shows a round-trip roam session, with callouts marking roam points. This BSSID decoder will come in handy for decoding the following Pal logs, RSSI plots and packet captures.
| | 2.4 GHz (Channel 6) | 5 GHz (Channel 40) |
|---|---|---|
| Orbi Router (AP1) | 0e:02:8e:9f:39:c5 | 08:02:8e:9f:39:c8 |
| Orbi Satellite (AP2) | 0e:02:8e:9f:3a:f6 | 08:02:8e:9f:3a:f9 |
| Pal-245 | 04:F0:21:28:C6:80 / CompexPt_28:c6:80 | (same) |
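If you'd rather not decode by eye, the table above drops straight into a lookup. The sketch below is just a convenience; the dictionary and function names are hypothetical and the addresses are copied from the table.

```python
# BSSID decoder built from the table above: MAC -> (device, band, channel)
BSSIDS = {
    "0e:02:8e:9f:39:c5": ("Orbi Router (AP1)", "2.4 GHz", 6),
    "08:02:8e:9f:39:c8": ("Orbi Router (AP1)", "5 GHz", 40),
    "0e:02:8e:9f:3a:f6": ("Orbi Satellite (AP2)", "2.4 GHz", 6),
    "08:02:8e:9f:3a:f9": ("Orbi Satellite (AP2)", "5 GHz", 40),
}
STA_MAC = "04:f0:21:28:c6:80"  # Pal-245, shown as CompexPt_28:c6:80 in captures

def describe(mac):
    """Return a readable label for a MAC address seen in the logs or captures."""
    mac = mac.lower()
    if mac == STA_MAC:
        return "Pal-245 (STA)"
    if mac in BSSIDS:
        device, band, channel = BSSIDS[mac]
        return f"{device}, {band}, Channel {channel}"
    return "unknown"

print(describe("0e:02:8e:9f:3a:f6"))
print(describe("04:F0:21:28:C6:80"))
```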
The first roam was quick, occurring within one plot tick (~2.5 sec). The gaps in the plot (where there is no connecting line) show discontinuities. The first break (~105 sec) is between the active roam and after-roam monitor period. The second break (~130 sec) marks where the first monitor period ends and the return roam starts. The third break at the 210 second mark is where the second after-roam monitor period starts.
Roaming test - Pal w/ -70 threshold
Note that band-steering occurs right at the end of the 10 second after-roam monitor period. The Pal log file below shows RSSI jumping around due to Pal scanning activity. Note that Pal remains associated while it's scanning for a new connection, which is what should happen so that the connection is not dropped. But since Pal (and any other STA) has only one radio, it can't transmit or receive data during the scan time. Wi-Fi product designers must carefully manage scan time so that data handling isn't significantly affected.
Pal log - roam scan
So what made Pal change bands? The answer to that lies in the packet capture. I use a Wireshark filter to quickly home in on key roaming events.
The fun actually starts back at the highlighted time 101.1, with a BSS Transition Management Request. This is an 802.11v feature that allows an AP to suggest or force a STA to roam, along with a suggested BSSID to roam to. (This 7Signal whitepaper is an excellent resource for understanding 802.11k, v and r features and modern roaming mechanics in general!)
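For readers who want to pick out the same kinds of frames in their own captures, here is a minimal sketch of the classification logic. It is not the Wireshark filter used for this article (that filter isn't reproduced here). It relies on the 802.11 management frame subtype numbers plus the WNM action category (10) and the BSS Transition Management Request action value (7); the function name and the choice of which subtypes count as "roaming events" are my assumptions.

```python
# 802.11 management frame subtypes relevant to roaming
ROAM_SUBTYPES = {
    0: "Association Request",
    1: "Association Response",
    2: "Reassociation Request",
    3: "Reassociation Response",
    10: "Disassociation",
    11: "Authentication",
    12: "Deauthentication",
}
ACTION_SUBTYPE = 13   # Action frames carry 802.11v (WNM) messages
WNM_CATEGORY = 10     # Action category: Wireless Network Management
BTM_REQUEST = 7       # WNM action: BSS Transition Management Request

def roam_event(subtype, category=None, action=None):
    """Return a label for a roaming-related management frame, else None."""
    if subtype == ACTION_SUBTYPE:
        if category == WNM_CATEGORY and action == BTM_REQUEST:
            return "BSS Transition Management Request"
        return None  # other action frames are ignored in this sketch
    return ROAM_SUBTYPES.get(subtype)

print(roam_event(13, category=10, action=7))
print(roam_event(10))
print(roam_event(8))   # beacon: not a roaming event
```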
Packet capture - 11v influenced roam
In this case, the currently connected AP—0e:02:8e:9f:3a:f6—is suggesting a move to 08:02:8e:9f:3a:f9, which turns out to belong to the same AP's 5 GHz radio set to Channel 40. Note, however, the roam doesn't happen until around 25 seconds later, when Pal—shown as CompexPt_28:c6:80 in the capture—finally says bye-bye by issuing a disassociation.
What happens in those 25 seconds? The good news is that the STA stays connected. But not shown in the filtered capture are the oh-so-many probes Pal makes to the radio it's currently connected to, i.e. 0e:02:8e:9f:3a:f6. Pal first spends around 10 seconds probing that BSSID on Channel 6, then 2 seconds issuing probes to the same BSSID on Channel 40, then goes back to Channel 6 for a few more seconds. It then pauses for around 6 seconds before disassociating at time 125.6 and finally completing authentication and association to the BSSID suggested by the BSS Transition Management Request.
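As a quick sanity check on that timeline, the sketch below works out the delay between the 11v request and the disassociation, and how much time is left for the "few more seconds" on Channel 6. The timestamps and phase durations are the approximate figures quoted above, so treat the remainder as a ballpark, not a measurement.

```python
# Timestamps (seconds) taken from the capture description above
BTM_REQUEST_T = 101.1
DISASSOC_T = 125.6

# Approximate phase durations quoted in the text (seconds)
PROBE_CH6_S = 10
PROBE_CH40_S = 2
PAUSE_S = 6

delay = round(DISASSOC_T - BTM_REQUEST_T, 1)
# Whatever is left over is the second, unquantified Channel 6 probing phase
remainder = round(delay - (PROBE_CH6_S + PROBE_CH40_S + PAUSE_S), 1)

print(f"Request-to-disassociation delay: {delay} s")
print(f"Time left for the second Channel 6 phase: ~{remainder} s")
```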
Note the last move back to AP1, Channel 6 (0e:02:8e:9f:39:c5) is done without an 11v BSS Transition request or disassociation. But the AP immediately disassociates Pal in the last frame shown (time 183.2), most likely trying to get it to band steer to 5 GHz. Not shown in the above capture is that the AP follows up around 8 seconds later at time 191.3 and again at time 207.3 with BSS Transition Management Requests with 08:02:8e:9f:39:c8 as the candidate AP, which is AP1's 5 GHz radio. The capture ends before that roam is made.
Now let's see what happens with a "sticky" STA. To emulate one, I set the Pal's roaming and target thresholds to -95 dBm. As noted earlier, I couldn't disable Pal's 11v feature.
This time, the roam plot shows Pal doesn't move from its initial Channel 6 connection, which is the expected behavior since the RSSI doesn't fall to the -95 dBm setting. But why does a band-steer—on the same AP—occur toward the end of the run?
Roaming test - Pal w/ -95 threshold
The packet capture shows Orbi made many unsuccessful attempts to get our sticky Pal to move. It first disassociated Pal (time 20.2), which promptly connected right back. Orbi next tried to band steer Pal by issuing an 11v BSS Transition Management Request, suggesting a move to the same AP's 5 GHz radio (08:02:8e:9f:39:c8) @ 27.2 and again at 48.3. Then, for some reason, it tried to disassociate Pal from the same radio it just tried to steer it to! Orbi then tried one more 11v request @ 64.3 before giving up for a while.
Packet capture - sticky STA
The action picks up again @ 153.2 with another 11v BSS Transition suggestion, once again to 08:02:8e:9f:39:c8, followed by a disassociation @ 184.3. This last attempt finally succeeds in making Pal move @ 199.5. I would not count this as a successful roam!
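To see just how long Orbi kept at it, the timestamps quoted for the sticky STA capture can be laid out as a simple event list. This sketch uses only the times given in the text; the one disassociation attempt with no stated time is left out, and the event labels are my shorthand.

```python
# (time in seconds, event) for the sticky STA run, from the capture description
events = [
    (20.2, "Disassociation"),
    (27.2, "BTM Request"),
    (48.3, "BTM Request"),
    (64.3, "BTM Request"),
    (153.2, "BTM Request"),
    (184.3, "Disassociation"),
    (199.5, "Pal moves"),
]

# Gap between each event and the one before it
gaps = [round(b[0] - a[0], 1) for a, b in zip(events, events[1:])]
# Time from the first steering attempt to the eventual move
total = round(events[-1][0] - events[0][0], 1)

for (t, name), gap in zip(events[1:], gaps):
    print(f"{t:6.1f}  {name:<15} (+{gap} s)")
print(f"First steering attempt to move: {total} s")
```

That is roughly three minutes of nudging before Pal finally gives in.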
If you're wondering about that Probe Request @ 114.7, so am I. The source MAC address (16:02:8e:9f:3a:f6) is the same as the Orbi Satellite's 2.4 GHz radio (0e:02:8e:9f:3a:f6), except for the first octet. I suspect it has something to do with Orbi's mesh management.
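The comparison is easy to check in code. The sketch below confirms the two addresses differ only in the first octet, and also tests the locally administered bit (0x02 in the first octet), which is part of standard IEEE MAC addressing. It doesn't confirm or rule out the mesh management guess; the helper names are hypothetical.

```python
def octets(mac):
    """Split a colon-separated MAC address into integer octets."""
    return [int(part, 16) for part in mac.split(":")]

def differing_octets(mac_a, mac_b):
    """Indexes (0-based) of the octets that differ between two MACs."""
    return [i for i, (a, b) in enumerate(zip(octets(mac_a), octets(mac_b))) if a != b]

def is_locally_administered(mac):
    """True if the locally administered bit (0x02 of the first octet) is set."""
    return bool(octets(mac)[0] & 0x02)

probe_src = "16:02:8e:9f:3a:f6"       # source of the mystery Probe Request
satellite_24 = "0e:02:8e:9f:3a:f6"    # Orbi Satellite 2.4 GHz BSSID

print(differing_octets(probe_src, satellite_24))
print(is_locally_administered(probe_src), is_locally_administered(satellite_24))
```

Both addresses have the locally administered bit set, so neither is a burned-in address; the 5 GHz BSSIDs starting with 08 do not have it set.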