Ping Pongs – Part 2: The Lower Layer Cover Up

In my previous blog, Ping Really Does Pong, I wrote about the dangers of using ping due to off-channel scanning by both the client and the access point. In this post, I will dig a bit deeper, and illustrate what else your ping tests are covering up.

As mentioned previously, pinging over wireless is a really good tool for assessing basic connectivity, but not performance. This is a big change from a traditional switched, full-duplex wired network, where you have a dedicated medium for send and receive. Wireless is a half-duplex shared medium, where contention, noise, and interference mean the delivery of frames is not guaranteed, and the lower layers of the OSI model carry a large overhead to ensure those frames are delivered.

It is important to note that we are talking about frames: wireless operates at the lower two layers of the OSI model. Ping (or ICMP) operates at layer 3 (although there is some debate around this, the key thing is that it operates above layer 2), and therefore covers up what’s happening at the lower layers.

I tried to come up with a good analogy but found it difficult. I guess it’s like assessing the quality of a road surface by the route chosen by your Sat Nav – yes, poor quality roads can slow traffic down and influence your route, but in a normal day-to-day scenario your Sat Nav is unaware of the potholes, the material of the road, the quality of the road markings, etc. This is kind of what’s happening – wireless is a bit like the road network.

Anyway, I digress. I guess the best place to start is almost where we left off: looking at a roam, both between PHYs and between channels. Using a Windows tablet and Nigel Bowden’s Windows WLAN Data PowerShell script, with a simultaneous ping to my gateway, I went for a walk.

For reasons unknown, the Windows laptop first connected to Channel 11 on 2.4GHz, and response time was a healthy 1-2ms:

Time      Radio Type  Channel  Rx Rate (Mbps)  Tx Rate (Mbps)  RSSI (dBm)  SSID     Response Time
20:42:52  802.11n     1        144             144             -50         NovaNet  1ms
20:42:54  802.11n     1        144             144             -50         NovaNet  2ms
20:42:55  802.11n     1        144             144             -50         NovaNet  2ms
20:42:56  802.11n     1        144             144             -50         NovaNet  2ms
20:42:57  802.11n     1        144             144             -50         NovaNet  2ms

After a few seconds, without moving, the client decided to roam to channel 52. If you have read my previous blog you would expect the response time to increase while the client assesses its environment, but you will notice it didn’t increase that much. In my lab I have 802.11k and 802.11v enabled. I will cover these in a later blog, but for the purpose of this article it means the client has already been supplied with access point neighbour and topology information, and thus the off-channel scanning is greatly reduced.

Time      Radio Type  Channel  Rx Rate (Mbps)  Tx Rate (Mbps)  RSSI (dBm)  SSID     Response Time
20:43:19  802.11n     1        144             144             -50         NovaNet  2ms
20:43:20  802.11n     1        144             144             -50         NovaNet  2ms
20:43:21  802.11n     1        144             144             -50         NovaNet  11ms
20:43:22  802.11ac    52       360             360             -50         NovaNet  2ms
20:43:23  802.11ac    52       360             360             -50         NovaNet  6ms
20:43:25  802.11ac    52       360             360             -50         NovaNet  5ms

Other than a very small spike while the client assessed its surroundings, you wouldn’t have known there was a roam. Not only was this between channels, it was between frequency bands: the client moved from 2.4GHz to 5GHz.
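To make the roam easier to spot than it is by eye, here is a minimal Python sketch that walks the captured samples and reports any change of channel alongside the ping response time at that moment. The Sample type and the find_roams function are hypothetical names of my own, not part of the script used for the capture, and the rows are copied by hand from the table above.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    time: str
    radio: str
    channel: int
    response_ms: int


def find_roams(samples):
    """Return (time, old_channel, new_channel, response_ms) for each channel change."""
    roams = []
    # Compare each sample with the one immediately before it.
    for prev, cur in zip(samples, samples[1:]):
        if cur.channel != prev.channel:
            roams.append((cur.time, prev.channel, cur.channel, cur.response_ms))
    return roams


# Rows copied by hand from the capture around the first roam.
samples = [
    Sample("20:43:19", "802.11n", 1, 2),
    Sample("20:43:20", "802.11n", 1, 2),
    Sample("20:43:21", "802.11n", 1, 11),
    Sample("20:43:22", "802.11ac", 52, 2),
    Sample("20:43:23", "802.11ac", 52, 6),
    Sample("20:43:25", "802.11ac", 52, 5),
]

for time, old, new, rtt in find_roams(samples):
    print(f"{time}: roamed from channel {old} to channel {new}, ping {rtt}ms")
```

The point of pairing the two columns is that the channel column shows the roam clearly, while the ping column on its own barely moves.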

As I started my journey, and my client moved closer to the magic roam point (around the -75dBm mark for this client), the response time didn’t change. One important thing to note at this point is that Windows does not give you access to MCS data, unlike macOS (made even easier with some of Adrian Granados’s tools). However, you will see the Rx and Tx rates have started to drop considerably, and during the 5 minutes of capturing the data my MCS rate changed 16 times:

Time      Radio Type  Channel  Rx Rate (Mbps)  Tx Rate (Mbps)  RSSI (dBm)  SSID     Response Time
20:44:58  802.11ac    52       216             216             -74.5       NovaNet  4ms
20:44:59  802.11ac    52       216             216             -74.5       NovaNet  2ms
20:45:00  802.11ac    52       216             216             -74.5       NovaNet  3ms
20:45:01  802.11ac    52       216             216             -74.5       NovaNet  2ms
20:45:02  802.11ac    52       216             216             -74.5       NovaNet  2ms

As you can see, at -74.5dBm, right before the client roams, there is absolutely no indication via ping – if I were assessing this client via ping alone I would give it a 10/10.
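A rough Python sketch of that comparison, putting the worst ping next to how far the Tx rate has fallen. The ping_vs_rate function is a hypothetical helper of my own, and I am assuming the 360Mbps seen just after the first roam is a fair "peak" for this client on channel 52.

```python
def ping_vs_rate(samples, peak_rate_mbps):
    """Return the worst ping (ms) and the Tx rate drop from peak (percent)."""
    worst_ping_ms = max(rtt for _, rtt in samples)
    lowest_rate = min(rate for rate, _ in samples)
    drop_percent = (peak_rate_mbps - lowest_rate) * 100 / peak_rate_mbps
    return worst_ping_ms, drop_percent


# (Tx rate in Mbps, ping response in ms), copied from the rows captured at -74.5dBm.
near_roam = [(216, 4), (216, 2), (216, 3), (216, 2), (216, 2)]

# Assumption: 360Mbps, the rate seen right after the first roam, is the peak.
worst_ping, drop = ping_vs_rate(near_roam, peak_rate_mbps=360)
print(f"worst ping {worst_ping}ms, Tx rate down {drop:.0f}% from peak")
```

Ping stays in single digits while the data rate has fallen a long way, which is the gap a ping-only assessment cannot see.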

So, onto the next roam. This one happened within 5GHz, from channel 52 to channel 100, and took approximately 300ms, long enough that the script I was running thought I was disassociated from the network. So whilst correlating the data can make a roam really obvious, other events at layer 2 can make it less obvious.

Time      Radio Type  Channel  Rx Rate (Mbps)  Tx Rate (Mbps)  RSSI (dBm)  SSID     Response Time
20:45:18  802.11ac    52       162             162             -74.5       NovaNet  6ms
20:45:19  802.11ac    52       162             162             -74.5       NovaNet  7ms
20:45:20  802.11ac    52       162             162             -74.5       NovaNet  7ms
20:45:21  802.11ac    52       162             162             -74.5       NovaNet  7ms
20:45:23  802.11ac    0        0               0               0                    305ms
20:45:24  802.11ac    100      360             360             -56.5       NovaNet  7ms
20:45:25  802.11ac    100      360             360             -56.5       NovaNet  6ms
20:45:26  802.11ac    100      360             360             -56.5       NovaNet  6ms

For my next test, I turned on a microwave. My microwave absolutely hates channel 13, and as I’m in the UK I can use channel 13, so I put together an RF profile advertising only this channel on two APs and connected a client.

The ping results were as expected:

Reply from 192.168.1.254: bytes=32 time=4ms TTL=64
Reply from 192.168.1.254: bytes=32 time=161ms TTL=64
Reply from 192.168.1.254: bytes=32 time=114ms TTL=64
Reply from 192.168.1.254: bytes=32 time=279ms TTL=64
Reply from 192.168.1.254: bytes=32 time=448ms TTL=64
Reply from 192.168.1.254: bytes=32 time=3ms TTL=64
Reply from 192.168.1.254: bytes=32 time=3ms TTL=64
Reply from 192.168.1.254: bytes=32 time=1ms TTL=64
Reply from 192.168.1.254: bytes=32 time=63ms TTL=64
Reply from 192.168.1.254: bytes=32 time=12ms TTL=64
Reply from 192.168.1.254: bytes=32 time=234ms TTL=64
Reply from 192.168.1.254: bytes=32 time=193ms TTL=64
Reply from 192.168.1.254: bytes=32 time=236ms TTL=64
Reply from 192.168.1.254: bytes=32 time=430ms TTL=64
Reply from 192.168.1.254: bytes=32 time=20ms TTL=64
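The replies above can be summarised with a short Python sketch. The summarise_ping function is a hypothetical helper of my own, and the sent count of 15 is an assumption based on every ping in this run getting a reply.

```python
import re

# Matches the round-trip time in Windows ping replies ("time=4ms" or "time<1ms").
PING_TIME = re.compile(r"time[=<](\d+)ms")


def summarise_ping(output, sent):
    """Return reply count, loss, and min/avg/max response time in ms."""
    times = [int(m.group(1)) for m in PING_TIME.finditer(output)]
    summary = {"replies": len(times), "lost": sent - len(times)}
    if times:
        summary.update(
            min_ms=min(times),
            max_ms=max(times),
            avg_ms=sum(times) / len(times),
        )
    return summary


# The ping replies captured while the microwave was running.
ping_output = """\
Reply from 192.168.1.254: bytes=32 time=4ms TTL=64
Reply from 192.168.1.254: bytes=32 time=161ms TTL=64
Reply from 192.168.1.254: bytes=32 time=114ms TTL=64
Reply from 192.168.1.254: bytes=32 time=279ms TTL=64
Reply from 192.168.1.254: bytes=32 time=448ms TTL=64
Reply from 192.168.1.254: bytes=32 time=3ms TTL=64
Reply from 192.168.1.254: bytes=32 time=3ms TTL=64
Reply from 192.168.1.254: bytes=32 time=1ms TTL=64
Reply from 192.168.1.254: bytes=32 time=63ms TTL=64
Reply from 192.168.1.254: bytes=32 time=12ms TTL=64
Reply from 192.168.1.254: bytes=32 time=234ms TTL=64
Reply from 192.168.1.254: bytes=32 time=193ms TTL=64
Reply from 192.168.1.254: bytes=32 time=236ms TTL=64
Reply from 192.168.1.254: bytes=32 time=430ms TTL=64
Reply from 192.168.1.254: bytes=32 time=20ms TTL=64
"""

# Assumption: 15 echo requests were sent, one per reply shown.
stats = summarise_ping(ping_output, sent=15)
print(stats)
```

The summary shows the spread between the fastest and slowest replies, but the lost count is the number to watch.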

Now, this is an incredibly important point: during this period my client saw around 75% retries. Why is it important? Well, it didn’t drop a single packet. Looping nicely back into the title of my blog: on an 802.11 network, if a node does not receive an ACK at layer 2 to acknowledge receipt of a frame, it will retransmit, and because the frame has been retransmitted the packet is still delivered. So, despite around 75% of frames not making it to their destination on the first attempt, every single ping got a reply; ping is masking the layer 2 issues.
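As a rough sketch of why retransmission hides first-attempt failures from ping, the Python below works out the chance a frame is eventually delivered and the average number of attempts it takes. It assumes every attempt fails independently with the same probability and that the radio gives up after a fixed number of attempts. Both are simplifications, the function names are my own, and the limit of 8 attempts is a hypothetical figure; real drivers and real interference will behave differently.

```python
def delivery_probability(p_fail, max_attempts):
    """Chance a frame is ACKed within max_attempts, assuming independent attempts."""
    return 1.0 - p_fail ** max_attempts


def expected_attempts(p_fail, max_attempts):
    """Average number of transmissions per frame, capped at max_attempts."""
    if p_fail >= 1.0:
        return float(max_attempts)
    # Attempt k is only made if the previous k-1 attempts all failed.
    return (1.0 - p_fail ** max_attempts) / (1.0 - p_fail)


# Hypothetical retry limit; illustrative failure rates only.
MAX_ATTEMPTS = 8
for p_fail in (0.25, 0.5, 0.75):
    delivered = delivery_probability(p_fail, MAX_ATTEMPTS)
    attempts = expected_attempts(p_fail, MAX_ATTEMPTS)
    print(f"p_fail={p_fail}: delivered {delivered:.1%}, about {attempts:.1f} attempts per frame")
```

Under these assumptions most frames still arrive, just later and at the cost of extra airtime, and ping only ever sees the "later".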

There’s a heck of a lot going on in this scenario. The channel utilisation was hitting around 65% (channel utilisation being a layer 1 measurement), and the microwave could be heard at as high as -20dBm. 802.11 Energy Detect kicks in at -62dBm: the client backs off if it detects energy (or bitrate errors) at -62dBm or greater. Because there were retries the MCS rate would have continuously changed, the SNR would have decreased, and it’s even possible that the client was looking for other access points to roam to... and by pinging, you wouldn’t have known the difference. The amazing thing is, wireless still worked! The protocol handled it perfectly.
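Because dBm values are negative, "at -62dBm or greater" is easy to read the wrong way round. A tiny Python sketch of the comparison follows; medium_busy is a hypothetical name of my own, and the -62dBm threshold is simply the figure quoted above.

```python
ENERGY_DETECT_DBM = -62  # threshold quoted in this article


def medium_busy(energy_dbm, threshold_dbm=ENERGY_DETECT_DBM):
    """True if the detected energy is at or above the energy detect threshold."""
    return energy_dbm >= threshold_dbm


print(medium_busy(-20))  # the microwave at its loudest in my capture
print(medium_busy(-80))  # a much weaker signal, for comparison
```

The microwave at -20dBm sits far above the threshold, so the client treats the channel as busy and holds off transmitting.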

[Image: captured with MetaGeek Chanalyzer]

You might be thinking that it was quite obvious there was an issue due to the sporadic ping response times. To cover that, here is an example of a ping taken whilst I was streaming some 4K content off Netflix – spot the difference!

Reply from 192.168.1.254: bytes=32 time=161ms TTL=64
Reply from 192.168.1.254: bytes=32 time=114ms TTL=64
Reply from 192.168.1.254: bytes=32 time=279ms TTL=64
Reply from 192.168.1.254: bytes=32 time=448ms TTL=64
Reply from 192.168.1.254: bytes=32 time=3ms TTL=64
Reply from 192.168.1.254: bytes=32 time=7ms TTL=64
Reply from 192.168.1.254: bytes=32 time=103ms TTL=64
Reply from 192.168.1.254: bytes=32 time=63ms TTL=64
Reply from 192.168.1.254: bytes=32 time=222ms TTL=64

That’s all for now. I may end up doing a Ping Pongs Part 3 with further details such as packet analysis. Until then, thanks for reading!

If you spot any mistakes, please let me know!