Bug 2524 - BlockAck incorrect behavior in presence of multiple collisions
Status: RESOLVED INVALID
Product: ns-3
Classification: Unclassified
Component: wifi
Version: ns-3.26
Hardware: All All
Importance: P3 normal
Assigned To: sebastien.deronne
Reported: 2016-10-18 11:14 UTC by Hany
Modified: 2016-11-09 11:25 UTC (History)

Attachments
Simulation Script + PCAPs + Application Helper APIs (650.34 KB, application/x-bzip)
2016-10-18 11:14 UTC, Hany
Description Hany 2016-10-18 11:14:32 UTC
Created attachment 2622
Simulation Script + PCAPs + Application Helper APIs

I have attached a simulation scenario consisting of one AP and 3 STAs. All nodes use the 802.11n PHY standard with two-level aggregation enabled at its maximum. Each STA runs an OnOff transmitter that uses UDP as the transport layer. The OnOff application is always in the on state, transmitting to the sink application located at the AP.

When I print the throughput using a window of 100 ms, I find contiguous gaps of zero throughput. When I inspected the PCAP file of the AP, I found that the AP (IP address 10.0.0.1) misses the frame with Seq=1521 from STA (2) (IP address 10.0.0.3). The AP informs STA (2) that it missed that frame; however, the STA does not retransmit it, which I believe creates a problem in the reorder buffer.

Note: I ran the script using the latest ns-3-dev; however, I modified the helpers of the OnOff and PacketSink applications to include the default constructor. Everything can be found in the attached compressed file.
Comment 1 sebastien.deronne 2016-10-26 16:57:14 UTC
There is indeed an issue here.
However, I do not see any frame with Seq=1521, did you attach the correct script?
Comment 2 Hany 2016-10-26 17:03:28 UTC
I have attached the correct script, and the problem can also be seen in the attached traces. In the STA-2-0.pcap file, you can see that the frame with seq=1521 is sent by the station (STA 2) but not received by the AP (you can't find it in the AccessPoint PCAP file).
Comment 3 sebastien.deronne 2016-11-01 06:54:11 UTC
(In reply to Hany from comment #2)
> I have attached the correct script and the problem can be seen also in the
> attached Traces in STA-2-0.pcap file, you can see that the frame with
> seq=1521 is sent by the station (STA 2) but not received by the AP (You
> can't find it in the AccessPoint PCAP file).

I do see this in your zip file, but this is not seen in the simulation run.
Your pcap is much longer than the simulation duration, and seq 1521 is seen around 6s of simulation, while your script ends at 4s.
So this simulation script is not the one that generates those pcap traces.

I do indeed see something strange with A-MPDU enabled, much earlier than 6s, so frame 1521 is certainly not the (only) problem.

This won't be easy to debug: we first need to find the first frame that experiences unexpected behavior (and IMO it is not frame 1521).
Comment 4 sebastien.deronne 2016-11-01 07:45:10 UTC
I am checking traces and everything looks fine so far.

There is actually an explanation for why you would see low throughput during some time intervals. Since you selected a very small window (100ms), there might be some intervals without received packets.

Imagine you send frames 1 to 16 and you lose frame seq=10: the receiver will only forward up frames 1 to 9, and will wait to get frame 10 (or a BAR, if the max number of retries is reached) before it continues forwarding up. This might correspond to a gap of null throughput.
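
The in-order forwarding described above can be illustrated with a small sketch. This is a toy model, not the ns-3 implementation: the function name `forward_in_order` is hypothetical, and sequence-number wrap-around at 4096 and BAR handling are deliberately ignored.

```python
def forward_in_order(received, window_start):
    """Toy receiver reorder buffer.

    Forwards frames up the stack only while they are contiguous,
    starting from window_start. Everything after the first hole
    stays buffered until the missing frame arrives.
    """
    buffered = set(received)
    forwarded = []
    next_seq = window_start
    while next_seq in buffered:
        buffered.remove(next_seq)
        forwarded.append(next_seq)
        next_seq += 1
    return forwarded, sorted(buffered), next_seq


# Frames 1..16 were sent, frame 10 was lost on the air.
received = [seq for seq in range(1, 17) if seq != 10]
forwarded, held, waiting_for = forward_in_order(received, window_start=1)

print(forwarded)    # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(held)         # [11, 12, 13, 14, 15, 16]
print(waiting_for)  # 10
```

Frames 11 to 16 were received correctly, yet the application sees none of them until frame 10 arrives, which is what shows up as a window of zero throughput.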

Without A-MPDU, the sender will also retry the missed frame several times, but since A-MPDU can encapsulate several frames in the retry, it can take more time. The probability of small intervals with low or null throughput is therefore higher when A-MPDU is enabled, although on average A-MPDU should be more efficient. If you try with larger intervals (e.g. 500ms), you won't see such behavior.

I will give you some time to check whether you can detect a bug anyhow (for example, for each station, check the flow of received frames with their sequence numbers, and check that reordering is done properly). If you do not find a bug, this will have to be rejected, since there is an explanation for this behavior.
Comment 5 Hany 2016-11-02 09:17:13 UTC
Thanks, Sebastien, for your explanation. I agree with you about why I might see some gaps because of the reorder buffer. However, the gaps are really big: they reach around 800ms, which is a lot.

I just ran the simulations again using the same script on ns-3.26. I am still able to see the same problem of a missing packet with seq=1521 from STA (2) and, as you mentioned, it is not the first packet that causes the problem. But the strange thing in this case is that the packet is never retransmitted at all. You can see the AP reporting in the BlockAck that it is missing the packet with Seq=1521; however, the station never retransmits that packet.
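
For readers checking the same thing in a PCAP, here is a minimal sketch of how a compressed BlockAck reports a missing frame. It assumes the usual compressed layout (a starting sequence number plus a 64-bit bitmap, where bit n acknowledges sequence number start + n modulo 4096). The function name and the bitmap value are made up for illustration; only Seq=1521 comes from this report.

```python
SEQ_MODULO = 4096   # 802.11 sequence numbers are 12 bits wide
BITMAP_BITS = 64    # size of a compressed BlockAck bitmap


def missing_from_block_ack(start_seq, bitmap, sent):
    """Return the sent sequence numbers that the bitmap does not acknowledge.

    bitmap is an integer whose bit n (bit 0 = least significant)
    acknowledges sequence number (start_seq + n) % 4096.
    """
    missing = []
    for seq in sent:
        offset = (seq - start_seq) % SEQ_MODULO
        acked = offset < BITMAP_BITS and ((bitmap >> offset) & 1) == 1
        if not acked:
            missing.append(seq)
    return missing


# Illustrative values: frames 1520..1523 were sent in one A-MPDU and
# the receiver acknowledged all of them except 1521.
sent = [1520, 1521, 1522, 1523]
bitmap = 0b1101  # bit 0 -> 1520, bit 1 -> 1521 (cleared), bit 2 -> 1522, bit 3 -> 1523

print(missing_from_block_ack(1520, bitmap, sent))  # [1521]
```

A cleared bit only tells the originator that the frame was not received. Whether the originator retransmits it is a separate decision on the sender side.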
Comment 6 sebastien.deronne 2016-11-02 09:33:44 UTC
(In reply to Hany from comment #5)
> Thanks Sebastien for your explanation. I agree with you regarding why I
> might see some gaps because of the re-order buffer. However, the gaps are
> really big and they reach around 800ms which is a lot.

Indeed, but I have not seen any gaps of 800ms with no traffic in your script.

> 
> I just did the simulations again using the same script running on ns-3.26. I
> am still able to see the same problem of missing packet with seq=1521 from
> the STA (2) and as you mentioned it is not the first packet that causes the
> problem. But the strange cause in this case is that the packet is never
> re-transmitted at all. You can see the AP is reporting in the BlockAck that
> it is missing Packet with Seq=1521, however the station never transmits that
> packet.

Can you check in the debug traces why it is not re-transmitted?
Comment 7 Hany 2016-11-02 15:23:38 UTC
I am still trying to find out why some packets are not retransmitted, but I have not figured it out yet. Here is a sample of the script output: you can see a window of 700ms, from 5.4 to 6, without any throughput for the second station.

Time	Node1	Node2	Node3
5	32.9728	70.0672	18.8416
5.1	18.8416	13.5424	4.7104
5.2	4.7104	0	18.8416
5.3	4.7104	46.5152	0
5.4	0	0	0
5.5	23.552	0	42.3936
5.6	0	0	0
5.7	0	0	0
5.8	0	0	0
5.9	80.0768	0	94.208
6	0	0	0
6.1	51.8144	183.706	32.9728
6.2	9.4208	14.1312	9.4208
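
As a quick check of the table above, the following sketch counts the longest run of consecutive zero-throughput windows per node. The sample values are copied from the table and the 100 ms window comes from the description; the function name `longest_zero_run` is only illustrative.

```python
WINDOW_MS = 100  # throughput window used in the report

# Columns of the table above, rows 5.0 to 6.2.
samples = {
    "Node1": [32.9728, 18.8416, 4.7104, 4.7104, 0, 23.552, 0, 0, 0,
              80.0768, 0, 51.8144, 9.4208],
    "Node2": [70.0672, 13.5424, 0, 46.5152, 0, 0, 0, 0, 0,
              0, 0, 183.706, 14.1312],
    "Node3": [18.8416, 4.7104, 18.8416, 0, 0, 42.3936, 0, 0, 0,
              94.208, 0, 32.9728, 9.4208],
}


def longest_zero_run(values):
    """Length of the longest run of consecutive zero samples."""
    best = current = 0
    for value in values:
        current = current + 1 if value == 0 else 0
        best = max(best, current)
    return best


gaps_ms = {node: longest_zero_run(vals) * WINDOW_MS for node, vals in samples.items()}
print(gaps_ms)  # {'Node1': 300, 'Node2': 700, 'Node3': 300}
```

The 700 ms figure for Node2 matches the gap from 5.4 to 6 quoted above, while the other two stations show gaps of at most 300 ms in this excerpt.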
Comment 8 Ioannis 2016-11-03 08:21:06 UTC
(In reply to Hany from comment #7)
> I am still trying to check out why some packets are not re-transmitted but I
> have not figured out yet. Here is a sample example of the output of script,
> you can see that we have a window of 700ms from 5.4 to 6 without any
> throughput for the second station.
> 
> Time	Node1	Node2	Node3
> 5	32.9728	70.0672	18.8416
> 5.1	18.8416	13.5424	4.7104
> 5.2	4.7104	0	18.8416
> 5.3	4.7104	46.5152	0
> 5.4	0	0	0
> 5.5	23.552	0	42.3936
> 5.6	0	0	0
> 5.7	0	0	0
> 5.8	0	0	0
> 5.9	80.0768	0	94.208
> 6	0	0	0
> 6.1	51.8144	183.706	32.9728
> 6.2	9.4208	14.1312	9.4208

Hi Hany,

could you please check whether the packets (i.e. SeqNum 1521) are dropped from the MAC queue due to the dot11EDCATableMSDULifetime?

I think that you have placed Node2 in between Node1 & Node3; however, you have only one AP and very low CCA thresholds, so I guess we can rule out a hidden/exposed node problem. I would also suggest that you calculate the throughput in Mbps, or at least state the units. You have it in units of 100kbps, which is misleading.
You can also check whether any node transmits anything during that period and, if so, why the throughput is zero (e.g. collisions).
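
A minimal sketch of the unit conversion suggested above, turning bytes received in one window into Mbps. The function name is hypothetical, and the 1472-byte payload size is an assumption that is not stated anywhere in this report; it is used only to have concrete numbers.

```python
def throughput_mbps(bytes_received, window_ms):
    """Throughput in Mbps for bytes_received over a window of window_ms.

    bits / (ms * 1000) is the same as bits per second / 1e6.
    """
    return bytes_received * 8 / (window_ms * 1000.0)


# Assumed example: 4 packets of 1472 bytes received in a 100 ms window.
PAYLOAD_BYTES = 1472  # assumption, not taken from the script
mbps = throughput_mbps(4 * PAYLOAD_BYTES, window_ms=100)
print(mbps)
```

Under that assumption the result is about 0.471 Mbps, which would correspond to the value 4.7104 in the table if the table is in units of 100kbps.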
Comment 9 sebastien.deronne 2016-11-03 11:37:35 UTC
Ioannis has a point, maybe your frames reach the maximum lifetime.

I also suspect high collision conditions, and I think you get what is expected.
The only solution here is to look deeply in the traces to check whether each operation is expected or not, and/or add more checks/asserts/whatever you want to detect a bug.
Comment 10 Hany 2016-11-04 09:40:31 UTC
Hi Ioannis and Sebastien,

I can confirm that the problem is due to the fact that the A-MPDU packets reside in the MacQueue inside BlockAckManager for a period longer than dot11EDCATableMSDULifetime. I am still running more simulations and will confirm again whether the problem has gone.
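
To make the mechanism concrete, here is a toy model of a transmit queue that discards entries older than a configured lifetime. It is not the ns-3 MacQueue or BlockAckManager code: the class name `LifetimeQueue`, the 0.5 s lifetime and the timestamps are all hypothetical, chosen only to show why an expired frame is never retransmitted.

```python
from collections import deque


class LifetimeQueue:
    """Toy FIFO queue that drops entries older than lifetime_s on dequeue."""

    def __init__(self, lifetime_s):
        self.lifetime_s = lifetime_s
        self._items = deque()  # entries are (enqueue_time_s, seq)

    def enqueue(self, now_s, seq):
        self._items.append((now_s, seq))

    def dequeue(self, now_s):
        """Return (next live seq or None, list of expired seqs dropped)."""
        dropped = []
        while self._items:
            enqueued_at, seq = self._items.popleft()
            if now_s - enqueued_at > self.lifetime_s:
                dropped.append(seq)  # lifetime exceeded: silently discarded
                continue
            return seq, dropped
        return None, dropped


queue = LifetimeQueue(lifetime_s=0.5)  # hypothetical lifetime value
queue.enqueue(now_s=5.0, seq=1521)     # waiting for retransmission
queue.enqueue(now_s=5.4, seq=1522)

# Channel access is only won at t = 5.6 s, e.g. after repeated collisions.
next_seq, dropped = queue.dequeue(now_s=5.6)
print(next_seq, dropped)  # 1522 [1521]
```

In this model the frame with seq 1521 is discarded by the sender itself, so the receiver keeps reporting it as missing even though no retransmission will ever follow.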
Comment 11 sebastien.deronne 2016-11-08 16:23:24 UTC
(In reply to Hany from comment #10)
> Hi Ioannis and Sebastien,
> 
> I could confirm that the problem is due to that fact the A-MPDU packets are
> residing in MacQueue inside BlockAckManager for a period longer than
> dot11EDCATableMSDULifetime. I am still doing more simulations and I will
> confirm again if the problem has gone.

Hany, can we reject this bug?
Comment 12 Hany 2016-11-09 10:46:58 UTC
(In reply to sebastien.deronne from comment #11)
> (In reply to Hany from comment #10)
> > Hi Ioannis and Sebastien,
> > 
> > I could confirm that the problem is due to that fact the A-MPDU packets are
> > residing in MacQueue inside BlockAckManager for a period longer than
> > dot11EDCATableMSDULifetime. I am still doing more simulations and I will
> > confirm again if the problem has gone.
> 
> Hany, can we reject this bug?

Yes, Sebastien, let us close it.
Comment 13 sebastien.deronne 2016-11-09 11:25:08 UTC
Rejected: not a bug