Bug 2313

Summary: Assert failed when using aggregation and RTS/CTS
Product: ns-3 Reporter: Matías Richart <matis18>
Component: wifiAssignee: sebastien.deronne
Status: RESOLVED FIXED    
Severity: major CC: krotov, ns-bugs
Priority: P5    
Version: ns-3.25   
Hardware: All   
OS: All   
See Also: https://www.nsnam.org/bugzilla/show_bug.cgi?id=2350
Attachments: Modification for reproducing the bug
proposed fix to solve this bug
patch to fix
Patch to fix similar bug for the case without RTS/CTS
Refactoring patch
Extract TID from BlockAck action frames
Mark GetCurrentTid const
Use PeekHeader instead of RemoveHeader

Description Matías Richart 2016-02-28 07:51:20 UTC
Created attachment 2308 [details]
Modification for reproducing the bug

The assert that fails is:
assert failed. cond="m_lastReceivedHdr.IsQosData ()", file=../src/wifi/model/mac-low.cc, line=853

With some debugging I found that the problem appears when an entire A-MPDU reception fails and previously a RTS/CTS was interchanged.

When the recipient receives a faulty Block Ack Request after an entire A-MPDU transmission was unsuccessfully received, it checks if the last received header is QoS, which was the RTS and the asserts fail.

I found that the problem may be because m_receivedAtLeastOneMpdu was not correctly updated. I dig a bit in my example and found that the all MPDUs of the A-MPDU that fail were association requests with destination address 88:02:30:00:00:00, so when the last MDPU arrives, the function ReceiveError give false at this comparison if (hdr.GetAddr1 () != m_self) and do not put to false m_receivedAtLeastOneMpdu

I attach a small modification to wifi-aggregation example to reproduce the problem.`
Comment 1 sebastien.deronne 2016-02-28 11:49:16 UTC
Created attachment 2309 [details]
proposed fix to solve this bug

The problem was that we did not remove the Ampdu subframe header in ReceiveError, resulting in an incorrect address field value returned by GetAddr1.
The patch is removing the Ampdu subframe header before peeking the WifiMacHeader.
Comment 2 sebastien.deronne 2016-02-28 11:50:46 UTC
Important fix to be delivered, so setting to LAST CALL
Comment 3 Alexander Krotov 2016-03-01 04:49:51 UTC
(In reply to sebastien.deronne from comment #1)
> Created attachment 2309 [details]
> proposed fix to solve this bug

I also have scenario affected by this bug and the patch fixes it in my case.

Why not commit it without waiting for release?
Comment 4 sebastien.deronne 2016-03-01 05:41:36 UTC
(In reply to Alexander Krotov from comment #3)
> (In reply to sebastien.deronne from comment #1)
> > Created attachment 2309 [details]
> > proposed fix to solve this bug
> 
> I also have scenario affected by this bug and the patch fixes it in my case.
> 
> Why not commit it without waiting for release?

We do not wait for release, we will push it normally today.
I wait a little bit since Tom told me yesterday he saw some issues with the patches that are planned to be delivered for the release, so I will first check what is causing the issue.
Comment 5 Matías Richart 2016-03-01 07:01:12 UTC
+1
Bear in mind that fix triggers bug 2315 in some cases.
Works for me if both fixes are applied.
Comment 6 sebastien.deronne 2016-03-01 14:44:27 UTC
pushed in changeset 11945:96b721d35158
Comment 7 Matías Richart 2016-03-07 13:25:44 UTC
I reopen this bug because I'm getting an assert fail when a CTS is missed.
msg="Current packet is not Qos Data", file=../src/wifi/model/edca-txop-n.cc, line=670

The problem happens when the method MissedCts is invoked.
Comment 8 sebastien.deronne 2016-03-07 13:56:35 UTC
I assume your scenario considers hidden nodes?
I wanna try first to reproduce.
Comment 9 sebastien.deronne 2016-03-07 16:34:28 UTC
Matias, could you please attach a test case?
I could not reproduce the bug so far.
Comment 10 Matías Richart 2016-03-07 17:20:27 UTC
(In reply to sebastien.deronne from comment #9)
> Matias, could you please attach a test case?
> I could not reproduce the bug so far.

Sorry I was trying to find an easy test case but I couldn't.

It appears when testing new minstrel-ht code with rate-adaptation-distance from here: https://codereview.appspot.com/283960043/

Just run ./waf --run "rate-adaptation-distance --STA1_x=100"
Comment 11 Matías Richart 2016-03-07 18:48:14 UTC
I found that the asserts fails because it is comparing against a BACK Request header.
-A BACK Req is sent as part of an A-MPDU
-The A-MPDU needs RTS/CTS 
-The CTS is never received.
-EdcaTxopN::m_currentHdr is the header of the BACK Request and QosData is expected.
Comment 12 sebastien.deronne 2016-03-09 14:48:58 UTC
Created attachment 2336 [details]
patch to fix
Comment 13 sebastien.deronne 2016-03-09 16:02:34 UTC
fixed in changeset 12002:e6f2231c1a96
Comment 14 Alexander Krotov 2016-04-05 10:52:50 UTC
Created attachment 2371 [details]
Patch to fix similar bug for the case without RTS/CTS

The same bugfix should be applied to MissedAck case. I run some experiments with different RTS threshold and NS_FATAL_ERROR is triggered sometimes. Attached is a patch that fixes the issue.

By the way, functions EdcaTxopN::MissedAck and EdcaTxopN::MissedCts are almost identical, but have small differences. For example, MissedCts calls m_low->FlushAggregateQueue (), but MissedAck does not. Is it correct? The common part should probably be factored out.
Comment 15 Alexander Krotov 2016-04-05 11:50:17 UTC
Created attachment 2372 [details]
Refactoring patch

This patch fixes the case without RTS/CTS and also fixes the same bug for BlockAck *response*. Common code is moved to the function EdcaTxopN::GetCurrentTid ().
Comment 16 Alexander Krotov 2016-04-05 12:16:01 UTC
Also, with this 
(In reply to Alexander Krotov from comment #15)
> Created attachment 2372 [details]
> Refactoring patch

With this patch applied I also sometimes get NS_FATAL_ERROR for this type of packets:
ns3::WifiActionHeader (category=BlockAck, value=PeerLinkOpen) ns3::MgtAddBaResponseHeader (status code=success)

Probably need to extract Tid for it too.
Comment 17 Alexander Krotov 2016-04-06 05:21:06 UTC
Created attachment 2373 [details]
Extract TID from BlockAck action frames

This patch should be applied on top of attachment 2372 [details].
Comment 18 sebastien.deronne 2016-04-16 03:23:26 UTC
Could you include a test case showing that your patch indeed fixes the issue?

It looks quite ok though, but I suggest to use peekheader iso removeheader, and to sign the new GetCurrentTid fct with const.
Comment 19 Alexander Krotov 2016-04-16 12:02:19 UTC
Created attachment 2387 [details]
Mark GetCurrentTid const

3rd patch in a patch series
Comment 20 Alexander Krotov 2016-04-16 12:02:54 UTC
Created attachment 2388 [details]
Use PeekHeader instead of RemoveHeader

4th patch
Comment 21 sebastien.deronne 2016-04-16 12:08:57 UTC
It looks good now, just also packet->RemoveHeader (actionHdr) should be replaced with packet->PeekHeader (actionHdr).
Comment 22 Alexander Krotov 2016-04-16 12:13:03 UTC
(In reply to sebastien.deronne from comment #21)
> It looks good now, just also packet->RemoveHeader (actionHdr) should be
> replaced with packet->PeekHeader (actionHdr).

It would be wrong, I need to look at MgtAddBaResponseHeader that is below WifiActionHeader, so I need to remove WifiActionHeader first.
Comment 23 sebastien.deronne 2016-04-16 12:21:14 UTC
Ok, got it, but we thus need to be sure the header is not expected to be retrieved elsewhere.
Comment 24 Alexander Krotov 2016-04-16 12:24:50 UTC
(In reply to sebastien.deronne from comment #23)
> Ok, got it, but we thus need to be sure the header is not expected to be
> retrieved elsewhere.

I copy m_currentPacket before removing header, so m_currentPacket is not modified.
Comment 25 sebastien.deronne 2016-04-16 12:28:46 UTC
Oh indeed, I had missed it, as you copy it you can safely call the remove function iso the peek function, so please ignore my previous comment.
Comment 26 sebastien.deronne 2016-04-16 12:29:15 UTC
BTW I received a failing test case (different scenario than the one tracked by this bug) and your patch does solve the problem (but other issues pop up still). 

IMO this bug should not have been reopened, your patch does address a different root cause even though the triggered assert is the same. Your patch fixes issues when management frames such as ADDBA or DELBA frames are exchanged, which might collide in very specific conditions (thus it should be treated as a normal or minor issue) and the TID is then not properly retrieved. I agree this was not clear from the beginning, but investigation tell us more now :-) Anyway, I agree to deliver the proposed patch in ns-3-dev, but thus maybe we should track it in another thread.
Comment 27 sebastien.deronne 2016-04-20 09:48:53 UTC
Opened new bug 2380 for the issue reported by Alexander.