Bugzilla – Bug 2744
802.11n/ac with RTS/CTS is crashing for a large number of nodes
Last modified: 2017-06-06 13:57:04 UTC
Created attachment 2847 [details]
Modified wifi-tcp.cc file
I modified the example file "/examples/wireless/wifi-tcp.cc" to increase the number of stations while keeping a single AP. I also enabled RTS/CTS to prevent collisions, which may be preventing complete data from being received at the AP. However, for more than 6 Wifi stations, the following error appears:
assert failed. cond="remainigAmpduDuration > 0", file=../src/wifi/model/mac-low.cc, line=1725
terminate called without an active exception
This error does not seem to appear in earlier releases, but it occurs in ns-3.26. I would be glad to hear any suggestions or modifications to counter this bug.
This may have been fixed; can you check behavior on ns-3-dev? I tried your example program with ns-3-dev just now, for nWifi equal to 6 and 10, and did not observe the assert.
Created attachment 2848 [details] Modified file with data= 100,000 bytes
(In reply to Tom Henderson from comment #1)
> This may have been fixed; can you check behavior on ns-3-dev?
>
> I tried your example program with ns-3-dev just now, for nWifi equal to 6
> and 10, and did not observe the assert.
Hi Tom,
You were right, the program works fine with ns-3-dev. However, when I increased maxBytes (BulkSendApplication) to 100,000 bytes (it was 1000 bytes in the earlier case), the program showed strange behavior. It works fine with nWifi (number of Wifi stations) = 20 and 32, but terminates with a SIGSEGV for nWifi = 16, 18, 30, and 40. In short, the behaviour varies with the input in a seemingly arbitrary way. It may not be completely arbitrary, though: different values of nWifi result in different topologies, which may be the underlying issue, but I can't be entirely sure. Normally, I don't think the topology should have this kind of effect.
This is the segmentation fault that occurs:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff662a4d7 in ns3::WifiMacHeader::operator= (this=0x7fffffffbaa0) at ../src/wifi/model/wifi-mac-header.h:80
80 class WifiMacHeader : public Header
I have attached the new file, named "wifi-tcp-1.cc", to this thread.
Thanks,
Varun
Hello Sebastien, If you could help me understand the major changes that were made to counter this bug, I could investigate and try to find the root cause of the issue. At the moment, I am going through ns-3-dev, and there are several changes from ns-3.26. If you could just pinpoint the exact changes that helped eliminate this bug, it would help me immensely. Thanks, Varun
I cannot run your attached example on my macOS machine:
Waf: Entering directory `/Users/Sebastien/ns-3/ns-3-allinone/ns-3-dev/build'
[ 951/2509] Compiling examples/wireless/wifi-tcp.cc
[2354/2509] Linking build/examples/wireless/ns3-dev-wifi-tcp-debug
Undefined symbols for architecture x86_64:
  "ns3::FlowMonitor::CheckForLostPackets()", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::FlowMonitorHelper::InstallAll()", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::FlowMonitorHelper::GetClassifier()", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::FlowMonitorHelper::FlowMonitorHelper()", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::FlowMonitorHelper::~FlowMonitorHelper()", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::FlowMonitor::GetFlowStats() const", referenced from:
      _main in wifi-tcp.cc.27.o
  "ns3::Ipv4FlowClassifier::FindFlow(unsigned int) const", referenced from:
      _main in wifi-tcp.cc.27.o
  "typeinfo for ns3::FlowClassifier", referenced from:
      ns3::Ptr<ns3::Ipv4FlowClassifier> ns3::DynamicCast<ns3::Ipv4FlowClassifier, ns3::FlowClassifier>(ns3::Ptr<ns3::FlowClassifier> const&) in wifi-tcp.cc.27.o
  "typeinfo for ns3::Ipv4FlowClassifier", referenced from:
      ns3::Ptr<ns3::Ipv4FlowClassifier> ns3::DynamicCast<ns3::Ipv4FlowClassifier, ns3::FlowClassifier>(ns3::Ptr<ns3::FlowClassifier> const&) in wifi-tcp.cc.27.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Tom, is the attached example running fine for you?
You have to modify the wscript. The wifi-tcp entry currently reads:
obj = bld.create_ns3_program('wifi-tcp', ['internet', 'mobility', 'wifi', 'applications', 'point-to-point'])
obj.source = 'wifi-tcp.cc'
Just add the flow-monitor dependency to the module list. Or... run the script from the scratch folder.
(In reply to sebastien.deronne from comment #5)
> I cannot run your attached example on my macOS machine:
> [...]
> Tom, is the attached example running fine for you?
Tommaso, thanks for the tip, it works :-)
Varun, did you try running it with gdb to check whether this points to TCP or Wifi?
(In reply to sebastien.deronne from comment #8)
> Varun, did you try running it with gdb to check whether this is pointing to
> TCP or Wifi ?
I just ran a backtrace on the 'n=16' case:
#2 0x00007ffff63c56e1 in ns3::MacLow::SendDataAfterCts (this=0x795410, source=..., duration=...) at ../src/wifi/model/mac-low.cc:2100
2100 ForwardDown (packet, &m_currentHdr, m_currentTxVector);
(gdb) do
#1 0x00007ffff63bd600 in ns3::MacLow::ForwardDown (this=0x795410, packet=..., hdr=0x795588, txVector=...) at ../src/wifi/model/mac-low.cc:1534
1534 newHdr = dequeuedItem->GetHeader ();
(gdb) p dequeuedItem
$3 = {m_ptr = 0x0}
Tom, thanks. I indeed see dequeuedItem is NULL, but I added an extra trace showing that m_aggregateQueue[GetTid (packet, *hdr)]->GetNPackets () is 7 in this case. Could this be an issue in WifiMacQueue? I know Stefano made some recent changes related to wifi queues, any feedback from Stefano about this is welcome :-)
GetNPackets() is provided by the QueueBase class, which does not take into account that stale items may be in the queue. So, if all the items in the queue are stale, GetNPackets() returns a nonzero value, but Dequeue() will return a null pointer. I made a dirty hack and verified that, if GetNPackets() does not count stale items, the attached program does not crash. I am working on a patch, which I will likely be able to publish tomorrow.
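The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the ns-3 code: `Item`, `NaiveQueue`, and the explicit `now` argument are hypothetical stand-ins (ns-3 reads the time from the simulator), and the sketch only assumes that the counter includes stale items while dequeuing skips them.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>

// Hypothetical stand-in for a queued packet with an expiry time.
struct Item
{
  int id;
  double expiry; // the item is stale once "now" reaches this time
};

// Hypothetical queue: the counter includes stale items, Dequeue() skips them.
class NaiveQueue
{
public:
  void Enqueue (int id, double expiry)
  {
    m_items.push_back (std::make_shared<Item> (Item{id, expiry}));
  }

  // Counts every stored item, stale or not.
  std::size_t GetNPackets () const
  {
    return m_items.size ();
  }

  // Discards stale items from the head; returns nullptr if none is valid.
  std::shared_ptr<Item> Dequeue (double now)
  {
    while (!m_items.empty ())
      {
        std::shared_ptr<Item> head = m_items.front ();
        m_items.pop_front ();
        if (head->expiry > now)
          {
            return head;
          }
      }
    return nullptr;
  }

private:
  std::deque<std::shared_ptr<Item>> m_items;
};

// Returns true when the counter reports packets but Dequeue() yields none.
bool CounterDisagreesWithDequeue ()
{
  NaiveQueue q;
  q.Enqueue (1, 1.0);
  q.Enqueue (2, 2.0);
  assert (q.GetNPackets () == 2);    // both items are counted
  const double now = 5.0;            // both items are stale at this time
  const bool counted = q.GetNPackets () > 0;
  const bool gotNone = (q.Dequeue (now) == nullptr);
  return counted && gotNone;
}
```

A caller that checks `GetNPackets () > 0` and then dereferences the result of `Dequeue ()` without a null check would crash in exactly this situation, which matches the null `dequeuedItem` seen in the backtrace.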
Created attachment 2855 [details]
Proposed patch
With the attached patch, WifiMacQueue overrides the GetNPackets and GetNBytes methods of the QueueBase class. These methods first remove all the stale items from the queue (thus correctly updating the counters kept by the QueueBase class) and then call their QueueBase version. Also, the HasPackets method is replaced by the IsEmpty method, to be consistent with the method provided by the QueueBase class. With the attached patch, I was unable to make the test program crash.
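The approach described above (purge stale items, then defer to the base-class counter) can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: `CountingBase`, `ExpiringQueue`, `SetNow`, and the `maxDelay` parameter are hypothetical names, and the clock is set by hand instead of being read from the simulator.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical base class that only maintains a packet counter.
class CountingBase
{
public:
  virtual ~CountingBase () = default;
  virtual std::size_t GetNPackets () { return m_nPackets; }

protected:
  std::size_t m_nPackets = 0;
};

// Hypothetical derived queue whose items expire after maxDelay time units.
class ExpiringQueue : public CountingBase
{
public:
  explicit ExpiringQueue (double maxDelay) : m_maxDelay (maxDelay) {}

  // Assumption: the caller sets the clock explicitly.
  void SetNow (double now) { m_now = now; }

  void Enqueue (int id)
  {
    m_items.push_back ({id, m_now});
    ++m_nPackets;
  }

  // Remove stale items first, so the base-class counter is accurate.
  std::size_t GetNPackets () override
  {
    Cleanup ();
    return CountingBase::GetNPackets ();
  }

private:
  struct Entry
  {
    int id;
    double tstamp; // enqueue time
  };

  void Cleanup ()
  {
    for (auto it = m_items.begin (); it != m_items.end ();)
      {
        if (it->tstamp + m_maxDelay <= m_now)
          {
            it = m_items.erase (it);
            --m_nPackets;
          }
        else
          {
            ++it;
          }
      }
  }

  std::list<Entry> m_items;
  double m_maxDelay;
  double m_now = 0.0;
};

// Small demonstration: the count drops to zero once all items are stale.
void DemoPurgeBeforeCount ()
{
  ExpiringQueue q (1.0);
  q.Enqueue (1);
  q.Enqueue (2);
  assert (q.GetNPackets () == 2);
  q.SetNow (5.0);
  assert (q.GetNPackets () == 0);
}
```

With this shape, a nonzero count implies that at least one non-stale item is still queued at the time of the call.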
Created attachment 2858 [details]
RTS/CTS is used with the IdealWifiManager with 802.11ac
Thank you very much for the fix, Stefano. The earlier attachment works perfectly fine. However, a different problem arises. I modified the attachment to use IdealWifiManager with 802.11ac and RTS/CTS enabled. The program terminates with a SIGABRT:
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
0x00007fffeb199428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
Through gdb, I found that the following statement at src/wifi/model/mac-low.cc:697
Ptr<Packet> newPacket = (m_txPackets[GetTid (packet, *hdr)].at (i).packet)->Copy ();
operates on a vector, 'm_txPackets[GetTid (packet, *hdr)]', that has length 0 and capacity 0. Hence, the program terminates when it tries to access '.at (i).packet' in this statement.
Note that this error does not occur when I use IdealWifiManager with 802.11ac and RTS/CTS disabled. The program also works fine with IdealWifiManager, 802.11n, and RTS/CTS enabled. I have attached a new file, 'wifi-tcp-ideal.cc'.
Thanks,
Varun
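For readers unfamiliar with the failure mode, the following minimal sketch shows why `.at (i)` on an empty vector ends in `std::out_of_range`, and what a bounds-checked lookup looks like. It is only an illustration: `TxPackets` and `TryGet` are hypothetical, the container type of `m_txPackets` is assumed, and the guard shown here is not the fix that was later applied to mac-low.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a per-TID list of stored packets.
using TxPackets = std::map<int, std::vector<int>>;

// Bounds-checked lookup: returns true and writes the element to "out"
// only if the TID exists and index i is in range.
bool TryGet (const TxPackets &tx, int tid, std::size_t i, int &out)
{
  auto it = tx.find (tid);
  if (it == tx.end () || i >= it->second.size ())
    {
      return false;
    }
  out = it->second[i];
  return true;
}

// Reproduces the reported failure mode: at() on an empty vector throws.
bool AtThrowsOnEmpty ()
{
  std::vector<int> empty;
  assert (empty.empty ());
  try
    {
      (void) empty.at (0);
    }
  catch (const std::out_of_range &)
    {
      return true;
    }
  return false;
}
```

When the exception is not caught anywhere, the runtime calls `std::terminate`, which raises SIGABRT as in the output above.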
Stefano, thanks, it solves the problem :-)
Varun, can you please attach the correct script for ideal? The attached one is for 802.11n and works fine. If I change to 802.11ac in your script, I hit the assert "VHT MCS 9 forbidden at 20 MHz when NSS is 1". I want to make sure we use the same script...
Created attachment 2862 [details]
RTS/CTS is used with 802.11ac
Hi Sebastien,
Sorry for uploading the wrong file. I have attached a new file named "RTS_CTS_80211ac.cc". I also realized that the bug has nothing to do with the Ideal Wifi Manager being used; I was wrong about that. The same bug appears even when the Constant Rate Manager is used with RTS/CTS for VHT modes. In this file, the mode is set to VhtMcs0.
Thanks,
Varun
Created attachment 2863 [details]
Fix issue when CTS timeout occurred for single MPDU
I investigated the other issue: it comes from a bug triggered when a CTS timeout occurs for an S-MPDU. Please apply the provided patch in addition to Stefano's patch and let me know whether this works fine for you.
(In reply to sebastien.deronne from comment #16)
> Created attachment 2863 [details]
> Fix issue when CTS timeout occurred for single MPDU
>
> I investigated the other issue, this comes from a bug when a CTS timeout
> occurred for a S-MPDU. Please apply the provided patch in addition to
> Stefano's patch and let me know whether this works fine for you.
Hi Sebastien,
It's working fine now! Thank you for the fix.
Varun
Varun, thanks for the quick feedback.
I'd like to include those fixes in the next release.
Created attachment 2865 [details]
Avoid calling WifiMacQueue::GetNPackets whenever possible
I pushed the patch I prepared to let WifiMacQueue override some methods of the QueueBase class. However, WifiMacQueue::GetNPackets() is quite expensive, since it iterates over all the packets in the queue to discard the stale items. I attached another patch which avoids calling this method whenever possible:
- in mac-low.cc, some calls to GetNPackets() can be replaced by IsEmpty(), which just iterates over the packets in the queue until a packet which is not stale is found
- in WifiMacQueue::Enqueue, the call to WifiMacQueue::GetNPackets() can be replaced by a call to QueueBase::GetNPackets(), which just returns the value of a member variable.
Any opinion on this optimization? Shall I push this patch?
Thanks.
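The cost difference behind this optimization can be illustrated with a small sketch. It is a simplification under stated assumptions: `Entry`, `CountValid`, and the free-function `IsEmpty` are hypothetical, neither function removes stale items (the real methods do), and a `visited` counter is added purely to make the amount of work observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical queue entry: stale once "now" reaches expiry.
struct Entry
{
  double expiry;
};

// Full scan: visits every entry to count the non-stale ones.
std::size_t CountValid (const std::vector<Entry> &q, double now,
                        std::size_t &visited)
{
  std::size_t n = 0;
  visited = 0;
  for (const Entry &e : q)
    {
      ++visited;
      if (e.expiry > now)
        {
          ++n;
        }
    }
  return n;
}

// Early exit: stops at the first non-stale entry.
bool IsEmpty (const std::vector<Entry> &q, double now, std::size_t &visited)
{
  visited = 0;
  for (const Entry &e : q)
    {
      ++visited;
      if (e.expiry > now)
        {
          return false;
        }
    }
  return true;
}

// Demonstration: one stale entry followed by four valid ones.
void DemoEarlyExit ()
{
  const std::vector<Entry> q = {{1.0}, {9.0}, {9.0}, {9.0}, {9.0}};
  std::size_t visited = 0;
  assert (CountValid (q, 5.0, visited) == 4);
  assert (visited == 5); // the full scan touched every entry
  assert (!IsEmpty (q, 5.0, visited));
  assert (visited == 2); // the early exit stopped at the first valid entry
}
```

When the caller only needs to know whether anything is queued, the early-exit form does the same job with fewer visits whenever a valid entry sits near the head.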
(In reply to Stefano Avallone from comment #20)
>
> Any opinion on this optimization? Shall I push this patch?
>
> Thanks.
This looks fine to me (assuming it does not change program output).
Stefano, I am fine with you pushing your improvement. Then I'll push my patch and close this bug.
Fixed in changeset 12924:85efd9b7476b