Bug 2748

Summary: ARP requests are not retransmitted upon loss (in Wifi)
Product: ns-3 Reporter: Varun Reddy <varunamarreddy>
Component: internet Assignee: Tom Henderson <tomh>
Status: RESOLVED INVALID    
Severity: normal CC: ns-bugs
Priority: P3    
Version: ns-3-dev   
Hardware: All   
OS: All   
Attachments: Source Code along with PCAP files

Description Varun Reddy 2017-06-01 18:54:12 UTC
Created attachment 2860
Source Code along with PCAP files

This is with reference to an earlier bug report - https://www.nsnam.org/bugzilla/﷒0﷓

Although a patch was provided to tackle this issue, I feel that the same problem appears in this particular scenario I'm trying to implement. There is a single AP (IEEE 802.11ac) with 150 stations, each sending 100,000 bytes to the AP through the TCP BulkSendApplication. All nodes are in range of one another. The IdealWifiManager is being used. 

Initially, with the default settings for ARP and queues, only 109 stations are able to send their packets successfully. The following 41 nodes (IP address, MAC address) do not send any data. The attached PCAP file (Access-Point-1.pcap) shows that these nodes never send a successful ARP request to the AP, so no data is delivered:

Node 1: 10.0.0.8, 00:00:00:08
Node 2: 10.0.0.16, 00:00:00:10
Node 3: 10.0.0.17, 00:00:00:11
Node 4: 10.0.0.19, 00:00:00:13
Node 5: 10.0.0.24, 00:00:00:18
Node 6: 10.0.0.25, 00:00:00:19
Node 7: 10.0.0.27, 00:00:00:1B
Node 8: 10.0.0.29, 00:00:00:1D
Node 9: 10.0.0.34, 00:00:00:22
Node 10: 10.0.0.40, 00:00:00:28
Node 11: 10.0.0.44, 00:00:00:2B
Node 12: 10.0.0.49, 00:00:00:31
Node 13: 10.0.0.52, 00:00:00:34
Node 14: 10.0.0.53, 00:00:00:35
Node 15: 10.0.0.54, 00:00:00:36
Node 16: 10.0.0.57, 00:00:00:39
Node 17: 10.0.0.64, 00:00:00:40
Node 18: 10.0.0.67, 00:00:00:43
Node 19: 10.0.0.68, 00:00:00:44
Node 20: 10.0.0.69, 00:00:00:45
Node 21: 10.0.0.71, 00:00:00:47
Node 22: 10.0.0.75, 00:00:00:4B
Node 23: 10.0.0.83, 00:00:00:53
Node 24: 10.0.0.91, 00:00:00:5B
Node 25: 10.0.0.93, 00:00:00:5D
Node 26: 10.0.0.97, 00:00:00:61
Node 27: 10.0.0.98, 00:00:00:62
Node 28: 10.0.0.99, 00:00:00:63
Node 29: 10.0.0.107, 00:00:00:6B
Node 30: 10.0.0.109, 00:00:00:6D
Node 31: 10.0.0.111, 00:00:00:6F
Node 32: 10.0.0.112, 00:00:00:70
Node 33: 10.0.0.127, 00:00:00:7F
Node 34: 10.0.0.128, 00:00:00:80
Node 35: 10.0.0.130, 00:00:00:82
Node 36: 10.0.0.135, 00:00:00:87
Node 37: 10.0.0.142, 00:00:00:8E
Node 38: 10.0.0.144, 00:00:00:90
Node 39: 10.0.0.147, 00:00:00:93
Node 40: 10.0.0.150, 00:00:00:96
Node 41: 10.0.0.151, 00:00:00:97

Note that this is not a problem with the propagation loss/SNR, since all 41 of these nodes successfully associate with the AP; they simply do not send out ARP requests (or the requests are lost). As suggested by Tommaso on this post (https://groups.google.com/forum/#!topic/ns-3-users/saEH63E3XmM), I considerably increased the WifiMacQueue size and the maximum delay of packets in the Wifi MAC queue to tackle the congestion at the AP. No matter how large the increase, the following 9 nodes still do not send out ARP requests (Access-Point-1.pcap):

Node 1: 10.0.0.25, 00:00:00:19
Node 2: 10.0.0.29, 00:00:00:1D
Node 3: 10.0.0.34, 00:00:00:22
Node 4: 10.0.0.75, 00:00:00:4B
Node 5: 10.0.0.91, 00:00:00:5B
Node 6: 10.0.0.98, 00:00:00:62
Node 7: 10.0.0.112, 00:00:00:70
Node 8: 10.0.0.127, 00:00:00:7F
Node 9: 10.0.0.144, 00:00:00:90

Furthermore, I went on to increase the ArpCache::PendingQueueSize (default=3) to 100 and ArpCache::MaxRetries (default=3) to 100. These changes had no impact either. 
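
For reference, the attribute changes described above might be expressed as in the following sketch. It assumes the standard ns-3 attribute system and that the defaults are set before the internet stack is installed on the nodes. The function name ConfigureArpCache is hypothetical; the values are the ones quoted in this report.

```cpp
#include "ns3/core-module.h"
#include "ns3/internet-module.h"

using namespace ns3;

// Hypothetical helper (not part of ns-3): call it before the internet
// stack is installed so the new defaults apply to ARP caches created later.
void
ConfigureArpCache ()
{
  // Packets queued per unresolved ARP entry (default 3, raised to 100).
  Config::SetDefault ("ns3::ArpCache::PendingQueueSize", UintegerValue (100));
  // ARP request retries before the entry goes dead (default 3, raised to 100).
  Config::SetDefault ("ns3::ArpCache::MaxRetries", UintegerValue (100));
}
```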

My feeling is that the retransmission of ARP requests is not being implemented correctly. 

Sorry for the long message, but I wanted to be elaborate about the scenario. 

Thanks, 
Varun
Comment 1 Tom Henderson 2017-06-02 21:48:17 UTC
I believe the problem is that the ArpCache::DeadTimeout default value of 100 seconds is much longer than your simulation runtime (20 seconds), so an entry that goes dead never gets a chance to retry before the simulation ends.  Can you try 

Config::SetDefault ("ns3::ArpCache::DeadTimeout", TimeValue (Seconds (1)));

or some suitably small value and see whether it eventually retries successfully?

The behavior is to retry MaxRetries times and then, if unsuccessful, go dead for DeadTimeout before starting to retry again.  A DeadTimeout of 0 might cause it to retry without any interruption (although I haven't tested that).
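
The retry-then-dead cycle can be illustrated with a small standalone model. This is only a sketch and not ns-3 code: the function ArpRequestTimes is hypothetical, and it assumes that no ARP reply ever arrives, that each burst is one initial request plus MaxRetries retransmissions spaced one reply timeout apart, and that a packet is always waiting to trigger a new burst as soon as the dead period ends. The 1-second reply timeout and the 1-second start time used in the note below are also assumed values.

```cpp
#include <vector>

// Simplified, hypothetical model of the retry-then-dead cycle.
// Returns the times (in seconds) at which ARP requests would be sent
// before simEnd, under the assumptions stated in the text above.
std::vector<double>
ArpRequestTimes (double start, double waitReplyTimeout, unsigned maxRetries,
                 double deadTimeout, double simEnd)
{
  std::vector<double> times;
  if (waitReplyTimeout <= 0.0)
    {
      return times; // guard: time would never advance inside a burst
    }
  double t = start;
  while (t < simEnd)
    {
      // One burst: the initial request plus up to maxRetries retransmissions.
      for (unsigned attempt = 0; attempt <= maxRetries && t < simEnd; ++attempt)
        {
          times.push_back (t);
          t += waitReplyTimeout;
        }
      // The last request has timed out: the entry stays dead for deadTimeout.
      t += deadTimeout;
    }
  return times;
}
```

Under these assumptions, a 20-second run starting at 1 second with 3 retries yields a single burst of 4 requests when the dead period is 100 seconds, and 16 requests in four bursts when it is 1 second. This matches the explanation that a long DeadTimeout leaves no time for a second attempt within the simulation.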
Comment 2 Varun Reddy 2017-06-03 15:26:54 UTC
(In reply to Tom Henderson from comment #1)
> I believe the problem is that the ArpCache::DeadTimeout default value of 100
> seconds is much longer than your simulation runtime (20 seconds).  Can you
> try 
> 
> Config::SetDefault ("ns3::ArpCache::DeadTimeout", TimeValue (Seconds (1)));
> 
> or some suitably small value and see whether it eventually retries
> successfully?
> 
> The behavior is to retry MaxRetries and then, if unsuccessful, go dead for a
> while, before starting to retry again.  DeadTimeout of 0 might cause it to
> retry without any interruption (although I haven't tested that).

Hi Tom, 

Yes, you're right, it works now. I even populated the ARP cache before data transmission, and the underlying problem turned out to be something else; it is not a problem with ARP. Sorry for the bother. 

Thanks!
Varun