Bug 1311 - Scheduling Packet Send with application
Status: RESOLVED INVALID
Product: ns-3
Classification: Unclassified
Component: applications
Version: ns-3.12
Hardware: PC Linux
Importance: P5 normal
Assigned To: George Riley
Reported: 2011-12-11 11:10 UTC by frmanwi
Modified: 2011-12-20 18:23 UTC (History)


Attachments
Test code + Edited Mac layer of 802.11 for stats purpose (95.98 KB, application/gzip)
2011-12-11 11:10 UTC, frmanwi
Description frmanwi 2011-12-11 11:10:40 UTC
Created attachment 1281
Test code + Edited Mac layer of 802.11 for stats purpose

I've set up an application to send 1 packet every 1/alpha seconds. After a 10-second simulation, some nodes have not generated the expected SimTime*alpha packets.
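
As a rough cross-check of the expected count, here is a minimal sketch in plain Python (not ns-3 code). It assumes the application sends its first packet exactly at its start time and then one packet every 1/alpha seconds until the simulation stops. The rate of 400 packets/s is a hypothetical value chosen for illustration, and the function name is made up.

```python
def expected_packets(start_us, stop_us, interval_us):
    """Count sends at start, start + interval, ... strictly before stop.

    All times are integer microseconds to avoid floating-point rounding.
    """
    if stop_us <= start_us:
        return 0
    # Integer ceiling division: ceil((stop - start) / interval).
    return -(-(stop_us - start_us) // interval_us)


interval_us = 1_000_000 // 400   # hypothetical alpha = 400 packets/s
stop_us = 10_000_000             # 10-second simulation, as in the report

for start_us in (1_000, 5_000, 9_000):
    count = expected_packets(start_us, stop_us, interval_us)
    print(f"start at {start_us} us: {count} packets expected")
```

Under these assumptions, moving the start time by a few milliseconds changes the expected count by only a handful of packets, so a much larger shortfall has to come from somewhere else.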

It seems to be related to the start time of the applications. At first, I set the applications to start at 0.001, 0.002, ... 0.00n seconds, and this problem showed up:
App-1 generated X packets;
...
App-i generated (X-Y) packets;
...
App-n generated X packets;


Subsequently, I tried setting the apps to start at 0.001, 0.003, 0.005, 0.007, 0.00a seconds (where a is a prime number), and it seems to work correctly.
App-1 generated X packets;
...
App-i generated X packets;
...
App-n generated X packets;

Let me know if you need more information.
I attach the application, helper and test code.
In the test code, pay attention to the station with MAC :05.
Comment 1 Tommaso Pecorella 2011-12-11 12:16:16 UTC
I'm unsure whether this bug is related to the ns-3 codebase or to the user-provided applications.

In the first case it should be narrowed down to the specific issue, by trying to reproduce it without "external" applications (or with a bare minimum). The code provided is quite complex and I can't exclude that the bug is there.

In the second case the bug is not applicable, as Bugzilla only covers ns-3 and not bugs in user code.

Tommaso
Comment 2 frmanwi 2011-12-12 05:36:10 UTC
Tell me which application I should use.

(In reply to comment #1)
> I'm unsure if this bug is related to the ns-3 codebase or to the user's
> provided applications.
> 
> In the first case it should be narrowed down to the specific issue, trying to
> reproduce it without "external" applications (or with a bare minimum). The code
> provided is quite complex and I can't exclude the bug is there.
> 
> In the second case the bug is not applicable, as Bugzilla only covers ns-3 and
> not user's code bugs.
> 
> Tommaso
Comment 3 Tommaso Pecorella 2011-12-18 13:02:12 UTC
(In reply to comment #2)
> Tell me which application should I use.

Hi,

I had a quick look at the code you submitted. I'm still unsure whether it's a bug in ns-3.

In order to "ease" the debugging, please do the following:
1) use the very latest ns-3-dev. If you absolutely need to change anything in the ns-3 modules (in your case, the WiFi module), please make minimal and non-intrusive additions. cout or NS_LOG is fine, but changing anything else might alter the module in unexpected ways.
1a) in case you really need to add functionality, please document it and try (again) to do just the bare minimum.

2) try to use already provided modules, or if you want to provide a user-defined application, strip out anything that is unnecessary. In your case there are three different applications but just one is used.

3) try to reproduce the bug with the smallest possible number of nodes. Ideally two, but in some cases you'll need more.

The problems in this case are:
1) you patched an "old" WiFi version, and in the new one some stuff is different. As a result, it's double work for us, as we need to a) see what was changed in previous patches and what you've added and b) proceed with the debugging.
2) you have new applications, and those are potentially buggy.

Hence, the bug might be in one of the following:
1) the WiFi module as provided by ns-3, and you found a hidden bug,
2) in your additions to the WiFi module, and then it's not ns-3's fault,
3) in the application, see point 2,
4) in the ns-3 scheduler (highly unlikely), and then it's raining hell.
5) somewhere in between all of those.

I'd start by updating the WiFi module to the very latest one and removing all your additions, assuming they're not strictly necessary, and then repeat the tests.

You pointed out that the application is not generating the right number of packets. However, if this is true, then the bug should also be there with an unmodified WiFi module. Mind that WiFi is a damn little bastard: after a number of retries it will drop the packets from the sending queue.
Hence, count the packets at the application, that is, how many new packets are generated and passed to the lower layers. That's more correct.

Last note: as you know (because I've seen you have a comment in your code about it), ARP is another little bastard. Two bastards at the same time can mess things up in very funny ways. If apps are started too fast, they'll have ARP collisions. As a consequence, some of them (the lucky ones) will have an ARP table, and some will not. The unlucky ones will drop packets... sorry.
A wiser way to start applications is to pick a random start time over a wider time period, so as to minimize transients.
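
A minimal sketch of that idea in plain Python (not ns-3 code): draw each application's start time uniformly from a window. The 1-second window, the seed and the function name are assumptions made for illustration; in an ns-3 script the drawn values would be passed as each application's start time.

```python
import random


def random_start_times(n_apps, window_s=1.0, seed=1):
    """Return one start time per application, uniform in [0, window_s].

    A dedicated, seeded generator keeps the schedule reproducible
    from run to run.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, window_s) for _ in range(n_apps)]


starts = random_start_times(9)
for i, t in enumerate(starts, start=1):
    print(f"App-{i} starts at {t:.6f} s")
```

Spreading the starts over a window much longer than an ARP exchange makes it unlikely that many stations issue their first ARP request at almost the same instant.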

Moreover, your application is an on-off with **constant** on and off periods. This means that you'll have a ramping-up channel occupation, potentially generating a lot of collisions. I don't know if this is exactly what you meant to simulate, but it's quite a harsh condition, and I'd not be surprised at all if you had packet losses. Using "prime numbers" as start times effectively decreased the channel occupancy a lot while ARP is on the network, so ... lemme do some math:
1st case: all apps active at 0.009, 2nd case: all apps active at 0.023.
I bet if you use 0.001, 0.00375, 0.0065, 0.00925, etc. (0.00275 period) all will go fine.
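
The arithmetic behind those figures can be sketched in plain Python. The count of 9 applications is not stated in the report; it is inferred here from the quoted end times (0.009 and 0.023), so treat it as an assumption. Times are integer microseconds.

```python
n_apps = 9  # assumption: inferred from the 0.009 s and 0.023 s figures

# 1st case: consecutive starts at 0.001, 0.002, ... s
consecutive = [1_000 * k for k in range(1, n_apps + 1)]

# 2nd case: the "prime number" starts at 0.001, 0.003, 0.005, 0.007, ... s
prime_like = [1_000 * k for k in (1, 3, 5, 7, 11, 13, 17, 19, 23)]

# Suggested case: evenly spaced starts with a 0.00275 s period
even_spread = [1_000 + 2_750 * k for k in range(n_apps)]

for name, starts in (("consecutive", consecutive),
                     ("prime-like", prime_like),
                     ("even spread", even_spread)):
    print(f"{name}: last app starts at {max(starts) / 1e6:.5f} s")
```

The evenly spaced schedule covers the same 0.001 to 0.023 s span as the prime-like one, which is why it makes a fair comparison: if it also works, the spacing matters and the primes do not.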

If it does, then check the following:
1) are the ARP tables fine in the "bugged" case?
2) where in the stack are the missing packets dropped? ('cuz I bet they ARE generated).
3) re-evaluate this bug and see if it's a real one or just something you underestimated in the standard.

I'll leave this bug open for now, but please update it with your findings, we don't like to have bugs half-opened.

Best regards,

Tommaso
Comment 4 Tommaso Pecorella 2011-12-18 13:20:45 UTC
Forgot to mention in the previous comment...

This "bug" might be relevant in your case. It's not a real bug tho, it's just an obscure-but-correct behavior.
https://www.nsnam.org/bugzilla/﷒0﷓

Basically the problem is the lack of a back pressure method in UDP (or TCP even) queues to the lower layers. A packet can be sent even if the ARP is running, and if more than 3 packets are sent, then... boom.

A bug? Nope, it's documented so it's a feature.

The solution? Don't rush, use a probe packet before sending real data and give ARP some time to finish its work.

If this is your case, the solution can be: When the application starts, send just ONE packet, then trigger an OFF period. This will give one second to ARP to end its job (should be enough, but you might want to extend this start probe period). Then go with the normal behavior.
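
To make the effect concrete, here is a toy model in plain Python, not the ns-3 implementation. It assumes a per-destination pending queue of 3 packets (the figure mentioned above), tail drop when that queue is full, and an ARP resolution that completes at a fixed, hypothetical time. All names are made up for illustration.

```python
from collections import deque


class ArpPendingQueue:
    """Toy queue that holds packets until the ARP entry is resolved."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.pending = deque()
        self.resolved = False
        self.delivered = 0
        self.dropped = 0

    def send(self, packet):
        if self.resolved:
            self.delivered += 1           # MAC address known: pass down
        elif len(self.pending) < self.capacity:
            self.pending.append(packet)   # wait for the ARP reply
        else:
            self.dropped += 1             # queue full: packet is lost

    def resolve(self):
        self.resolved = True
        self.delivered += len(self.pending)  # flush waiting packets
        self.pending.clear()


def run(send_times_us, arp_done_us, capacity=3):
    """Return (delivered, dropped) for the given send schedule."""
    queue = ArpPendingQueue(capacity)
    for packet_id, t in enumerate(sorted(send_times_us)):
        if not queue.resolved and t >= arp_done_us:
            queue.resolve()
        queue.send(packet_id)
    if not queue.resolved:
        queue.resolve()
    return queue.delivered, queue.dropped


# Ten packets, one every 2.5 ms, with ARP finishing after 20 ms.
burst = [2_500 * k for k in range(10)]
# Same ten packets, but one probe first and the rest after a 1 s pause.
probe_first = [0] + [1_000_000 + 2_500 * k for k in range(9)]

print("burst:      ", run(burst, arp_done_us=20_000))
print("probe first:", run(probe_first, arp_done_us=20_000))
```

In this toy setup the burst delivers 5 packets and loses 5, while the probe-first schedule delivers all 10. The real numbers depend on how long ARP actually takes on a congested channel.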

T.

PS: another relevant "bug" is: https://www.nsnam.org/bugzilla/﷒1﷓
I think the general consensus is that a "Godlike ARP" is not a priority in ns-3.
Comment 5 frmanwi 2011-12-19 05:46:44 UTC
Hi Tommaso, 

thanks for the reply. I had already considered some of the questions you asked.

The problems are related neither to ARP nor to the MAC (WifiMac or MacLow) layers, because I was counting packets while adding them to MacQueue.
As you have seen, I took ARP's requirements into account, so I moved the packet counter from the MAC or PHY layer up to the higher WifiMac layer.
It seems to be a problem with the packet scheduling or with the application layer. I also think it can be related to:

> 3) in the application, see point 2,
> 4) in the ns-3 scheduler (highly unlikely), and then it's raining hell.
> 5) somewhere in between all of those.


You are right in saying:
> I'll leave this bug open for now, but please update it with your findings, 
> we don't like to have bugs half-opened.

I do not want to waste your time (or mine).
I have used ns-3 and I think it is a very powerful simulation environment, so it would be useful if I could help improve it by tracking down a bug.

I don't think I can reset the WiFi code today to run a deeper test than my first look, but in the next few days I will test it and update you.

Once again, thanks for replying. I'll update you as soon as possible.
Matteo
Comment 6 Tommaso Pecorella 2011-12-19 09:52:54 UTC
Hi Matteo,

(In reply to comment #5)
> Hi Tommaso, 
> 
> thanks for the reply. Some questions you asked me,  they have been just
> considered by myself.
> 
> Problems are neither related to ARP not to Mac (WifiMac or MacLow) layers
> because I was counting packets while adding them to MacQueue. 
> As you have seen I noted ARP exigences, then I moved packet counter from mac or
> phy layer to the higher wifimac layer.
> It seems to be a problem due to the packet scheduling or to the application
> layer. Even I think it's can be related to: 

Move it even higher. WiFimac still needs a MAC address, and that's the ARP output. You have to do the counters at IP layer or higher. You can also check if it's the ARP in a very simple way... put a cout where ARP puts the packets in its private queue.

> > 3) in the application, see point 2,
> > 4) in the ns-3 scheduler (highly unlikely), and then it's raining hell.
> > 5) somewhere in between all of those.
> 
> You are right while saying: 
> > I'll leave this bug open for now, but please update it with your findings, 
> > we don't like to have bugs half-opened.
> 
> I do not want you to lose your time (as for my time) 
> I used ns3 and I think it is a very powerful simulation environment, so it will
> be usefull if I can work (trying to blow out a bug) to improve it. 
> 
> Today I believe I can't reset wifi code to have a deeper test than at my first
> look, but in next days I will test it and I will update you.
> 
> Once again thanks for replying, I'll update you as soon as possibile.
> Matteo

Don't worry about using your time or mine, it's all knowledge gained. On the other hand going straight to the solution is always better :)

ns-3 is a powerful simulator, and like everything "big enough" it can exhibit strange interactions between the various components. Some are bugs, some are just real-world things that, in simpler simulators, you'd never find. It's the price paid to be closer to a real implementation. A price that most of us are well happy to pay :)

Cheers,

Tommaso
Comment 7 frmanwi 2011-12-20 16:35:16 UTC
Today I ran some tests for a deeper investigation.

I think the problem is related to ARP, since:
$> Application ID:1 stopped. Sent 4000 packets

while at Mac Layer (wifiMac) I got: 
$> Mac: 00:00:00:00:00:01 - Total Queue Packets: 3806
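
The gap between the two counters can be worked out with a small sketch in plain Python. The layer names are illustrative labels, not ns-3 identifiers, and the helper function is hypothetical.

```python
def drops_between_layers(counters):
    """Given (layer, count) pairs ordered top to bottom, return
    (upper, lower, missing) for each pair of adjacent layers."""
    return [
        (upper, lower, n_upper - n_lower)
        for (upper, n_upper), (lower, n_lower) in zip(counters, counters[1:])
    ]


# Figures taken from the two log lines above.
counters = [("application", 4000), ("wifi_mac_queue", 3806)]

for upper, lower, missing in drops_between_layers(counters):
    share = 100 * missing / counters[0][1]
    print(f"{upper} -> {lower}: {missing} packets missing ({share:.2f}%)")
```

With these figures, 194 of the 4000 generated packets (4.85%) never reach the MAC queue, which places the loss between the application and WifiMac.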


So, I have to apologize for wasting your time; it seems to be a 'feature', so it should be OK.
Thanks for your extremely precise reply. It led me easily to the solution.
Comment 8 Tommaso Pecorella 2011-12-20 18:23:27 UTC
(In reply to comment #7)
> Today I've made some tests for a deeper investigation.
> 
> I think problem is related to ARP since: 
> $> Application ID:1 stopped. Sent 4000 packets
> 
> while at Mac Layer (wifiMac) I got: 
> $> Mac: 00:00:00:00:00:01 - Total Queue Packets: 3806

Wonderful !

I'm happy you found the bug. Btw, if with that kind of congested channel (losing 194 packets while waiting for an ARP **is** heavy congestion) you're able to transmit stuff, then you probably have some good stuff for a paper :)

> So.. I have to excuse me for having lose your time, it seems to be a 'feature'
> so it should be ok.

No need to apologize, you did the right thing by raising a bug. It could have been one. Better safe than sorry.

> Thanks for you extremely precise reply. It have taken me on a easy way to the
> solution.

I'm happy about it. We should document that thing tho, since you're not the first one to fall into it (and you won't be the last).

Cheers,

T.