Bugzilla – Bug 737
Backoff procedure is not invoked when transmission is deferred
Last modified: 2016-04-07 15:09:44 UTC
According to section 9.2.8 of 802.11-2007 (Ack procedure): "If a PHY-RXSTART.indication does not occur during the ACKTimeout interval, the STA concludes that the transmission of the MPDU has failed, and this STA shall invoke its backoff procedure upon expiration of the ACKTimeout interval." The same applies to the CTS timeout. The current implementation starts the backoff procedure inside DcaTxop, but only DcfManager knows about CTS and ACK timeouts. So a method like "StartBackoffProcedure" would have to be added to DcaTxop and EdcaTxop, but in that case the backoff procedure would be initiated from both DcaTxop and DcfManager. What about making DcfManager the only place that starts the backoff procedure for all queues?
I am sorry but I don't see what the problem is: I believe that we do the right thing with regard to the paragraph you quote from the standard. Maybe you could try to give a few more details about the failing testcase you have ?
I have found a clearer explanation of what is wrong in DcfManager. Section 9.1.1 says: "After deferral, or prior to attempting to transmit again immediately after a successful transmission, the STA shall select a random backoff interval and shall decrement the backoff interval counter while the medium is idle." So DcfManager::IsBusy () must check that the AIFSn of the queue that requested access has passed after GetAccessGrantStart () (rather than checking only whether we are receiving, transmitting, or NAV busy). I observed this while looking at pcap traces in mesh: a station does not calculate a backoff when it retransmits a frame, because the frame was retransmitted immediately after receiving, when the medium was idle (the medium was idle inside a SIFS before the ACK!). Am I right?
Also, it seems to me that the first version of the patch in bug 555 solves this problem.
Addition to the previous comment: with the first patch in bug 555, IsBusy () is not needed.
Created attachment 689 [details] Proposed fix
(In reply to comment #2)
> a station does not calculate a backoff when it retransmits a frame, because
> the frame was retransmitted immediately after receiving, when the medium was
> idle (the medium was idle inside a SIFS before the ACK!).

I am sorry but I really do not understand this description of your testcase. Please, can you try to provide a more detailed description ?
I have made a short illustration of how DcfManager operates when a frame is retransmitted immediately after it was received. Running the mesh script with a 1x3 grid and a simple debugging print, I observed that the backoff procedure is not invoked even when the station defers its transmission.
Created attachment 717 [details] illustration of 1x3 grid scenario Illustration
Also, the main problem with this bug is the broken dcf-manager-test, in which every test fails. Mathieu, could you please review my previous comment?
Hi, Mathieu,

I'd like to draw your attention to this bug, since it appears to be critical in multihop mesh/manet networks. The reason is that none of our models really account for processing delays and all re-transmissions (say forwarding RREQ in AODV) occur simultaneously. A large number of exactly simultaneous transmissions leads to significantly overestimated collision probability and even to wrong protocol operation. A recent example of this behavior was reported to me by Kuba Wierusz.

Fixing bug 737 gives us a simple workaround for this problem without the need of explicit accounting for processing delay. Indeed, currently, when a wifi device is asked to retransmit a packet without any delay, it will defer this for DIFS. After the proposed bugfix, every deferred TX will start backoff -- exactly what is needed to avoid artificial collisions. What do you think?

Regards, Pavel
(In reply to comment #10)
> I'd like to draw your attention to this bug, since it appears to be critical
> in multihop mesh/manet networks. The reason is that none of our models really
> account for processing delays and all re-transmissions (say forwarding RREQ in
> AODV) occur simultaneously. A large number of exactly simultaneous
> transmissions leads to significantly overestimated collision probability and
> even to wrong protocol operation. A recent example of this behavior was
> reported to me by Kuba Wierusz.

Thanks a lot for the detailed diagram by Kirill, I do understand this problem better now. The key issue I was confused about was the term "retransmission" in the context of the MAC. For me, it meant a retransmission attempt after a failed transmission. For Kirill and you, it means a MAC-level forwarding attempt.

> Fixing bug 737 gives us a simple workaround for this problem without the need
> of explicit accounting for processing delay. Indeed, currently, when a wifi
> device is asked to retransmit a packet without any delay, it will defer this
> for DIFS. After the proposed bugfix, every deferred TX will start backoff --
> exactly what is needed to avoid artificial collisions.

My initial reaction to this proposed solution is that it is wrong: we should not unconditionally start a backoff after a packet reception: it goes against both the spirit and the letter of the 802.11 spec.

It seems to me that the problem is not that we need to model processing delays: we need to model non-deterministic _varying_ processing delays which change from one station to another, and, potentially, from one packet to another within the same station, right ? If so, I would support adding a delay in MacLow when we receive a packet before forwarding it to the upper layers and making that delay be picked from a RandomVariable with a default value of being a gaussian distribution centered around 10us with a non-zero value for the variance. I will ask a colleague what a decent value would be for the mean/variance to model some PC-class hardware.
> My initial reaction to this proposed solution is that it is wrong: we should
> not unconditionally start a backoff after a packet reception:

We do not propose to unconditionally start backoff after a packet reception. We propose to start backoff for _all_ deferred transmissions including the ones deferred because DIFS is not passed yet (as in the case of too-fast-forwarding).

> it goes against both the spirit and the letter of the 802.11 spec.

I don't think so. Take a look at 9.1.1 of 802.11-2007: "After deferral, or prior to attempting to transmit again immediately after a successful transmission, the STA shall select a random backoff interval and shall decrement the backoff interval counter while the medium is idle." To understand that "deferral" means "medium was busy or DIFS wasn't passed" take a look at Fig. 9.3 ibid: "Defer access interval" = "Medium busy" + "DIFS".

> It seems to me that the problem is not that we need to model processing delays:
> we need to model non-deterministic _varying_ processing delays which change
> from one station to another, and, potentially, from one packet to another
> within the same station, right ? If so, I would support adding a delay in
> MacLow when we receive a packet before forwarding it to the upper layers and
> making that delay be picked from a RandomVariable with a default value of being
> a gaussian distribution centered around 10us with a non-zero value for the
> variance. I will ask a colleague what a decent value would be for the
> mean/variance to model some PC-class hardware.

Sure, you are right that a good solution is to start accounting for processing delays. But I am very uncomfortable with all ad-hoc solutions in this field. Why 10 us? Why gaussian (saying nothing about negative delays)? Why "some PC-class"? Why at wifi/mac-low? What about Ethernet, wimax, and all future models?

I propose to a) apply the suggested DCF patch and b) start a public discussion of modeling processing delays in ns-3.
(In reply to comment #12)
> > My initial reaction to this proposed solution is that it is wrong: we should
> > not unconditionally start a backoff after a packet reception:
>
> We do not propose to unconditionally start backoff after a packet reception.
> We propose to start backoff for _all_ deferred transmissions including the ones
> deferred because DIFS is not passed yet (as in the case of
> too-fast-forwarding).
>
> > it goes against both the spirit and the letter of the 802.11 spec.
>
> I don't think so. Take a look at 9.1.1 of 802.11-2007: "After deferral, or
> prior to attempting to transmit again immediately after a successful
> transmission, the STA shall select a random backoff interval and shall
> decrement the backoff interval counter while the medium is idle." To understand
> that "deferral" means "medium was busy or DIFS wasn't passed" take a look at
> Fig. 9.3 ibid: "Defer access interval" = "Medium busy" + "DIFS".

That is not fully correct. See section 9.2.4:

A STA desiring to initiate transfer [...] shall invoke the CS mechanism [...] to determine the busy/idle state of the medium. If the medium is busy, the STA shall defer until the medium is determined to be idle without interruption for a period of time equal to DIFS [...]. After this DIFS [...] medium idle time, the STA shall then generate a random backoff period [...] before transmitting, unless the backoff timer already contains a nonzero value, in which case the selection of a random number is not needed and not performed.

Note, specifically, the last part of the last sentence: "unless the backoff timer already contains a nonzero value" which is precisely what is happening here.

> > It seems to me that the problem is not that we need to model processing delays:
> > we need to model non-deterministic _varying_ processing delays which change
> > from one station to another, and, potentially, from one packet to another
> > within the same station, right ? If so, I would support adding a delay in
> > MacLow when we receive a packet before forwarding it to the upper layers and
> > making that delay be picked from a RandomVariable with a default value of being
> > a gaussian distribution centered around 10us with a non-zero value for the
> > variance. I will ask a colleague what a decent value would be for the
> > mean/variance to model some PC-class hardware.
>
> Sure, you are right that a good solution is to start accounting for processing
> delays. But I am very uncomfortable with all ad-hoc solutions in this field.
> Why 10 us? Why gaussian (saying nothing about negative delays)? Why "some
> PC-class"? Why at wifi/mac-low? What about Ethernet, wimax, and all future
> models?

Oops, I removed the relevant part from my initial comment: that delay would model the interrupt latency between the MacLow and the higher-level layers which is something on the order of 10us on PC-style hardware with a nice RTOS (and closer to something like 10ms with a standard linux OS but with a very high variance). And, of course, you need to make that delay non-negative but that is a detail.

> I propose to a) apply the suggested DCF patch and b) start a public
> discussion of modeling processing delays in ns-3.

I would be fine with a generic discussion about this (b) but I do not think it is needed to deal with this issue.
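As a rough illustration of the non-negative random delay discussed here, the following is a minimal standalone sketch, not ns-3 code. The function name `SampleProcessingDelayNs` is hypothetical, the 10 us mean comes from the comment above, and the standard deviation is left to the caller because the thread gives no value for it. Clamping negative draws to zero is only one possible way to keep the delay non-negative; it is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not part of ns-3): draws a per-packet processing
// delay in nanoseconds from a normal distribution and clamps negative
// draws to zero so that the returned delay is never negative.
int64_t SampleProcessingDelayNs(std::mt19937 &rng, double meanNs, double stddevNs)
{
  // The mean must be non-negative and the spread strictly positive.
  assert(meanNs >= 0.0 && stddevNs > 0.0);
  std::normal_distribution<double> dist(meanNs, stddevNs);
  double sample = dist(rng);
  // Truncate towards zero after clamping; sub-nanosecond precision is dropped.
  return static_cast<int64_t>(std::max(0.0, sample));
}
```

A caller would seed one `std::mt19937` per station and call `SampleProcessingDelayNs(rng, 10000.0, stddevNs)` for each received packet. Note that clamping puts extra probability mass at exactly zero; redrawing until the sample is non-negative would avoid that at the cost of a loop.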
(In reply to comment #13)
> (In reply to comment #12)
>
> That is not fully correct. See section 9.2.4:
>
> A STA desiring to initiate transfer [...] shall invoke the CS mechanism [...]
> to determine the busy/idle state of the medium. If the medium is busy, the STA
> shall defer until the medium is determined to be idle without interruption for
> a period of time equal to DIFS [...]. After this DIFS [...] medium idle time,
> the STA shall then generate a random backoff period [...] before transmitting,
> unless the backoff timer already contains a nonzero value, in which case the
> selection of a random number is not needed and not performed.
>
> Note, specifically, the last part of the last sentence: "unless the backoff
> timer already contains a nonzero value" which is precisely what is happening
> here.

So, in addition, we must check here that the backoff counter for a given queue is zero, which is performed. So I cannot understand where the error in this patch is.
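The rule quoted from 9.2.4 (draw a random backoff after deferral only when the backoff timer is zero) can be sketched in isolation as follows. This is a simplified model under stated assumptions: `QueueState` and `SelectBackoffAfterDeferral` are hypothetical names, not ns-3 API, and the draw is taken uniformly from [0, CW].

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical per-queue state; the field names are illustrative only.
struct QueueState
{
  uint32_t backoffSlots; // remaining backoff slots for this queue
  uint32_t cw;           // current contention window
};

// After a deferral, selects a new random backoff only if the counter is
// zero. A nonzero counter is kept as-is, matching "unless the backoff
// timer already contains a nonzero value". Returns true if a draw happened.
bool SelectBackoffAfterDeferral(QueueState &q, std::mt19937 &rng)
{
  // A zero contention window would make the draw degenerate.
  assert(q.cw > 0);
  if (q.backoffSlots != 0)
    {
      return false;
    }
  std::uniform_int_distribution<uint32_t> dist(0, q.cw);
  q.backoffSlots = dist(rng);
  return true;
}
```

The drawn value may itself be zero, so a second deferral could trigger another draw; whether that matches the intended behavior is exactly the kind of detail debated in this thread.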
(In reply to comment #14)
> So, in addition, we must check here that the backoff counter for a given queue
> is zero, which is performed. So I cannot understand where the error in this
> patch is.

Where are you doing this ? Which patch are you talking about ?
(In reply to comment #8)
> Created an attachment (id=717) [details]
> illustration of 1x3 grid scenario
>
> Illustration

When is the RequestAccess method called in this scenario ?
(In reply to comment #15)
> (In reply to comment #14)
>
> > So, in addition, we must check here that the backoff counter for a given queue
> > is zero, which is performed. So I cannot understand where the error in this
> > patch is.
>
> Where are you doing this ? Which patch are you talking about ?

The following patch:

diff -r ed0b2d9301a1 src/devices/wifi/dcf-manager.cc
--- a/src/devices/wifi/dcf-manager.cc Tue Dec 01 18:34:11 2009 +0300
+++ b/src/devices/wifi/dcf-manager.cc Wed Dec 02 17:55:15 2009 +0300
@@ -375,7 +375,7 @@
    * by notifying the collision to the user.
    */
   if (state->GetBackoffSlots () == 0 &&
-      IsBusy ())
+      GetBackoffStartFor (state) > Simulator::Now ())
     {
       MY_DEBUG ("medium is busy: collision");
       /* someone else has accessed the medium.

When we request access, we check that the backoff counter is zero and, if it is, check when the given queue may start to transmit (taking into account DIFS, EIFS, etc.; GetAccessGrantStart is called). If the queue may start to transmit immediately, we do not start a backoff; otherwise we start one. Note that we do not start the backoff twice.
(In reply to comment #16)
> (In reply to comment #8)
> > Created an attachment (id=717) [details]
> > illustration of 1x3 grid scenario
> >
> > Illustration
>
> When is the RequestAccess method called in this scenario ?

Exactly after RX, because a frame to be forwarded goes through the upper layer immediately
(In reply to comment #18)
> (In reply to comment #16)
> > (In reply to comment #8)
> > > Created an attachment (id=717) [details]
> > > illustration of 1x3 grid scenario
> > >
> > > Illustration
> >
> > When is the RequestAccess method called in this scenario ?
>
> Exactly after RX, because a frame to be forwarded goes through the upper layer
> immediately

Are you _sure_ that your backoff slots are zero when RequestAccess is called in your testcase ?
(In reply to comment #19)
> Are you _sure_ that your backoff slots are zero when RequestAccess is called in
> your testcase ?

If they are zero, what is the NAV status in IsBusy ? Does MacLow::NotifyNav call MacLow::DoNavStartNow just before the call to RequestAccess ?
(In reply to comment #20)
> (In reply to comment #19)
> > Are you _sure_ that your backoff slots are zero when RequestAccess is called in
> > your testcase ?
>
> If they are zero, what is the NAV status in IsBusy ? Does MacLow::NotifyNav
> call MacLow::DoNavStartNow just before the call to RequestAccess ?

I was mistaken about the testcase. Consider the same situation with forwarding a broadcast frame:
1. NAV will be zero immediately after RX, and the medium is idle.
2. Backoff slots will be zero (the last transmission was a long time ago).

The debug output is the following:

RX end OK at 81757537ns this = 0x809d5b8 (from DcfManager::NotifyRxEndOkNow)
Request access at 81757537ns, this = 0x809d5b8 (from DcfManager::RequestAccess)
remaining slots are:0 (from DcfManager::RequestAccess(state))
lastNavend is81757537ns, this = 0x809d5b8 (from IsBusy ())

So, the broadcast is forwarded without backoff.
The same situation occurs with unicast, because NAV is not set if hdr.GetAddr1 () != m_self
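To make the trace above concrete, here is a minimal standalone sketch comparing a busy-state check with an access-time check in the spirit of the patch quoted earlier in this thread. It is a simplified model, not the ns-3 implementation: `MediumState`, `IsBusy`, `EarliestAccessNs`, `MustBackoffOld` and `MustBackoffPatched` are hypothetical names, and the inter-frame space is a caller-supplied assumption (for example 34000 ns, the 802.11a DIFS).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified medium state; all names are illustrative, not ns-3 internals.
struct MediumState
{
  bool receiving;       // PHY is currently receiving a frame
  int64_t lastTxEndNs;  // end of the last transmission
  int64_t lastRxEndNs;  // end of the last reception
  int64_t lastNavEndNs; // end of the last NAV reservation
};

// Busy-state check: busy only while receiving, transmitting, or inside NAV.
bool IsBusy(const MediumState &m, int64_t nowNs)
{
  return m.receiving || m.lastTxEndNs > nowNs || m.lastNavEndNs > nowNs;
}

// Earliest instant at which a queue may access the medium: the end of the
// last activity plus the inter-frame space (DIFS/AIFS) for that queue.
int64_t EarliestAccessNs(const MediumState &m, int64_t ifsNs)
{
  assert(ifsNs >= 0);
  int64_t lastActivityEnd =
      std::max({m.lastTxEndNs, m.lastRxEndNs, m.lastNavEndNs});
  return lastActivityEnd + ifsNs;
}

// Condition modeled on the original code: backoff only if the medium is busy.
bool MustBackoffOld(const MediumState &m, uint32_t slots, int64_t nowNs)
{
  return slots == 0 && IsBusy(m, nowNs);
}

// Condition modeled on the patch: backoff if access is not yet allowed.
bool MustBackoffPatched(const MediumState &m, uint32_t slots, int64_t nowNs,
                        int64_t ifsNs)
{
  return slots == 0 && EarliestAccessNs(m, ifsNs) > nowNs;
}
```

With the values from the trace (reception and NAV both ending at 81757537 ns, access requested at the same instant, zero remaining slots), the busy-state condition is false because nothing is strictly later than now, while the access-time condition is true for any positive inter-frame space. That difference is the behavior change under discussion here; the sketch does not settle which one the standard requires.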
Mathieu,

> Oops, I removed the relevant part from my initial comment: that delay would
> model the interrupt latency between the MacLow and the higher-level layers
> which is something on the order of 10us on PC-style hardware with a nice RTOS
> (and closer to something like 10ms with a standard linux OS but with a very
> high variance). And, of course, you need to make that delay non-negative but
> that is a detail.

After some thought, I definitely agree on adding an (adjustable) interrupt latency with some meaningful default numbers to the wifi low MAC. Could you do this? The question of changing or not changing the backoff logic as proposed by Kirill remains.
Some time has passed, so I'll try to wrap up the discussion:
1) As for the backoff behavior, my understanding is that the arguments for applying the proposed patch are not convincing, so I am closing the bug.
2) I just filed the new bug 912 to keep track of the issue of modeling the processing delays. Please continue the discussion there if you are interested.