Bug 2284 - The Mac header is added twice if enqueuing fails and the packet is retransmitted by the upper layers
The Mac header is added twice if enqueuing fails and the packet is retransmit...
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: point-to-point
ns-3-dev
PC Linux
: P5 normal
Assigned To: ns-bugs
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-27 10:48 UTC by Stefano Avallone
Modified: 2016-08-19 09:36 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefano Avallone 2016-01-27 10:48:47 UTC
While testing the work on introducing the traffic control module, I noticed that the tcp-large-transfer example crashes. The reason is that some packets have two PPP headers and hence Ipv4 deserialization fails. This happens when the traffic control sends a packet to the device, the device cannot enqueue the packet, the traffic control requeues the packet and sends it again to the device.

Let's see the code in PointToPointNetDevice::Send

  if (IsLinkUp () == false)
    {
      m_macTxDropTrace (packet);
      return false;
    }

  //
  // Stick a point to point protocol header on the packet in preparation for
  // shoving it out the door.
  //
  AddHeader (packet, protocolNumber);

  m_macTxTrace (packet);

  //
  // We should enqueue and dequeue the packet to hit the tracing hooks.
  //
  if (m_queue->Enqueue (packet))
    {
      //
      // If the channel is ready for transition we send the packet right now
      // 
      if (m_txMachineState == READY)
        {
          packet = m_queue->Dequeue ();
          m_snifferTrace (packet);
          m_promiscSnifferTrace (packet);
          return TransmitStart (packet);
        }
      return true;
    }

  // Enqueue may fail (overflow)
  m_macTxDropTrace (packet);
  return false;

The problem is that the PPP header is added before checking that the packet can be enqueued. So far, this bug did not appear because no retransmission is attempted if the Send returns false. The traffic control layer (the queue disc, more precisely), instead, requeues a packet whose transmission failed and later retries to transmit the packet, which thus gets a second PPP header added.

Note that adding the PPP header after enqueuing a packet (inside the outer if statement) does not work, because we modify the size of a packet after it is enqueued, thus breaking the statistics kept by the queue.

I see the following alternative solutions:

1) Before attempting to enqueue a packet, a check is performed to verify that the packet can be actually enqueued. This requires the addition of a uint32_t Queue::Available (void) method, which returns the amount of additional packets or bytes (depending on the mode) that can be stored in the queue.

2) the Queue object held by a PointToPointNetDevice stores objects of type PPPQueueItem, a subclass of QueueItem that additionally stores the PPP header. Thus, the PPP header is stored along with the packet inside the device queue and it is actually added to the packet after the packet is dequeued from the device queue.

3) the PPP header is removed from the packet just before returning false in case the enqueue fails.

Opinions?

Thanks,
Stefano
Comment 1 Tommaso Pecorella 2016-01-27 10:59:30 UTC
Copy the packet before adding the header.
Try to enqueue the copied packet.

T.
Comment 2 Stefano Avallone 2016-01-27 12:40:43 UTC
Yeah, copying the packet works, thanks for the advice. I am only concerned about the overhead introduced by copying every single transmitted packet. Maybe methods 1) and 2) I proposed are more efficient from a computational point of view.

Stefano

ps I found (by chance) the comments you left on github regarding my traffic control work. I am currently addressing them. Huge thanks!
Comment 3 Tommaso Pecorella 2016-01-27 18:34:24 UTC
Hi Stefano,

probably 1 or 2 are slightly more efficient, but only when there are dropped packets. Packet drops are something that should be avoided (nobody likes to be in a congestion state), so the performance hit is secondary.
The packet copy technique is used in many places, and it's quite clear to understand.
Hence... I'd avoid to optimize this point (the rule is: don't optimize what is not a bottleneck - first find the bottlenecks, then optimize them).

Cheers,

T.

PS: I'm glad you found my comments, I thought you wanted them there... :)

(In reply to Stefano Avallone from comment #2)
> Yeah, copying the packet works, thanks for the advice. I am only concerned
> about the overhead introduced by copying every single transmitted packet.
> Maybe methods 1) and 2) I proposed are more efficient from a computational
> point of view.
> 
> Stefano
> 
> ps I found (by chance) the comments you left on github regarding my traffic
> control work. I am currently addressing them. Huge thanks!
Comment 4 Tommaso Pecorella 2016-01-27 18:35:24 UTC
PS: Do you want that I commit this fix ?

T.

(In reply to Tommaso Pecorella from comment #3)
> Hi Stefano,
> 
> probably 1 or 2 are slightly more efficient, but only when there are dropped
> packets. Packet drops are something that should be avoided (nobody likes to
> be in a congestion state), so the performance hit is secondary.
> The packet copy technique is used in many places, and it's quite clear to
> understand.
> Hence... I'd avoid to optimize this point (the rule is: don't optimize what
> is not a bottleneck - first find the bottlenecks, then optimize them).
> 
> Cheers,
> 
> T.
> 
> PS: I'm glad you found my comments, I thought you wanted them there... :)
> 
> (In reply to Stefano Avallone from comment #2)
> > Yeah, copying the packet works, thanks for the advice. I am only concerned
> > about the overhead introduced by copying every single transmitted packet.
> > Maybe methods 1) and 2) I proposed are more efficient from a computational
> > point of view.
> > 
> > Stefano
> > 
> > ps I found (by chance) the comments you left on github regarding my traffic
> > control work. I am currently addressing them. Huge thanks!
Comment 5 Stefano Avallone 2016-01-28 04:20:04 UTC
Maybe I am missing something. I thought you meant something like this:

  //
  // Stick a point to point protocol header on the packet in preparation for
  // shoving it out the door.
  //
  Ptr<Packet> copy = packet->Copy ();
  AddHeader (copy, protocolNumber);

  m_macTxTrace (copy);

  //
  // We should enqueue and dequeue the packet to hit the tracing hooks.
  //
  if (m_queue->Enqueue (copy))
    {
      //
      // If the channel is ready for transition we send the packet right now
      // 
      if (m_txMachineState == READY)
        {
          packet = m_queue->Dequeue ()->GetPacket ();
          m_snifferTrace (packet);
          m_promiscSnifferTrace (packet);
          return TransmitStart (packet);
        }
      return true;
    }

  // Enqueue may fail (overflow)
  m_macTxDropTrace (copy);
  return false;

where the copy is done for all the packets. Please note that the header cannot be added after a successful enqueue because that would invalidate the statistics held by m_queue in terms of queued bytes (I tried and actually an assert in droptail fails).

Anyway, I agree with your observations on performance efficiency. In the light of them, we might consider option 3), i.e., remove the PPP header only for dropped packets:

  // Enqueue may fail (overflow)
  m_macTxDropTrace (packet);
  PppHeader ppp;
  packet->RemoveHeader (ppp);
  return false;

Though, maybe removing the header is more costly than copying a packet?

Stefano


(In reply to Tommaso Pecorella from comment #3)
> Hi Stefano,
> 
> probably 1 or 2 are slightly more efficient, but only when there are dropped
> packets. Packet drops are something that should be avoided (nobody likes to
> be in a congestion state), so the performance hit is secondary.
> The packet copy technique is used in many places, and it's quite clear to
> understand.
> Hence... I'd avoid to optimize this point (the rule is: don't optimize what
> is not a bottleneck - first find the bottlenecks, then optimize them).
> 
> Cheers,
> 
> T.
> 
> PS: I'm glad you found my comments, I thought you wanted them there... :)
Comment 6 Tom Henderson 2016-01-28 10:15:13 UTC
Here is the signature of this NetDevice() method.

  /**
   * \param packet packet sent from above down to Network Device
   * \param dest mac address of the destination (already resolved)
   * \param protocolNumber identifies the type of payload contained in
   *        this packet. Used to call the right L3Protocol when the packet
   *        is received.
   *
   *  Called from higher layer to send packet into Network Device
   *  to the specified destination Address
   *
   * \return whether the Send operation succeeded
   */
  virtual bool Send (Ptr<Packet> packet, const Address& dest, uint16_t protocolNumber) = 0;


It isn't clear from the doxygen whether packet is modified by Send() but since a non-const pointer is passed, one has to assume that it is modified.

It seems to me the right thing to do here is to leave PointToPointNetDevice alone and have the tc layer copy the packet before giving it to the device, so that if Send() returns false, the copy is still available.
Comment 7 Tom Henderson 2016-01-28 10:17:05 UTC
By the way, here is some background to read on the cost of Packet operations:

https://www.nsnam.org/docs/release/3.24/models/html/packets.html#copy-on-write-semantics
Comment 8 Tommaso Pecorella 2016-01-28 10:25:07 UTC
I think the bug isn't related to TC. Perhaps we have never "seen" it, but the problem can be there.

(In reply to Tom Henderson from comment #6)
> Here is the signature of this NetDevice() method.
> 
>   /**
>    * \param packet packet sent from above down to Network Device
>    * \param dest mac address of the destination (already resolved)
>    * \param protocolNumber identifies the type of payload contained in
>    *        this packet. Used to call the right L3Protocol when the packet
>    *        is received.
>    *
>    *  Called from higher layer to send packet into Network Device
>    *  to the specified destination Address
>    *
>    * \return whether the Send operation succeeded
>    */
>   virtual bool Send (Ptr<Packet> packet, const Address& dest, uint16_t
> protocolNumber) = 0;
> 
> 
> It isn't clear from the doxygen whether packet is modified by Send() but
> since a non-const pointer is passed, one has to assume that it is modified.
> 
> It seems to me the right thing to do here is to leave PointToPointNetDevice
> alone and have the tc layer copy the packet before giving it to the device,
> so that if Send() returns false, the copy is still available.
Comment 9 Tom Henderson 2016-01-28 10:36:09 UTC
(In reply to Tommaso Pecorella from comment #8)
> I think the bug isn't related to TC. Perhaps we have never "seen" it, but
> the problem can be there.

can you state again what the bug is?  Modifying the Packet despite returning false?
Comment 10 Stefano Avallone 2016-01-28 11:03:08 UTC
(In reply to Tom Henderson from comment #9)
> can you state again what the bug is?  Modifying the Packet despite returning
> false?

if the xxxNetDevice::Send method adds the MAC header to a packet and *then* checks whether it can enqueue the packet, it turns out, in case it can not, that it returns false and the packet has the MAC header added.

If the object which passed the packet to the xxxNetDevice::Send method retries to send the packet until xxxNetDevice::Send returns true (which is the case of TC), the second time the packet is passed to the device, the packet already has a MAC header added and the Send method adds another MAC header to the packet, thus creating a malformed packet.

So, this problem has been disclosed by TC, since TC retries to send a packet, and must be clearly fixed when we introduce TC. Please note that this is not only an issue with the point-to-point device. All the devices that add the MAC header despite returning false trigger this problem (e.g., csma).

I think that having the TC layer copy every packet sent to the device is costly (the AddHeader performed by Send is a dirty operation). In my opinion, it would be better that the device removes the header (non-dirty operation) only from the packets that cannot enqueue.

If we agree on a solution, I can check which devices need to be "fixed" and include the patch in the upcoming TC review request.

Thanks,
Stefano
Comment 11 Stefano Avallone 2016-01-28 11:10:12 UTC
(In reply to Stefano Avallone from comment #10)
> I think that having the TC layer copy every packet sent to the device is
> costly (the AddHeader performed by Send is a dirty operation). In my
> opinion, it would be better that the device removes the header (non-dirty
> operation) only from the packets that cannot enqueue.

Well, maybe costs should be evaluated more carefully...
Comment 12 Tom Henderson 2016-01-28 11:49:58 UTC
> 
> I think that having the TC layer copy every packet sent to the device is
> costly (the AddHeader performed by Send is a dirty operation). In my
> opinion, it would be better that the device removes the header (non-dirty
> operation) only from the packets that cannot enqueue.
> 
> If we agree on a solution, I can check which devices need to be "fixed" and
> include the patch in the upcoming TC review request.
> 

I'm not convinced yet about what to do.

We could change NetDevice::Send () to take a Ptr<const Packet> and force the NetDevice to copy the Packet before modifying; this approach is taken elsewhere in the stack.  This may appear costly (extra copies) but it seems to me that whenever AddHeader() is called, usually somewhere in the code still has a pointer to the original packet, and there may be no real difference.  Packets are designed to have inexpensive copies.  As Tommaso suggested, it could be directly measured with a benchmark without speculating.

We could also change the API documentation to state that if Send() returns false, packet must not be modified.  This will just tend to move the copy from the tc layer to the NetDevice layer.

In general, most devices need to be modified to provide flow control or feedback on whether Send() succeeded.

I think that you can proceed for the moment by copying the packet at tc layer in case Send() fails, and later look at an optimization to remove this copy, perhaps in the context of more widespread NetDevice changes to better support tc.
Comment 13 Stefano Avallone 2016-01-28 13:22:29 UTC
(In reply to Tom Henderson from comment #12)
> I think that you can proceed for the moment by copying the packet at tc
> layer in case Send() fails, and later look at an optimization to remove this
> copy, perhaps in the context of more widespread NetDevice changes to better
> support tc.

This looks reasonable to me. I just added to TrafficControlLayer::Transmit the following:

      // send a copy of the packet because the device might add the
      // MAC header even if the transmission is unsuccessful (see BUG 2284)
      Ptr<Packet> copy = item->GetPacket ()->Copy ();
      ret = m_device->Send (copy, item->GetAddress (), item->GetProtocol ());

Thanks,
Stefano
Comment 14 Tom Henderson 2016-02-04 14:19:33 UTC
Marking as needinfo; need to wait to see if traffic control layer requires any changes related to this.
Comment 15 natale.patriciello 2016-02-09 04:57:59 UTC
Hi, sorry for being late on that.

I agree with Stefano and I think the option 3 is the most reasonable. It was the thing I done in early stage of my work (https://github.com/kronat/ns-3-dev-git/blob/netdevice-backpressure/src/point-to-point/model/point-to-point-net-device.cc).

Send should take a non-const pointer to Packet, because it is probably going to modify it; but if it fails (it does not happen often) it should "roll-back" the packet to the original one. It can be defined in the interface, which should state clearly this strategy.

Then, NetDevices have two possibilities: making themselves a copy, and returning the original in case of failure, or rolling back the header addition. Probably these two can be employed on different netdevices, e.g. P2P could revert the header addition, while some others, like LTE, which performs complex operations on packets (encapsulation) could use copy.
Comment 16 Stefano Avallone 2016-08-19 09:36:33 UTC
Fixed with changeset 12124:b38f758e87a8