Bug 2769 - TCP RTO event sets ssthresh to 2*MSS always
TCP RTO event sets ssthresh to 2*MSS always
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
unspecified
All All
: P3 normal
Assigned To: natale.patriciello
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2017-08-05 15:07 UTC by Tom Henderson
Modified: 2017-09-17 18:01 UTC (History)
3 users (show)

See Also:


Attachments
ssthresh log (75 bytes, text/plain)
2017-08-05 15:08 UTC, Tom Henderson
Details
bytes in flight log (646.14 KB, text/plain)
2017-08-05 15:08 UTC, Tom Henderson
Details
Proposed patch (856 bytes, patch)
2017-08-23 04:24 UTC, Michele Polese
Details | Diff
Test draft (5.33 KB, text/plain)
2017-08-23 04:25 UTC, Michele Polese
Details
Test draft (5.27 KB, text/plain)
2017-08-23 07:57 UTC, Michele Polese
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Henderson 2017-08-05 15:07:43 UTC
Michele Polese reports the following possible problem:

In TcpSocketBase::ReTxTimeout, the following operations are done after an RTO:
1 - all the packets in the SentList of the TxBuffer are marked as lost, and moved in the AppList (i.e., as if they were new packets);
2 - the recovery point i set to HighData (RFC 6675, Sec. 5.1);
3 - the RTO is doubled (RFC 6298);
4 - the ssthresh is set to (with NewReno) to max (2*MSS, bytesInFlight/2);
5 - cwnd is updated;
6 - the state is set to CA_LOSS;
7 - DoRetransmit is called.

Step 4 appears to be impacted by step 1, and the SentList perhaps should not be cleared in step 1.

When in step 4 the TcpTxBuffer::BytesInFlight() is called (via TcpSocketBase::BytesInFlight()), it scans the whole m_sentList. However, this was reset at step 1, and now it contains a single packet, which is marked as lost, thus the total amount of bytes in flight that the method returns is 0. Consequently, the ssthresh is set (with NewReno) to max(bytesInFlight/2=0, 2*MSS) = 2*MSS.

Logs for the BytesInFlight and the ssthresh are attached for an example (MSS=14k bytes), and at 0.488996 an RTO is triggered.  If my interpretation is correct, then the ssthresh after an RTO is always set to 2*MSS.
Comment 1 Tom Henderson 2017-08-05 15:08:07 UTC
Created attachment 2894 [details]
ssthresh log
Comment 2 Tom Henderson 2017-08-05 15:08:26 UTC
Created attachment 2895 [details]
bytes in flight log
Comment 3 Tom Henderson 2017-08-05 15:09:23 UTC
Here is relevant specification text from RFC 5681:

   When a TCP sender detects segment loss using the retransmission timer
   and the given segment has not yet been resent by way of the
   retransmission timer, the value of ssthresh MUST be set to no more
   than the value given in equation (4):

      ssthresh = max (FlightSize / 2, 2*SMSS)            (4)

   where, as discussed above, FlightSize is the amount of outstanding
   data in the network.

   On the other hand, when a TCP sender detects segment loss using the
   retransmission timer and the given segment has already been
   retransmitted by way of the retransmission timer at least once, the
   value of ssthresh is held constant.
Comment 4 Michele Polese 2017-08-09 12:32:33 UTC
Thanks Tom for filing this.
A quick fix for the first part is swapping step 4 and step 1, or at least saving the amount of in flight bytes before doing step 1.
Comment 5 natale.patriciello 2017-08-22 15:55:09 UTC
Thank you Michele,

The report is perfect and lead to the very simple approach proposed by Tom:


diff --git i/src/internet/model/tcp-socket-base.cc w/src/internet/model/tcp-socket-base.cc
index 1a5faa1bb..c61ebfaba 100644
--- i/src/internet/model/tcp-socket-base.cc
+++ w/src/internet/model/tcp-socket-base.cc
@@ -3152,6 +3152,8 @@ TcpSocketBase::ReTxTimeout ()
       return;
     }
 
+  uint32_t inFlight = BytesInFlight ();
+
   // From RFC 6675, Section 5.1
   // [RFC2018] suggests that a TCP sender SHOULD expunge the SACK
   // information gathered from a receiver upon a retransmission timeout
@@ -3204,7 +3206,7 @@ TcpSocketBase::ReTxTimeout ()
   // retransmission timer, decrease ssThresh
   if (m_tcb->m_congState != TcpSocketState::CA_LOSS || !m_txBuffer->IsHeadRetransmitted ())
     {
-      m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, BytesInFlight ());
+      m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, inFlight);
     }
 
   // Cwnd set to 1 MSS




What I would like to see is also the proper assert added in the TCP timeout test. I would like to leave it as an exercise for some of you; do you have time to do the patch?

Thanks!
Comment 6 Michele Polese 2017-08-23 04:24:05 UTC
Created attachment 2906 [details]
Proposed patch
Comment 7 Michele Polese 2017-08-23 04:25:33 UTC
Created attachment 2907 [details]
Test draft

Test for the ssthresh after an RTO.
Comment 8 Michele Polese 2017-08-23 04:28:42 UTC
Natale, Tom,
I created the patch and also a draft of the test.
However, I have some issues in re-creating in the test the condition for which bytesInFlight/2 > 2*SMSS. Do you have any suggestion?
Thanks!
Comment 9 natale.patriciello 2017-08-23 06:56:38 UTC
I think it is not necessary to make the connection fail; I think we need only this (pseudocode):


- send segment [1, 100]
- drop segment 50
- send segment 50 (triggered by fast recovery)
- drop segment 50
- .... (time passes)
- RTO fires for segment 50, check ssthresh half of the in flight (25 segments in this particular case)
- send segment 50 (triggered by RTO)
- drop segment 50
- .... (time passes)
- RTO fires for segment 50, check ssthresh == as before (no halving)
Comment 10 Michele Polese 2017-08-23 07:57:09 UTC
Created attachment 2908 [details]
Test draft
Comment 11 natale.patriciello 2017-09-10 06:06:02 UTC
Hello Michele,

the test is running fine even without the patch applied... I would like to include it before the next release (next week). Do you have time to work on it?
Comment 12 Michele Polese 2017-09-11 02:44:29 UTC
I can try. The thing is that it is difficult to replicate in the test environment the condition for which the amount of bytes in flight is larger that the mss. Do you have any suggestion?
Comment 13 Tom Henderson 2017-09-17 18:01:00 UTC
Pushed in commit 13070; test case in 13069.