Bug 2769

Summary: TCP RTO event sets ssthresh to 2*MSS always
Product: ns-3 Reporter: Tom Henderson <tomh>
Component: tcpAssignee: natale.patriciello
Status: RESOLVED FIXED    
Severity: normal CC: mezzavilla.marco, michele.polese, ns-bugs
Priority: P3    
Version: unspecified   
Hardware: All   
OS: All   
Attachments: ssthresh log
bytes in flight log
Proposed patch
Test draft
Test draft

Description Tom Henderson 2017-08-05 15:07:43 UTC
Michele Polese reports the following possible problem:

In TcpSocketBase::ReTxTimeout, the following operations are done after an RTO:
1 - all the packets in the SentList of the TxBuffer are marked as lost, and moved in the AppList (i.e., as if they were new packets);
2 - the recovery point i set to HighData (RFC 6675, Sec. 5.1);
3 - the RTO is doubled (RFC 6298);
4 - the ssthresh is set to (with NewReno) to max (2*MSS, bytesInFlight/2);
5 - cwnd is updated;
6 - the state is set to CA_LOSS;
7 - DoRetransmit is called.

Step 4 appears to be impacted by step 1, and the SentList perhaps should not be cleared in step 1.

When in step 4 the TcpTxBuffer::BytesInFlight() is called (via TcpSocketBase::BytesInFlight()), it scans the whole m_sentList. However, this was reset at step 1, and now it contains a single packet, which is marked as lost, thus the total amount of bytes in flight that the method returns is 0. Consequently, the ssthresh is set (with NewReno) to max(bytesInFlight/2=0, 2*MSS) = 2*MSS.

Logs for the BytesInFlight and the ssthresh are attached for an example (MSS=14k bytes), and at 0.488996 an RTO is triggered.  If my interpretation is correct, then the ssthresh after an RTO is always set to 2*MSS.
Comment 1 Tom Henderson 2017-08-05 15:08:07 UTC
Created attachment 2894 [details]
ssthresh log
Comment 2 Tom Henderson 2017-08-05 15:08:26 UTC
Created attachment 2895 [details]
bytes in flight log
Comment 3 Tom Henderson 2017-08-05 15:09:23 UTC
Here is relevant specification text from RFC 5681:

   When a TCP sender detects segment loss using the retransmission timer
   and the given segment has not yet been resent by way of the
   retransmission timer, the value of ssthresh MUST be set to no more
   than the value given in equation (4):

      ssthresh = max (FlightSize / 2, 2*SMSS)            (4)

   where, as discussed above, FlightSize is the amount of outstanding
   data in the network.

   On the other hand, when a TCP sender detects segment loss using the
   retransmission timer and the given segment has already been
   retransmitted by way of the retransmission timer at least once, the
   value of ssthresh is held constant.
Comment 4 Michele Polese 2017-08-09 12:32:33 UTC
Thanks Tom for filing this.
A quick fix for the first part is swapping step 4 and step 1, or at least saving the amount of in flight bytes before doing step 1.
Comment 5 natale.patriciello 2017-08-22 15:55:09 UTC
Thank you Michele,

The report is perfect and lead to the very simple approach proposed by Tom:


diff --git i/src/internet/model/tcp-socket-base.cc w/src/internet/model/tcp-socket-base.cc
index 1a5faa1bb..c61ebfaba 100644
--- i/src/internet/model/tcp-socket-base.cc
+++ w/src/internet/model/tcp-socket-base.cc
@@ -3152,6 +3152,8 @@ TcpSocketBase::ReTxTimeout ()
       return;
     }
 
+  uint32_t inFlight = BytesInFlight ();
+
   // From RFC 6675, Section 5.1
   // [RFC2018] suggests that a TCP sender SHOULD expunge the SACK
   // information gathered from a receiver upon a retransmission timeout
@@ -3204,7 +3206,7 @@ TcpSocketBase::ReTxTimeout ()
   // retransmission timer, decrease ssThresh
   if (m_tcb->m_congState != TcpSocketState::CA_LOSS || !m_txBuffer->IsHeadRetransmitted ())
     {
-      m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, BytesInFlight ());
+      m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, inFlight);
     }
 
   // Cwnd set to 1 MSS




What I would like to see is also the proper assert added in the TCP timeout test. I would like to leave it as an exercise for some of you; do you have time to do the patch?

Thanks!
Comment 6 Michele Polese 2017-08-23 04:24:05 UTC
Created attachment 2906 [details]
Proposed patch
Comment 7 Michele Polese 2017-08-23 04:25:33 UTC
Created attachment 2907 [details]
Test draft

Test for the ssthresh after an RTO.
Comment 8 Michele Polese 2017-08-23 04:28:42 UTC
Natale, Tom,
I created the patch and also a draft of the test.
However, I have some issues in re-creating in the test the condition for which bytesInFlight/2 > 2*SMSS. Do you have any suggestion?
Thanks!
Comment 9 natale.patriciello 2017-08-23 06:56:38 UTC
I think it is not necessary to make the connection fail; I think we need only this (pseudocode):


- send segment [1, 100]
- drop segment 50
- send segment 50 (triggered by fast recovery)
- drop segment 50
- .... (time passes)
- RTO fires for segment 50, check ssthresh half of the in flight (25 segments in this particular case)
- send segment 50 (triggered by RTO)
- drop segment 50
- .... (time passes)
- RTO fires for segment 50, check ssthresh == as before (no halving)
Comment 10 Michele Polese 2017-08-23 07:57:09 UTC
Created attachment 2908 [details]
Test draft
Comment 11 natale.patriciello 2017-09-10 06:06:02 UTC
Hello Michele,

the test is running fine even without the patch applied... I would like to include it before the next release (next week). Do you have time to work on it?
Comment 12 Michele Polese 2017-09-11 02:44:29 UTC
I can try. The thing is that it is difficult to replicate in the test environment the condition for which the amount of bytes in flight is larger that the mss. Do you have any suggestion?
Comment 13 Tom Henderson 2017-09-17 18:01:00 UTC
Pushed in commit 13070; test case in 13069.