Bugzilla – Bug 2247
Incorrect TCP timeouts during Fast Recovery
Last modified: 2016-02-22 11:08:17 UTC
Created attachment 2206 [details]
Example script for tcp timeouts.

Run the attached script to see this bug in action (sorry, I should have cleaned the example up a bit...). Consider the following scenario:

  [client] -- [switch] -- [server]

The client -> switch link has half the data rate of the switch -> server link. We size the link layer queues so that we expect drops on the switch -> client link. See the attached time sequence plot for the result of running the example.

When we get drops and dup acks, we go to fast retransmit, and as NewReno recovers one dropped packet every RTT, the first partial ack resets the retransmit timer, but the subsequent ones do not. I don't think this behaviour is correct, and it is causing the timeout shown in the time sequence plot, resulting in the retransmit of a data segment the sender has only just transmitted.

The code in question is in tcp-socket-base.cc, TcpSocketBase::ReceivedAck:

  if (tcpHeader.GetAckNumber () < m_recover)
    {
      if (m_isFirstPartialAck)
        {
          m_isFirstPartialAck = false;
        }
      else
        {
          resetRTO = false;
        }
    }
  . . .
  NewAck (tcpHeader.GetAckNumber (), resetRTO);

NewAck resets the RTO only if the flag passed to it is true. In fast recovery, when receiving partial acks, the RTO is never reset after the first new ack, and we get timeouts.
Created attachment 2207 [details] Time Sequence plot after running the example.
The policy for resetting the retransmit timeout in RFC 6582 is:

  For the first partial ACK that arrives during fast recovery, also
  reset the retransmit timer. Timer management is discussed in more
  detail in Section 4.

In previous versions of the RFC, there were some policy choices (the so-called "Impatient" vs. "Slow-but-Steady" policies) for this timer reset: https://tools.ietf.org/html/rfc3782#section-4. One thing that was done when RFC 6582 went to standards track was to pick from among the policy options in the earlier versions of the RFC, and the Impatient variant was chosen for this policy. So I think that the current implementation is conformant. Could you please review in light of this and let us know whether you think there is still a bug?
Created attachment 2208 [details] Modified example script, larger link layer queue.
Created attachment 2209 [details] Time sequence plot of modified script.
I see. Yes, you are right, the implementation should go to slow start after the first partial ack under the "impatient" policy. But I still seem to have a problem... I've attached a modified script and its output time sequence plot. The change I've made is to increase the link layer queue size (to larger than the bandwidth-delay product) and make the on-off application send a large amount of data immediately.

If you take a look at the time sequence plot, TCP seems to be behaving very strangely. On receiving a retransmit timeout while in fast recovery, after the first retransmit, TCP seems to attempt to go into slow start again as per the RFC. But instead, the implementation dumps a really large window (the vertical line of x's in the time sequence plot) onto the network. This was the original bug I encountered before I tried to narrow it down; I thought the culprit was the RTO during fast retransmit. Now I'm not sure why this is happening. I'm hoping this bug is reproducible with the script I've provided.
Ok, due to the large buffers, there are still duplicate acks in flight, received after the initial RTO. These are causing the TCP state machine to go right back into fast retransmit after the timeout, even though it should probably continue in slow start. And somewhere along the line, I'm not sure where yet, the congestion window is set to something large and incorrect.
Hi, have you considered cwnd inflation? For each duplicate ack, under particular conditions, the cwnd is artificially inflated to allow a segment to be transmitted. This is why you see such a spike. By enabling debug messages, you should be able to see what's happening inside; this could help. Tomorrow I will try the script.
Hi, yes, but cwnd inflation should not change the number of packets in flight. After you halve the window on fast retransmit, suppose you have cwnd = 4; even if you inflate the cwnd beyond 4 to keep data flowing as duplicate acks come back, the number of packets in flight should still be 4. The graph I sent does not exhibit normal cwnd inflation. In fact, it seems that the window size is set incorrectly after TCP attempts to recover from fast retransmit. I will try to debug this further and let you know what I find.
Ok, so this is the order of events affecting cwnd:

1. After TCP hits its first batch of losses, we get 3 dup acks and go into fast recovery. cwnd = ssThresh = BytesInFlight/2.

2. As we continue to receive dup acks in fast recovery, cwnd is inflated by a segment size each time. Note that the actual bytes in flight do not change, and should equal ssThresh. We receive partial acks that deflate the cwnd.

3. Because we have many losses, eventually the retransmit timeout fires under this impatient scheme. When the timeout fires, we reset cwnd = segmentsize.

4. The setup I have in the script has a large RTT and a large queue at the client. Therefore, even though we've gone back to slow start, we are still receiving dup acks from the fast recovery phase.

5. This TCP implementation goes back into the fast recovery phase when it receives these dup acks, even though it should probably continue with slow start.

6. On entering the fast recovery phase again, TCP attempts to set cwnd = ssThresh = BytesInFlight/2. Now, the implementation of TcpSocketBase::BytesInFlight is as follows:

  uint32_t
  TcpSocketBase::BytesInFlight () const
  {
    NS_LOG_FUNCTION (this);
    return m_highTxMark.Get () - m_txBuffer->HeadSequence ();
  }

Recall that we had just exited fast recovery, where the high tx mark had been artificially inflated to allow TCP to progress. Therefore, the BytesInFlight() value is in fact INCORRECT. It is reporting an inflated window, when in fact our cwnd = segmentsize, because we had just timed out. This is the reason we see the incorrect large spike in packets.

To sum up, the problem is twofold:

1. It seems like we need an extra TcpSocketState, which indicates going into slow start after we get an RTO in the fast retransmit state. While in this state, TCP will ignore dup acks and continue slow start. It will only exit the state when the ack number received is greater than the m_recover value recorded at the beginning of the fast retransmit state.

2. The BytesInFlight() function should actually report the REAL number of bytes in flight, rather than using the tx buffer for this calculation. It reports an incorrect number when we have window inflation and will therefore cause many problems beyond the one reported here. In fact, in the past I encountered a problem where consecutive fast recovery instances were setting the cwnd incorrectly just because of this artifact.
Really nice considerations. We need to remove the inflation: it creates many problems and corner cases. I think the steps should involve the addition of a scoreboard. I have two other bugs to take care of, but I am happy about this, because following GSoC many problems have been investigated; in fact, they were present in older versions, but were practically impossible to debug. Anyway, the scoreboard is present in every implementation, and it will be useful both for this bug and, in the future, for SACK.
(In reply to l.salameh from comment #9)
> Ok, so this is the order of events that are happening which are affecting
> cwnd:
>
> 1. After TCP hits its first batch of losses, we get 3 dup acks, and go into
> fast recovery. cwnd = ssThresh = BytesInFlight/2.
>
> 2. As we continue to receive dup acks in fast recovery, cwnd is inflated by
> a segment size each time. Note that the actual bytes in flight do not
> change, and should equal to sshThresh. We receive partial acks that deflate
> the cwnd.
>
> 3. Because we have many losses, eventually the retransmit timeout will fire
> in this impatient scheme. When the timeout fires, we reset the cwnd =
> segmentsize.
>
> 4. The setup I have in the script has a large RTT, and a large queue at the
> client. Therefore, even though we've gone back to slow start, we are still
> receiving Dup acks from the fast recovery phase.
>
> 5. This TCP implementation goes back to the fast recovery phase when it
> receives these Dup Acks, event though it should probably continue with slow
> start.
>
> 6. On entering the fast recovery phase again, TCP attempts to set
> cwnd = ssThresh = BytesInFlight/2.
>
> To sum up, the problem is twofold:
>
> 1. It seems like we need an extra TCPSocketState, which indicates going into
> slow start after we get an RTO in the fast retransmit state. While in this
> state, TCP will ignore Dup acks, and continue slow start. It will only exit
> the state when the ack number received is greater than the m_recover value
> recorded at the beginning of the fast retransmit state.

This is covered by the specification in RFC 6582:

  When the third duplicate ACK is received, the TCP sender first checks
  the value of recover to see if the Cumulative Acknowledgment field
  covers more than recover.

So we need to maintain the recover variable through the timeout event and check this before entering fast retransmit again. I suspect it is not checking because line 1393 of tcp-socket-base.cc is:

  else if (m_dupAckCount == m_retxThresh)

> 2. The BytesInFlight() function should actually report the REAL number of
> bytes in flight, rather than use the tx buffer for this calculation. It will
> report an incorrect number when we have window inflation and therefore will
> cause many problems other than the one reported here. In fact, in the past I
> had encountered a problem where consecutive fast recovery instances were
> setting the cwnd incorrectly just because of this artifact.

For this, the scoreboard/Pipe() of RFC 6675 (https://tools.ietf.org/html/rfc6675) may be something to work towards.
Just wanted to point out that this bug is a major blocker for TCP and should be treated as such, given the upcoming release of ns-3.25. I understand that TCP in ns-3 has undergone a major redesign, but it seems it needs more thorough testing before it is ready to use.
Hi, I'm aware of the importance of this bug. I'm working on keeping a list of transmitted segments, in order to provide a more reliable value of BytesInFlight () and (almost) fix this bug for the ns-3.25 release. Then I'd like to remove inflation and deflation, to make the cWnd graph smoother. If you want to help, we can coordinate.

nat
(In reply to l.salameh from comment #9)
> 2. The BytesInFlight() function should actually report the REAL number of
> bytes in flight, rather than use the tx buffer for this calculation. It will
> report an incorrect number when we have window inflation and therefore will
> cause many problems other than the one reported here. In fact, in the past I
> had encountered a problem where consecutive fast recovery instances were
> setting the cwnd incorrectly just because of this artifact.

Ok, for this we have a fix. I rewrote the TcpTxBuffer to be able to count the number of bytes neither acknowledged nor discarded. This equals m_highTxMark.Get () - m_txBuffer->HeadSequence () when there are no retransmissions, but it reports the correct value (as you pointed out) when retransmissions come into play. BytesInFlight in TcpTxBuffer returns the number of bytes requested by TcpSocketBase but not yet discarded; see https://www.nsnam.org/bugzilla/0.

Please, can you test this commit [1]? And remove the assert, of course. For example, with that assert, I discovered that the old BytesInFlight overflows in some situations (e.g. ns3-tcp-test-state). You can safely use the tcp-sack branch (despite the name, it does not have TCP SACK).

[1] https://github.com/kronat/ns-3-dev-git/commit/0de8dbe533cf003d6b47290d99cd2a03eaee2720
(In reply to l.salameh from comment #9)
> 5. This TCP implementation goes back to the fast recovery phase when it
> receives these Dup Acks, event though it should probably continue with slow
> start.

I noticed this only now. IMHO this is incorrect, since if we are in CA_LOSS, dupacks are ignored (see ReceivedAck, starting from line 1391). If we are in CA_OPEN, it means we had received a cumulative ACK, which moves the SND.UNA pointer forward. Then, if there are dupacks and we enter CA_RECOVERY, the bytesInFlight is calculated starting from the new SND.UNA. What do you think?
Hi Natale,

I checked out the tcp-sack branch and ran the patched code with my original script, and I'm still seeing the same behavior when I plot the time sequence plots, with the large vertical spike of packets. I realized that the original script I attached to this bug doesn't actually compile, so I've added a fixed version. I used MATLAB to plot the time sequence graphs from the output trace file, and I would be happy to supply you with those if you need them.

Unfortunately, I'm a bit busy the next couple of days with PhD work, but I'll be back towards the end of the week, and I would be glad to collaborate with you on fixing this bug. As it stands, I have only run the script with the new code and haven't had the chance to investigate why the bug is still happening.

As for your remark below: if you run the new test script I've attached to this bug, and add breakpoints in the ReceivedAck() function and wherever the state changes to CA_OPEN/CA_RECOVERY etc., you will see the problem in action. In a low-RTT network, the code works correctly, and the next cumulative ACK correctly sets the state back to CA_OPEN during a CA_LOSS. But in the script I provided, we have a high RTT and delayed feedback of a sort. During the fast recovery phase, we had inflated the window and retransmitted some packets, so we should get some dup acks and a cumulative ACK for the correctly retransmitted packets. Unfortunately, these arrive AFTER we have timed out, which messes up the state machine.
Under these conditions, even though we've timed out and started to retransmit all the packets, we immediately go into fast recovery again because the states transition from:

  CA_RECOVERY
    -> CA_LOSS     (on timeout)
    -> CA_OPEN     (on receiving a cumulative ACK for a packet retransmitted
                    in the previous fast recovery phase)
    -> CA_RECOVERY (on receiving 3 dup acks from the previous fast recovery phase)

Maybe this is the correct behavior, but somehow it seems wrong to me, since we immediately abort slow start and go into fast recovery again. I should probably check the RFCs. In the meantime, you can download the fixed script named "test-tcp-parameters.cc", and confirm that even with the fix, you can reproduce the large-window and incorrect-state bugs on your end.

> > 5. This TCP implementation goes back to the fast recovery phase when it
> > receives these Dup Acks, event though it should probably continue with slow
> > start.
>
> I noted this only now. IMHO is incorrect, since if we are in CA_LOSS,
> dupacks are ignored (see ReceivedAck, starting from line 1391). If we are in
> CA_OPEN, it means we had received a cumulative ACK, which moves forward the
> SND.UNA pointer. Then, if there are dupacks and we enter in CA_RECOVERY, the
> bytesInFlight are calculated starting from the new SND.UNA.
>
> What do you think?
Created attachment 2225 [details] test-tcp-parameters.cc
Ah-ha! I got it. The most immediate, easy and reasonable fix is to ignore dupacks that report a timestamp option echo value less than the timestamp of the first retransmission. However, this does not work without timestamps... mmm. Fixing bug 2068 could help here, since we are currently underestimating the RTT in cases of losses. I'm concentrating on correcting 2068 now, hoping that this will alleviate the situation. Stay tuned.
(In reply to natale.patriciello from comment #18)
> Ah-ha! I got it.
>
> The most immediate, easy and reasonable fix is to ignore dupacks which
> reports a timestamp option echo value which is less than the timestamp of
> the first retransmission. However this does not work without timestamp.. mmm.

See my comment #11 above. This is the purpose of the "recover" variable in the RFC: to suppress fast retransmit/recovery immediately after a timeout. I suggest trying to fix this by maintaining a recover variable rather than trying to avoid it with timestamps or other heuristics.
After investigating the fixes to the TcpTxBuffer in the tcp-sack branch, the problem still persists. The issue is that the TcpTxBuffer's m_sentSize, which is the value returned by BytesInFlight() in this fix, behaves as follows:

1. It is increased by the segment size each time we send a segment.
2. It is decreased by the bytes acknowledged by a full ack.

When we are in fast recovery and have inflation, we inflate this number without decreasing it accordingly when we get duplicate acks. The problem therefore appears when we have a first period of fast recovery followed immediately by a second one.

I propose the following fix, taken from ns-2. In tcp-socket-base.cc, ReceivedAck(), line 1416, replace:

  m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, BytesInFlight ());

with:

  uint32_t flightSize = std::min (BytesInFlight (), m_tcb->m_cWnd.Get ());
  m_tcb->m_ssThresh = m_congestionControl->GetSsThresh (m_tcb, flightSize);

This guarantees that we never attempt to set the ssThresh to the inflated value reported by BytesInFlight() the second time we go into fast recovery. I ran my test script with this fix, and the problem seems to be resolved. Obviously, this should be tested more thoroughly. The fix is just a workaround: the better solution would be to use the scoreboard/pipe mechanism of SACK, as Tom has suggested.
Created attachment 2235 [details] Proposed patch for dupacks and timeouts.
I have a quick patch for the timeouts problem. I have only tested it with my problematic script, but thought I'd put it here for you to check; see the most recent file upload. In short, it does what is in RFC 6582 (I hope). It adds

  m_recover = m_highTxMark;

to TcpSocketBase's ReTxTimeout () method, and checks m_recover when we get 3 duplicate ACKs:

  if ((m_dupAckCount == m_retxThresh) && ((m_highRxAckMark - 1) > m_recover))

It also removes the assert that checked that we moved to a new state on 3 duplicate acks.
Created attachment 2242 [details]
Slightly edited patch

I've updated the patch to remain in limited transmit if m_dupAckCount > 3 but we have not entered fast recovery. The patch seems to follow the RFC, but I think the RFC goes rather far in what it suppresses. This is because, when the RTO expires, m_highTxMark can be as high as we inflated the window. In your script, the RTO is for 87121, and we set m_recover to 142297; we don't enter fast recovery again until 142297 is acked. Anyway, this seems to conform to the RFC. We are missing the ACK heuristic to infer losses, but that should be a new feature request. If you are ok with this patch, I'll push.
Hi Natale,

Yes, the edited patch looks good and is working correctly with my code. You can go ahead and push it if it passes the TCP tests and Tom has no objections.

(In reply to natale.patriciello from comment #23)
> The patch seems to follow the RFC, but I think the RFC is going extremely
> over on what it is supposed to do.
>
> This because, when the RTO expires, the m_highTxMark could be as high as we
> inflated the window. In your script, RTO is for 87121, and we set m_recover
> to 142297. We don't enter fast recovery anymore until 142297 is acked.

This is actually the behaviour we want, because we really want to recover the lost flight of packets using the timeout retransmission we just invoked, so we should probably not try to go into fast retransmit again :). We still need to deal with the problem of the incorrect cwnd which is set when we get two consecutive fast retransmit scenarios; see my comment #20.

Cheers,
Lynne
(In reply to natale.patriciello from comment #23)
> Created attachment 2242 [details]
> Slightly edited patch
>
> I've updated the patch to remain in limited transmit if m_dupAck > 3 but we
> not entered in fast recovery.
>
> The patch seems to follow the RFC, but I think the RFC is going extremely
> over on what it is supposed to do.

The reason it suppresses further fast retransmissions is that some retransmissions after a timeout will cause dupacks, because they are retransmissions of segments that have already been received. If you repeatedly re-enter fast retransmit due to many holes in the receiver buffer, your ssthresh will be whittled down to almost nothing. In high bandwidth-delay-product networks, the TCP flow is then somewhat crippled, as it must rebuild its window through congestion avoidance alone. So the cost of not detecting future true fast retransmissions is balanced by the benefit of preserving an ssthresh that is not too small. Plus, as you mention, it is possible to add heuristics to detect new fast retransmissions, and the RFC allows such heuristics.

> This because, when the RTO expires, the m_highTxMark could be as high as we
> inflated the window. In your script, RTO is for 87121, and we set m_recover
> to 142297. We don't enter fast recovery anymore until 142297 is acked.
>
> Anyway, this seems to conform to RFC. We misses the ACK Heuristic to infer
> losses, but this should be a new feature request.
>
> If you are ok with this patch, I'll push.

The patch looks OK to me. Also, can we add a test case testing proper values of ssthresh and cwnd after such an event?
Ok, I understand all the points now.

BTW, reading and re-reading the limited transmit RFC, I'm not sure about this:

(In reply to natale.patriciello from comment #23)
> I've updated the patch to remain in limited transmit if m_dupAck > 3 but we
> not entered in fast recovery.

Is that correct, or does limited transmit apply only to the first 2 dupacks? The RFC says:

  This document does not specify the sender's response to duplicate
  ACKs when the fast retransmit/fast recovery algorithm is not invoked.
  This is addressed in other documents, such as those describing the
  Limited Transmit procedure [RFC3042].

But the Limited Transmit RFC talks about the first 2 dupacks only.

> Also, can we add a test case testing proper values of ssthresh and cwnd
> after such event?

You mean the fact that, after an RTO, we don't enter fast retransmission again until m_recover is acked? The values of cWnd and ssThresh for fast retransmission and RTO are already checked.
Pushed in 11832:02e355cbbbf5. Thank you all.
(In reply to natale.patriciello from comment #26)
> Ok, I understand all the point now.
>
> BTW, reading and re-reading the limited transit RFC I'm not sure about that:
> (In reply to natale.patriciello from comment #23)
> > I've updated the patch to remain in limited transmit if m_dupAck > 3 but we
> > not entered in fast recovery.
>
> Is that correct, or limited transmit only applies to the first 2 dupack ?
> RFC says:
>
> This document does not specify the sender's response to duplicate
> ACKs when the fast retransmit/fast recovery algorithm is not invoked.
> This is addressed in other documents, such as those describing the
> Limited Transmit procedure [RFC3042].
>
> But in limited transmit, RFC talks about the first 2 dupAck only.

Sorry for the late response. I think that the way you have it now is OK (in the case that multiple dupacks are received in the period in which fast recovery is suppressed). However, could you please check the logic for limited transmit? It seems to me that you do not invoke limited transmit upon the first dupack, since you are in CA_OPEN at that time.

> > Also, can we add a test case testing proper values of ssthresh and cwnd
> > after such event?
>
> You mean the fact that, after an RTO, we don't enter fast retransmission
> again until m_recover is acked ? Because the value of cWnd and ssTh for Fast
> Retransmission and RTO are already checked.

It is probably OK as is; the NS_LOG statements cover all of these variable settings at the state transitions.
(In reply to Tom Henderson from comment #28)
> However, could you please check the logic for limited transmit? It seems to
> me that you do not invoke the limited transmit upon the first dupack since
> you are in CA_OPEN at that time.

You are right; I'm now checking this patch:

diff -r 1862e8f860e2 src/internet/model/tcp-socket-base.cc
--- a/src/internet/model/tcp-socket-base.cc	Fri Jan 22 11:18:58 2016 -0800
+++ b/src/internet/model/tcp-socket-base.cc	Sat Jan 23 18:54:45 2016 +0100
@@ -1415,6 +1415,14 @@
                     m_dupAckCount << " dup ACKs");
 
       m_tcb->m_congState = TcpSocketState::CA_DISORDER;
 
+      if (m_limitedTx && m_txBuffer->SizeFromSequence (m_nextTxSequence) > 0)
+        {
+          // RFC3042 Limited transmit: Send a new packet for each duplicated ACK before fast retransmit
+          NS_LOG_INFO ("Limited transmit");
+          uint32_t sz = SendDataPacket (m_nextTxSequence, m_tcb->m_segmentSize, true);
+          m_nextTxSequence += sz;
+        }
+
       NS_LOG_DEBUG ("OPEN -> DISORDER");
     }
   else if (m_tcb->m_congState == TcpSocketState::CA_DISORDER)
@@ -1436,7 +1444,7 @@
                     m_tcb->m_ssThresh << " at fast recovery seqnum " << m_recover);
           DoRetransmit ();
         }
-      else if (m_limitedTx)
+      else if (m_limitedTx && m_txBuffer->SizeFromSequence (m_nextTxSequence) > 0)
         {
           // RFC3042 Limited transmit: Send a new packet for each duplicated ACK before fast retransmit
           NS_LOG_INFO ("Limited transmit");

but it breaks fast-retr-test (because now we transmit 3 segments instead of 2, and the test was balancing dupacks and delayed acks manually). Working on it.
Created attachment 2253 [details]
Limited transmit also for the CA_OPEN->CA_DISORDER transition

This patch also performs limited transmit in that case. Testcase updated.
Fixed in 11850 and 11843. I probably assigned it wrongly; I am deeply sorry for that, it will never happen in the future.
Thanks Natale! Are there any plans to fix the BytesInFlight() bug? Or are we assuming that will be fixed when SACK becomes available?
Unfortunately, I have to reopen this bug. It seems the actual value of m_highTxMark is not what we expected (the maximum transmitted sequence number), but rather the maximum transmitted sequence number + its size; i.e., from the function SendPendingData we have:

  // Update highTxMark
  m_highTxMark = std::max (seq + sz, m_highTxMark.Get ());

RFC 6582 assumes recover points to the highest sequence number transmitted, not the highest expected ack number. As a result, I am currently running into a bug where we ignore duplicate acks immediately after we recover from a timeout, even though we should go into fast recovery. The fix is in TcpSocketBase::ReceivedAck:

  else if (m_tcb->m_congState == TcpSocketState::CA_DISORDER)
    {
  -    if ((m_dupAckCount == m_retxThresh) && ((m_highRxAckMark - 1) > m_recover))
  +    if ((m_dupAckCount == m_retxThresh) && (m_highRxAckMark >= m_recover))
(In reply to l.salameh from comment #33)
> The fix is in TcpSocketBase::ReceivedAck:
>
> else if (m_tcb->m_congState == TcpSocketState::CA_DISORDER)
>   {
> -   if ((m_dupAckCount == m_retxThresh) && ((m_highRxAckMark - 1) > m_recover))
> +   if ((m_dupAckCount == m_retxThresh) && (m_highRxAckMark >= m_recover))

We have spent a lot of time discussing this... Without a test case, I'm hesitant to push this change, since in the near future we may need to revert it again. Anyway, thinking in rounded segment sizes (e.g. 1000):

* if m_recover = 10000 (the 10th segment) and the highest Rx ack points to 9001, both conditions are false;
* if m_recover = 10000 and the highest Rx ack points to 10001, one is false and the other is true.

In the second case, we should go into fast recovery, right? For me yes, and your fix is correct.
Fixed in 11894