Bugzilla – Bug 2859
calculation of BytesInFlight during partial recovery
Last modified: 2018-03-03 06:02:00 UTC
Opened to track the issue in this thread: http://mailman.isi.edu/pipermail/ns-developers/2018-January/014248.html
Created attachment 3018 [details] proposed patch by Viyom
Viyom's patch does not affect the ns3-tcp-cwnd test (which exercises NewReno partial-ACK behavior). However, it seems to conflict with the comment directly above it, which says:

// RFC 6675, Section 5, 2nd paragraph:
// If the incoming ACK is a cumulative acknowledgment, the TCP MUST
// reset DupAcks to zero.
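To make the conflict concrete, here is a minimal sketch of the RFC 6675 rule the comment refers to. The structure and member names (TcpStateSketch, ProcessAck, dupAckCount) are illustrative, not the actual tcp-socket-base.cc code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not the ns-3 implementation: per RFC 6675,
// Section 5, 2nd paragraph, a cumulative ACK (one that advances SND.UNA)
// MUST reset the duplicate-ACK counter to zero.
struct TcpStateSketch
{
  uint32_t sndUna = 0;      // oldest unacknowledged sequence number
  uint32_t dupAckCount = 0; // consecutive duplicate ACKs seen

  // Returns true if the ACK was cumulative (it advanced SND.UNA).
  bool ProcessAck (uint32_t ackNumber)
  {
    if (ackNumber > sndUna)
      {
        sndUna = ackNumber;
        dupAckCount = 0; // RFC 6675, Sec. 5, 2nd paragraph
        return true;
      }
    ++dupAckCount; // duplicate ACK: SND.UNA did not move
    return false;
  }
};
```

A patch that skips this reset on a partial (but still cumulative) ACK would contradict the quoted comment, which is the concern raised here.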
Here are some inline responses to ideas from the list discussion. Natale wrote:

> I am in favor of:
> *) splitting things that are not related to the TCP state machine
> processing (open-close-established...) in another file;

I agree that tcp-socket-base is probably too large.

> *) reduce the size of the functions by splitting them (e.g., ProcessedAck is too big)

Agreed.

> *) reduce the number of "if sack then..", by trying to re-use CraftSackOption.

I would suggest going about this differently than CraftSackOption. The goal is to reuse code, but IMO crafting a fake option in order to reuse the API is confusing to readers and future maintainers. Instead, I suggest reworking the APIs so that the scoreboard and other code can be reused without appending options.

> *) keeping count of lost, retransmitted, dupack, to save time when
> calculating in_flight and other values (in both modes)

Let's go for clarity in the code first, and then optimize for performance after profiling and experience.

> *) the congestion window trace value will include inflation/deflation,
> but in the code will be used the version without inflation/deflation
> (in both modes)

I am not sure about reporting a trace value that is manipulated relative to the underlying value used in the code; perhaps we need to maintain and report two versions (inflated and non-inflated)?

In general, I believe that SACK and NewReno are not incompatible but should be combined. Reno and NewReno are about keeping the flow going during loss recovery, based on the available information, so that the transition out of fast recovery is not abrupt (that is the idea of Reno, and NewReno is just an improvement on Reno to handle partial acknowledgments). SACK provides more information and can be used to accelerate recovery, but the same Reno/NewReno principle of pacing out data appropriately still applies. I don't see why the same basic implementation can't be used and enhanced by the presence of SACK information.
I think that is the spirit of RFC 6675 also, although I don't know offhand whether the prescribed implementation differs slightly.
Hello,
(In reply to natale.patriciello from comment #4)

> Hello,

(continuing) The proposed patch is wrong. The dupack count must be 0 after we receive an ACK that covers SND.UNA, because we could then re-enter fast retransmit as soon as we receive three dupacks. The only solution to this problem is to not take dupacks into account in the BytesInFlight function, but instead to calculate how many segments have left the network.
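The approach described above, estimating bytes in flight from what is known to have left the network rather than from the dupack counter, mirrors the pipe estimate of RFC 6675, Section 4. A minimal sketch follows; the parameter names are assumptions for illustration, not the actual ns-3 members:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of an RFC 6675-style pipe estimate. SACKed and lost
// bytes have left (or never will traverse) the network, so they are
// subtracted; retransmissions put previously "lost" bytes back in flight.
uint32_t
BytesInFlightSketch (uint32_t sentOut,    // bytes sent, not cumulatively ACKed
                     uint32_t sackedOut,  // bytes reported arrived via SACK
                     uint32_t lostOut,    // bytes the scoreboard marks as lost
                     uint32_t retransOut) // retransmitted bytes outstanding
{
  return sentOut - sackedOut - lostOut + retransOut;
}
```

Note that no dupack counter appears anywhere in the estimate, which is the point of the comment above.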
Created attachment 3062 [details] Patch for bytes-in-flight test and consistency of m_lostOut variable

Hello Natale,

We went through your latest commits in the tcp-corrections branch. As you mentioned in your mail, the existing bytes-in-flight test was failing, so we made the required corrections to ensure it works correctly. Additionally, we found that the value of m_lostOut incorrectly includes acked segments when ReadOptions () is called. It is later updated to the correct value when DiscardUpTo () is called in ProcessAck (). Between these two updates, the value of m_lostOut remains inconsistent. We have also handled this issue in the patch and made sure that all the tests which were passing earlier still pass. The rest of the code looks good to us. Please review it and let us know if this is fine.

Thanks,
Viyom
Vivek
I agree with the patch. The rationale is that any call to BytesInFlight between the reception of the ACK and the DiscardUpTo call is wrong, while calling DiscardUpTo early in the process yields the right value for BytesInFlight (updated to the latest knowledge, i.e., the received ACK). Inserted in my branch.
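The ordering issue can be sketched as follows. This is a hypothetical simplification (the counters and DiscardUpTo signature are illustrative, not the actual TcpTxBuffer API, and SACK/retransmission terms are omitted for brevity): before DiscardUpTo trims the scoreboard, m_lostOut still counts bytes the incoming cumulative ACK has just covered, so BytesInFlight computed in that window is stale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the window of inconsistency the patch closes.
struct TxBufferSketch
{
  uint32_t sentOut = 10000; // bytes sent, not yet cumulatively ACKed
  uint32_t lostOut = 3000;  // bytes marked lost (may include just-ACKed data)

  // Drop everything the cumulative ACK covers from both counters.
  void DiscardUpTo (uint32_t ackedBytes, uint32_t ackedLostBytes)
  {
    sentOut -= ackedBytes;
    lostOut -= ackedLostBytes;
  }

  // Simplified estimate; SACKed/retransmitted terms omitted.
  uint32_t BytesInFlight () const
  {
    return sentOut - lostOut;
  }
};
```

Calling BytesInFlight () before DiscardUpTo () returns a value computed against counters that still include acked data; calling DiscardUpTo () first makes every subsequent BytesInFlight () consistent with the received ACK.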
I patched the tip of the tcp-corrections branch (changeset c85b2a41) with Viyom's patch, enabled NSC, and ran the test suite. Two tests failed:

$ ./waf --run 'test-runner --suite=tcp-rtt-estimation-test'
assert failed. cond="m_dupAckCount == 1", msg="From OPEN->DISORDER but with 14 dup ACKs", file=../src/internet/model/tcp-socket-base.cc, line=1525
terminate called without an active exception

$ ./waf --run 'test-runner --suite=ns3-tcp-cwnd'
aborted. cond="!(m_vectors.size () > i)", msg="TestVectors::Get(): Bad index", file=./ns3/test.h, line=1486
terminate called without an active exception

I haven't had a chance to look into whether these are small misalignments of the test expectations with the new code, or something more. I will report back with what I find, but I'm noting these now in case there are known fixes.
> $ ./waf --run 'test-runner --suite=tcp-rtt-estimation-test'
>
> assert failed. cond="m_dupAckCount == 1", msg="From OPEN->DISORDER but with 14
> dup ACKs", file=../src/internet/model/tcp-socket-base.cc, line=1525
> terminate called without an active exception

I'm working on it.
Tom, I've pushed some commits onto the tcp-corrections branch. I still have to rework them a bit, adding documentation and squashing them into single commits, but the tests no longer fail.
fixed in 13372:950f852a89f9