Bug 2859 - calculation of BytesInFlight during partial recovery
calculation of BytesInFlight during partial recovery
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
ns-3.27
All All
: P3 normal
Assigned To: natale.patriciello
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-01-26 10:09 UTC by Tom Henderson
Modified: 2018-03-03 06:02 UTC (History)
3 users (show)

See Also:


Attachments
proposed patch by Viyom (818 bytes, patch)
2018-01-26 10:37 UTC, Tom Henderson
Details | Diff
Patch for bytes-in-flight test and consistency of m_lostOut variable (6.42 KB, patch)
2018-02-22 15:21 UTC, Viyom Mittal
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Henderson 2018-01-26 10:09:14 UTC
opened to track the issue in this thread:
http://mailman.isi.edu/pipermail/ns-developers/2018-January/014248.html
Comment 1 Tom Henderson 2018-01-26 10:37:37 UTC
Created attachment 3018 [details]
proposed patch by Viyom
Comment 2 Tom Henderson 2018-01-26 10:40:08 UTC
Viyom's patch does not affect the ns3-tcp-cwnd test (testing NewReno partial ack behavior).  However, it seems to conflict with the comment directly above it, which says:

  // RFC 6675, Section 5, 2nd paragraph:
  // If the incoming ACK is a cumulative acknowledgment, the TCP MUST
  // reset DupAcks to zero.
Comment 3 Tom Henderson 2018-01-26 12:19:56 UTC
Here are some responses inline to these ideas from the list discussion

Natale wrote:
> I am in favor of:
> *) splitting things that are not related to the TCP state machine
> processing (open-close-established...) in another file;

I agree that tcp-socket-base is probably too large.

> *) reduce the size of the functions by splitting them (e.g.,
ProcessedAck is too big)

agree

> *) reduce the number of "if sack then..", by trying to re-use CraftSackOption.

I would like to suggest going about it differently than CraftSackOption.  The goal is to reuse code, but IMO, crafting a fake option to reuse the API is a bit confusing to readers and future maintainers.  Instead, I suggest to rework the APIs to reuse scoreboard and other code without appending options.

> *) keeping count of lost, retransmitted, dupack, to save time when
> calculating in_flight and other values (in both modes)

Let's go for clarity in the code first, and then optimize for performance after profiling and experience.

> *) the congestion window trace value will include deflation/deflation,
> but in the code will be used the version without inflation/deflation
> (in both modes)

I am not sure about reporting a trace value that is manipulated from the underlying value in the code; perhaps we need to maintain and report two versions (inflated and non-inflated)?  

In general, I believe that SACK and NewReno are not incompatible but should be combined.  Reno and NewReno are about keeping the flow going during loss recovery, based on the available information, such that the transition out of fast recovery is not abrupt (that is the idea of Reno, and NewReno is just an improvement on Reno to handle partial acknowledgments).  SACK provides more information and can be used to accelerate recovery, but the same principle in Reno/NewReno about pacing out data appropriately can still apply.  I don't see why the same basic implementation can't be used and enhanced by the presence of SACK information.  I think that is the spirit of RFC 6675 also, although I don't know offhand whether the prescribed implementation differs slightly.
Comment 4 natale.patriciello 2018-02-20 09:17:26 UTC
Hello,
Comment 5 natale.patriciello 2018-02-20 09:20:13 UTC
(In reply to natale.patriciello from comment #4)
> Hello,

(continuing) The proposed patch is wrong. The dupack count must be 0 after we receive an ack that covers the SND.UNA, because we could re-enter in fast retransmit as soon as we receive three dupacks.

The only solution to this problem is not taking into account dupacks in the BytesInFlight function but instead calculating how many segments left the network.
Comment 6 Viyom Mittal 2018-02-22 15:21:47 UTC
Created attachment 3062 [details]
Patch for bytes-in-flight test and consistency of m_lostOut variable

Hello Natale,

We went through your latest commits in tcp-corrections branch. As mentioned in your mail that existing bytes-in-flight test was failing, we made the required corrections to make sure it works correctly.

Additionally, we found that the value of m_lostOut includes acked segments when ReadOptions () is called which is incorrect. It is later updated to correct value when DiscardUpto () is called in the ProcessAck (). In between these two updates, the value of m_lostOut remains inconsistent. We have also handled this issue in the patch and made sure that all the tests which were passing earlier still pass.

The rest of the code looks good to us. Please review it and let us know if this is fine.

Thanks,
Viyom
Vivek
Comment 7 natale.patriciello 2018-02-23 05:07:27 UTC
I agree with the patch. The rationale is that any call to BytesInFlight in between the reception of the ACK and DiscardUpTo function is wrong while calling DiscardUpTo early in the process provides a right value for BytesInFlight (updated to the latest knowledge, which is the received ACK). Inserted in my branch.
Comment 8 Tom Henderson 2018-02-23 11:20:40 UTC
I patched the tip of tcp-corrections branch (changeset c85b2a41) with Viyom's patch, and enabled NSC, and ran the test suite.  Two tests failed:

$ ./waf --run 'test-runner --suite=tcp-rtt-estimation-test'

assert failed. cond="m_dupAckCount == 1", msg="From OPEN->DISORDER but with 14 dup ACKs", file=../src/internet/model/tcp-socket-base.cc, line=1525
terminate called without an active exception

$ ./waf --run 'test-runner --suite=ns3-tcp-cwnd'

aborted. cond="!(m_vectors.size () > i)", msg="TestVectors::Get(): Bad index", file=./ns3/test.h, line=1486
terminate called without an active exception

I haven't had a chance to look into whether these are small alignments of test output with the new code, or more.  Will report back what I find, but just noting these now in case there are known fixes for these.
Comment 9 natale.patriciello 2018-02-23 16:43:03 UTC
> $ ./waf --run 'test-runner --suite=tcp-rtt-estimation-test'
>
> assert failed. cond="m_dupAckCount == 1", msg="From OPEN->DISORDER but with 14 > dup ACKs", file=../src/internet/model/tcp-socket-base.cc, line=1525
> terminate called without an active exception

I'm working on it
Comment 10 natale.patriciello 2018-02-23 17:27:25 UTC
Tom,

I've pushed some commit onto tcp-corrections branch. I still have to rework them a bit, adding documentation and squashing them into single commits, but the tests do not fail anymore.
Comment 11 natale.patriciello 2018-03-03 06:02:00 UTC
fixed in 13372:950f852a89f9