Bug 2248 - review request: TCP SACK
review request: TCP SACK
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: tcp
pre-release
PC Linux
: P5 enhancement
Assigned To: natale.patriciello
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-12-21 23:09 UTC by Tom Henderson
Modified: 2017-02-22 09:30 UTC (History)
3 users (show)

See Also:


Attachments
proposed patch (2.31 KB, patch)
2016-11-24 10:42 UTC, Artem Krasilov
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Henderson 2015-12-21 23:09:15 UTC
This is to track the review request for the code in:

https://codereview.appspot.com/283880043/
Comment 1 natale.patriciello 2016-05-22 16:20:35 UTC
Updated: https://codereview.appspot.com/299130043/
Comment 2 Tom Henderson 2016-10-18 15:30:52 UTC
Latest review comments should be against the github branch here:

https://github.com/kronat/ns-3-dev-git/tree/tcp-sack

and not at the codereview site (which doesn't contain the latest code).

The following review comments come from a review of the code at changeset:
https://github.com/kronat/ns-3-dev-git/commit/00f022d76902377438498004455dc1a011343ac6

I did some light testing (comparison of pcap traces and throughput behavior of this SACK-enabled version with the Linux NSC stack with SACK enabled).  I modified tcp-nsc-comparison.cc to use SACK for both nsc-linux and ns3 stacks.

Comments and suggestions:

1) There is very little test coverage of SACK in operation.  There is a new test of the 'SACK permitted' operation, but not much (any?) test of it being used.

Is the idea here to merge it as disabled and then add SACK-enabled tests before changing default to enabled?

Not everything has to be included as a TestSuite regression test, but I couldn't find any statements that claim that it has been tested by some means.  If some kind of verification testing has been done offline and simply not documented, can you instead document what testing has already been performed?

2) with the current default behavior of disabled, I observed an test.py example CRASHing:

src/traffic-control/examples/adaptive-red-tests --testNumber=12

This isn't really crashing like a segmentation fault; it is just that this adaptive-red-tests program returns -1 upon exit (should probably be changed to exit (1)).  However, if SACK is disabled, this test shouldn't be failing-- something needs to be debugged here.

3) with the behavior enabled globally, there are a lot of failures and crashes.  The crashes need to be turned into failures, and each failure should be looked at to see whether it is merely a small diff in behavior or something wrong in SACK operation.

4) the attribute name 'SACK' and some of the method names (e.g ProcessOptionSACK) should not be all caps, IMO, for API consistency with ns-3 naming.  Even within this patch, it is not consistent (sometimes capitalized as 'Sack' as in TcpOptionSack).  I suggest to change the name of the attribute to 'Sack' and change new API that uses all caps to use 'Sack' instead).

4) I read this in the documentation:

+ Please note that the generated SACK option (in case of non-SACK
+ receiver) by the sender never leave the sender node itself; they are generated
+ by the TCP implementation and then consumed.

and 

+ * All the informations on the sent segments (e.g. SACK) are saved under
+ * a form of PacketTag (that never leaves this class)

However, I could not find in the git history (I was reading through a 'git diff -r ade680078d3365') that there is any packet tag being added or removed in this code.  Is the documentation correct?

5) I read this also:

+When SACK attribute is enabled for the receiver socket, the sender will not
+craft any SACK option, relying only on what it receives from the network.

Is this statement correct, or should first 'enabled' read 'disabled' instead?

6) Regarding the default of disabling SACK, I would suggest that we change the default to enabled and document this as a behavioral change that can be reverted by globally setting back to false.  The reason is that SACK is so pervasive that we should have our defaults aligned with it for future users of the simulator (in my opinion).

Also, we will more quickly spot possible bugs if ns-3-dev is converted to default SACK on.
Comment 3 natale.patriciello 2016-10-19 11:48:22 UTC
(In reply to Tom Henderson from comment #2)
> Latest review comments should be against the github branch here:
> 
> https://github.com/kronat/ns-3-dev-git/tree/tcp-sack
> 
> and not at the codereview site (which doesn't contain the latest code).

I have updated the codereview; for the future, I can always forgot to upload to codereview, so please check that branch.

> Comments and suggestions:
> 
> 1) There is very little test coverage of SACK in operation.  There is a new
> test of the 'SACK permitted' operation, but not much (any?) test of it being
> used.
> 
> Is the idea here to merge it as disabled and then add SACK-enabled tests
> before changing default to enabled?

All the logic for SACK is contained in TcpTxBuffer, and is available to TcpSocketBase through its public API (which are tested atomically in tcp-tx-buffer-test.cc). What you mean is some simple tests that checks if selective retransmissions are done correctly?
I will also update the documentation to reflect the previous.

> 2) with the current default behavior of disabled, I observed an test.py
> example CRASHing:
> 
> src/traffic-control/examples/adaptive-red-tests --testNumber=12
> 
> This isn't really crashing like a segmentation fault; it is just that this
> adaptive-red-tests program returns -1 upon exit (should probably be changed
> to exit (1)).  However, if SACK is disabled, this test shouldn't be
> failing-- something needs to be debugged here.
> 
> 3) with the behavior enabled globally, there are a lot of failures and
> crashes.  The crashes need to be turned into failures, and each failure
> should be looked at to see whether it is merely a small diff in behavior or
> something wrong in SACK operation.


I talked with Mohit, the very problem is that these tests rely on the TCP for testing the AQMs. I mean, the number of byte transmitted by TCP: with SACK, the performance of TCP increases and a greater number of byte are transmitted, tricking the AQM tests that the AQM itself is doing something wrong.
I would like to see these tests splitted from transport layer, if possible (e.g. test atomically the DoEnqueue / DoDequeue API on each AQM).

> 4) the attribute name 'SACK' and some of the method names (e.g
> ProcessOptionSACK) should not be all caps, IMO, for API consistency with
> ns-3 naming.  Even within this patch, it is not consistent (sometimes
> capitalized as 'Sack' as in TcpOptionSack).  I suggest to change the name of
> the attribute to 'Sack' and change new API that uses all caps to use 'Sack'
> instead).

Ok, I will do the change.

> 4) I read this in the documentation:
> 
> + Please note that the generated SACK option (in case of non-SACK
> + receiver) by the sender never leave the sender node itself; they are
> generated
> + by the TCP implementation and then consumed.
> 
> and 
> 
> + * All the informations on the sent segments (e.g. SACK) are saved under
> + * a form of PacketTag (that never leaves this class)
> 
> However, I could not find in the git history (I was reading through a 'git
> diff -r ade680078d3365') that there is any packet tag being added or removed
> in this code.  Is the documentation correct?

I forgot to update the documentation. To speed up things, I avoided the use of any Tag, relying instead on a container for the packets.

> 5) I read this also:
> 
> +When SACK attribute is enabled for the receiver socket, the sender will not
> +craft any SACK option, relying only on what it receives from the network.
> 
> Is this statement correct, or should first 'enabled' read 'disabled' instead?

Is correct; probably the misunderstanding comes from the fact that "it receives from the network" is very misleading. I wanted to say "on the option that it receives from the network". I will update accordingly.

> 6) Regarding the default of disabling SACK, I would suggest that we change
> the default to enabled and document this as a behavioral change that can be
> reverted by globally setting back to false.  The reason is that SACK is so
> pervasive that we should have our defaults aligned with it for future users
> of the simulator (in my opinion).
> 
> Also, we will more quickly spot possible bugs if ns-3-dev is converted to
> default SACK on.

But also there is the opposite problem, which is that we will spot later on bugs on the behavior with SACK off (i.e. with "craft SACK option" codepath).

I think probably the best is to merge it, keep default off for ns-3.27, and then enable it for ns-3.28.
Comment 4 Mohit 2016-10-22 09:33:20 UTC
Thanks for providing SACK implementation, Natale.

(In reply to natale.patriciello from comment #3)
> (In reply to Tom Henderson from comment #2)
> > Latest review comments should be against the github branch here:
> > 
> > https://github.com/kronat/ns-3-dev-git/tree/tcp-sack
> > 
> > and not at the codereview site (which doesn't contain the latest code).
> 
> I have updated the codereview; for the future, I can always forgot to upload
> to codereview, so please check that branch.
> 
> > Comments and suggestions:
> > 
> > 1) There is very little test coverage of SACK in operation.  There is a new
> > test of the 'SACK permitted' operation, but not much (any?) test of it being
> > used.
> > 
> > Is the idea here to merge it as disabled and then add SACK-enabled tests
> > before changing default to enabled?
> 
> All the logic for SACK is contained in TcpTxBuffer, and is available to
> TcpSocketBase through its public API (which are tested atomically in
> tcp-tx-buffer-test.cc). What you mean is some simple tests that checks if
> selective retransmissions are done correctly?
> I will also update the documentation to reflect the previous.
> 
> > 2) with the current default behavior of disabled, I observed an test.py
> > example CRASHing:
> > 
> > src/traffic-control/examples/adaptive-red-tests --testNumber=12
> > 
> > This isn't really crashing like a segmentation fault; it is just that this
> > adaptive-red-tests program returns -1 upon exit (should probably be changed
> > to exit (1)).  However, if SACK is disabled, this test shouldn't be
> > failing-- something needs to be debugged here.

Thanks for this suggestion, Tom Sir. I changed them to exit (1), and the example fails now instead of crashing.

But I agree that this test shouldn't fail when SACK is disabled. This is an example program in traffic-control/examples, and is actually different from Adaptive RED test suite (which passes when SACK is disabled).

> > 
> > 3) with the behavior enabled globally, there are a lot of failures and
> > crashes.  The crashes need to be turned into failures, and each failure
> > should be looked at to see whether it is merely a small diff in behavior or
> > something wrong in SACK operation.

Natale, I observed that when SACK is enabled, the following tests crash because an assert is failing:

adaptive-red-queue-disc
src/traffic-control/examples/adaptive-red-tests --testNumber=12
src/traffic-control/examples/adaptive-red-tests --testNumber=7
src/traffic-control/examples/pfifo-vs-red --queueDiscType=PfifoFast
src/traffic-control/examples/pfifo-vs-red --queueDiscType=PfifoFast --modeBytes=1
src/traffic-control/examples/red-tests --testNumber=1
src/traffic-control/examples/red-tests --testNumber=4
src/traffic-control/examples/red-tests --testNumber=5

The assert message is this:

assert failed. cond="beginOfCurrentPacket + current->GetSize () <= m_highestSack.second", file=../src/internet/model/tcp-tx-buffer.cc, line=678

> 
> 
> I talked with Mohit, the very problem is that these tests rely on the TCP
> for testing the AQMs. I mean, the number of byte transmitted by TCP: with
> SACK, the performance of TCP increases and a greater number of byte are
> transmitted, tricking the AQM tests that the AQM itself is doing something
> wrong.

Yes, this happens with the following example when we enable SACK:

src/traffic-control/examples/red-vs-ared --queueDiscType=RED

This crashes, but its again because of exit (-1). I changed it to exit (1), and it fails now. The above mentioned assert is not a problem with this example.

> I would like to see these tests splitted from transport layer, if possible
> (e.g. test atomically the DoEnqueue / DoDequeue API on each AQM).

Since most of the crashes are related to example programs and not the test-suite, it may not be a good idea to split them from transport layer (in my opinion).

I look forward to your suggestions, and hope to assist you in resolving these issues.

Thanks again for this great work.

> 
> > 4) the attribute name 'SACK' and some of the method names (e.g
> > ProcessOptionSACK) should not be all caps, IMO, for API consistency with
> > ns-3 naming.  Even within this patch, it is not consistent (sometimes
> > capitalized as 'Sack' as in TcpOptionSack).  I suggest to change the name of
> > the attribute to 'Sack' and change new API that uses all caps to use 'Sack'
> > instead).
> 
> Ok, I will do the change.
> 
> > 4) I read this in the documentation:
> > 
> > + Please note that the generated SACK option (in case of non-SACK
> > + receiver) by the sender never leave the sender node itself; they are
> > generated
> > + by the TCP implementation and then consumed.
> > 
> > and 
> > 
> > + * All the informations on the sent segments (e.g. SACK) are saved under
> > + * a form of PacketTag (that never leaves this class)
> > 
> > However, I could not find in the git history (I was reading through a 'git
> > diff -r ade680078d3365') that there is any packet tag being added or removed
> > in this code.  Is the documentation correct?
> 
> I forgot to update the documentation. To speed up things, I avoided the use
> of any Tag, relying instead on a container for the packets.
> 
> > 5) I read this also:
> > 
> > +When SACK attribute is enabled for the receiver socket, the sender will not
> > +craft any SACK option, relying only on what it receives from the network.
> > 
> > Is this statement correct, or should first 'enabled' read 'disabled' instead?
> 
> Is correct; probably the misunderstanding comes from the fact that "it
> receives from the network" is very misleading. I wanted to say "on the
> option that it receives from the network". I will update accordingly.
> 
> > 6) Regarding the default of disabling SACK, I would suggest that we change
> > the default to enabled and document this as a behavioral change that can be
> > reverted by globally setting back to false.  The reason is that SACK is so
> > pervasive that we should have our defaults aligned with it for future users
> > of the simulator (in my opinion).
> > 
> > Also, we will more quickly spot possible bugs if ns-3-dev is converted to
> > default SACK on.
> 
> But also there is the opposite problem, which is that we will spot later on
> bugs on the behavior with SACK off (i.e. with "craft SACK option" codepath).
> 
> I think probably the best is to merge it, keep default off for ns-3.27, and
> then enable it for ns-3.28.
Comment 5 Mohit 2016-10-22 09:48:37 UTC
Natale,

I forgot to mention that in the following list of crashes, none crash due to a return of -1 upon exit:

adaptive-red-queue-disc
src/traffic-control/examples/adaptive-red-tests --testNumber=12
src/traffic-control/examples/adaptive-red-tests --testNumber=7
src/traffic-control/examples/pfifo-vs-red --queueDiscType=PfifoFast
src/traffic-control/examples/pfifo-vs-red --queueDiscType=PfifoFast --modeBytes=1
src/traffic-control/examples/red-tests --testNumber=1
src/traffic-control/examples/red-tests --testNumber=4
src/traffic-control/examples/red-tests --testNumber=5

I changed exit (-1) to exit (1) where necessary before running the test.py.
Comment 6 natale.patriciello 2016-11-03 12:26:31 UTC
Hi Tom, Mohit,

I've updated the code in the repository (branch tcp-sack). Please delete and re-download the branch since I changed the history.

Changelog:

*) Resolved the assert failing (it was an error; that assert is not correct, and it trigger even under normal circumstancies. In fact, it was triggered because some sack blocks where - correctly - reported twice, and the assert was stating something as "that block has been already analized")

*) Renamed all methods in *Sack*

*) Explicitly disabled SACK option for tests that strongly relies on "non-sack" mode

However, these tests still fails:


List of FAILed tests:
    examples/wireless/wifi-tcp
List of CRASHed tests:
    src/traffic-control/examples/adaptive-red-tests --testNumber=12
    src/traffic-control/examples/red-vs-ared --queueDiscType=RED

I've checked, for the crashed test it is still the problem I've reported some time ago (Drops due to prob mark should be less than the drops due to hard mark, due to the fact that TCP performance increases). For the wifi-tcp example, it prints a list of bandwidth, such as:

9s: 	24.023 Mbit/s
9.1s: 	28.4979 Mbit/s
9.2s: 	22.3744 Mbit/s
9.3s: 	22.3744 Mbit/s
9.4s: 	28.3802 Mbit/s
9.5s: 	26.496 Mbit/s
9.6s: 	25.9072 Mbit/s
9.7s: 	28.3802 Mbit/s
9.8s: 	28.6157 Mbit/s
9.9s: 	28.4979 Mbit/s
10s: 	26.1427 Mbit/s
10.1s: 	28.4979 Mbit/s
10.2s: 	23.6698 Mbit/s
10.3s: 	28.6157 Mbit/s
10.4s: 	28.3802 Mbit/s
10.5s: 	28.4979 Mbit/s
10.6s: 	28.8512 Mbit/s
10.7s: 	28.2624 Mbit/s
10.8s: 	28.6157 Mbit/s
10.9s: 	25.3184 Mbit/s

but then fail.
Please, take a look when you have time. Meanwhile, have a nice day!

Nat
Comment 7 Tom Henderson 2016-11-11 17:05:56 UTC
(In reply to natale.patriciello from comment #6)
> Hi Tom, Mohit,
> 
> I've updated the code in the repository (branch tcp-sack). Please delete and
> re-download the branch since I changed the history.
> 
> Changelog:
> 
> *) Resolved the assert failing (it was an error; that assert is not correct,
> and it trigger even under normal circumstancies. In fact, it was triggered
> because some sack blocks where - correctly - reported twice, and the assert
> was stating something as "that block has been already analized")
> 
> *) Renamed all methods in *Sack*
> 
> *) Explicitly disabled SACK option for tests that strongly relies on
> "non-sack" mode
> 
> However, these tests still fails:
> 

I did a bit more testing of this branch today.


> 
> List of FAILed tests:
>     examples/wireless/wifi-tcp
> List of CRASHed tests:
>     src/traffic-control/examples/adaptive-red-tests --testNumber=12
>     src/traffic-control/examples/red-vs-ared --queueDiscType=RED
> 
> I've checked, for the crashed test it is still the problem I've reported
> some time ago (Drops due to prob mark should be less than the drops due to
> hard mark, due to the fact that TCP performance increases). 

I suggest Mohit to look at fixing these tests if he has time (should not have a dependency on TCP performance).

For the wifi-tcp
> example, it prints a list of bandwidth, such as:
> 
> 9s: 	24.023 Mbit/s
> 9.1s: 	28.4979 Mbit/s

The test fails because the average throughput is < 50 Mbit/s with SACK.  In fact, it is so much less than expected (26 Mbit/s) that it seems like it must be a bug in how SACK is working, since I would expect SACK to increase performance, not decrease it substantially.

Finally, the sack branch is generating valgrind errors now, which needs to be fixed; to reproduce, enable Sack and run the wifi-tcp example under valgrind.

Once these are fixed, we can do some comparative testing with NSC before merging.
Comment 8 Mohit 2016-11-17 03:41:29 UTC
Hi Natale,

I'm sorry for the late response.

Thank you for resolving the assert error.

I did some more testing while keeping SACK disabled. My intention was to see if "SACK disabled" provides same results as "ns-3-dev" for existing TCP examples. I observed that results differ.

1) examples/wireless/wifi-tcp.cc

Results with ns-3-dev:

1.1s: 	44.3955 Mbit/s
1.2s: 	55.936 Mbit/s
1.3s: 	56.5248 Mbit/s
1.4s: 	52.4032 Mbit/s
1.5s: 	55.8182 Mbit/s
1.6s: 	56.407 Mbit/s
1.7s: 	50.7546 Mbit/s
1.8s: 	46.9862 Mbit/s
1.9s: 	51.2256 Mbit/s
.
.
.
10.7s: 	53.9341 Mbit/s
10.8s: 	56.2893 Mbit/s
10.9s: 	51.5789 Mbit/s

Average throughput: 52.674 Mbit/s

Results with SACK repo (SACK disabled):

1.1s: 	28.7334 Mbit/s
1.2s: 	40.5094 Mbit/s
1.3s: 	54.2874 Mbit/s
1.4s: 	55.465 Mbit/s
1.5s: 	55.8182 Mbit/s
1.6s: 	54.6406 Mbit/s
1.7s: 	31.4419 Mbit/s
1.8s: 	49.9302 Mbit/s
1.9s: 	54.7584 Mbit/s
.
.
.
10.7s: 	54.7584 Mbit/s
10.8s: 	36.8589 Mbit/s
10.9s: 	45.2198 Mbit/s

Average throughput: 50.8464 Mbit/s

2) examples/tutorial/fifth.cc

Results with ns-3-dev:
.
.
9.26768	8733
9.276	8765
9.28432	8797
9.29264	8829
9.30096	8861
9.30928	8893
9.3176	8925

Results with SACK repo (SACK disabled):
.
.
.
9.26768	535464
9.276	536000
9.28432	536536
9.29264	537072
9.30096	537608
9.30928	538144
9.3176	538680

I also plotted the cwnd graph for this example and observed that with SACK repo (SACK disabled), there is no packet drop and hence, cwnd never reduces. This is different from the graph produced from ns-3-dev. 

In my opinion, SACK disabled should have same results as ns-3-dev. I tried to debug this issue, but haven't been successful yet. I'll spend some more time and see if I can  get through this.

Also, I have not tested the SACK repo by enabling SACK yet. I'll try doing that too.

Thanks again.

Have a great day!
Comment 9 Mohit 2016-11-17 03:47:22 UTC
One correction in previous message:

2) examples/tutorial/fifth.cc

In SACK repo (with SACK disabled) there "are" packet drops (I just checked the .dat file), but the cwnd doesn't decrease. Even after packet are dropped, cwnd keeps increasing.

Hope it helps.
Comment 10 natale.patriciello 2016-11-17 11:06:46 UTC
(In reply to Mohit from comment #8)
> Hi Natale,
> 
> I'm sorry for the late response.

Hi Mohit, no problem, and thank you for having given look.


> I did some more testing while keeping SACK disabled. My intention was to see
> if "SACK disabled" provides same results as "ns-3-dev" for existing TCP
> examples. I observed that results differ.
> 
> 1) examples/wireless/wifi-tcp.cc
[cut]

> 2) examples/tutorial/fifth.cc
[cut]
 
> I also plotted the cwnd graph for this example and observed that with SACK
> repo (SACK disabled), there is no packet drop and hence, cwnd never reduces.
> This is different from the graph produced from ns-3-dev. 

I investigated the issue. The root of this is inside the algorithm that chooses the next segments to transmit. RFC 6675, which the works is based on, gives a function (NextSeg) that returns a different value. In fact, in older version of ns-3, if you don't have any new data to transmit but the window is big enough to transmit data, you transmit old segments. This is why these examples don't work as before. I can attach the output if you wish, and if you're interested, the exact point is the (3):

"""
      (3) If the conditions for rules (1) and (2) fail, but there exists
          an unSACKed sequence number 'S3' that meets the criteria for
          detecting loss given in steps (1.a) and (1.b) above
          (specifically excluding step (1.c)), then one segment of up to
          SMSS octets starting with S3 SHOULD be returned.
"""

In particular, the implementation is waiting data from upper layer, and it transmits old segments meanwhile. There are some losses, that's true, but they're covered by the segments sent while waiting for application data. In this way, TCP never sees any loss.

> In my opinion, SACK disabled should have same results as ns-3-dev. I tried
> to debug this issue, but haven't been successful yet. I'll spend some more
> time and see if I can  get through this.

Different RFC provides different results. The design choice was (and still is) to have one single code path for both cases (SACK and non-SACK). The RFC says this, and so I would be towards to say "it was a bug of previous ns-3 version".

Anyway, we have a workaround :) RFC 6675, pag 5:, point 3, we have a SHOULD keyword. It means that the conditional could be optional, and we can have an attribute for such point (what name do you suggest?). If it's false, the conditional is skipped (I checked and the behaviour is the same as ns-3-dev, no failing tests), while if it's true you have the RFC 6675. 

Here's the patch:

diff --git i/src/internet/model/tcp-tx-buffer.cc w/src/internet/model/tcp-tx-buffer.cc
index 94ac6ff..a419b37 100644
--- i/src/internet/model/tcp-tx-buffer.cc
+++ w/src/internet/model/tcp-tx-buffer.cc
@@ -96,6 +96,10 @@ TcpTxBuffer::GetTypeId (void)
     .SetParent<Object> ()
     .SetGroupName ("Internet")
     .AddConstructor<TcpTxBuffer> ()
+    .AddAttribute ("NextSeg_RFC6675", "Make NextSeg compliant with RFC 6675",
+                   BooleanValue (false),
+                   MakeBooleanAccessor (&TcpTxBuffer::m_NextSeg_rfc6675),
+                   MakeBooleanChecker ())
     .AddTraceSource ("UnackSequence",
                      "First unacknowledged sequence number (SND.UNA)",
                      MakeTraceSourceAccessor (&TcpTxBuffer::m_firstByteSeq),
@@ -115,7 +119,8 @@ TcpTxBuffer::TcpTxBuffer (uint32_t n)
   : m_maxBuffer (32768),
     m_size (0),
     m_sentSize (0),
-    m_firstByteSeq (n)
+    m_firstByteSeq (n),
+    m_NextSeg_rfc6675 (false)
 {
 }
 
@@ -982,7 +987,7 @@ TcpTxBuffer::NextSeg (SequenceNumber32 *seq, uint32_t dupThresh,
    *     (specifically excluding step (1.c)), then one segment of up to
    *     SMSS octets starting with S3 SHOULD be returned.
    */
-  if (isSeqPerRule3Valid)
+  if (m_NextSeg_rfc6675 && isSeqPerRule3Valid)
     {
       *seq = seqPerRule3;
       return true;
diff --git i/src/internet/model/tcp-tx-buffer.h w/src/internet/model/tcp-tx-buffer.h
index c8665fb..8825bca 100644
--- i/src/internet/model/tcp-tx-buffer.h
+++ w/src/internet/model/tcp-tx-buffer.h
@@ -477,7 +477,7 @@ private:
   TracedValue<SequenceNumber32> m_firstByteSeq; //!< Sequence number of the first byte in data (SND.UNA)
 
   std::pair <PacketList::const_iterator, SequenceNumber32> m_highestSack; //!< Highest SACK byte
-
+  bool m_NextSeg_rfc6675; //!< NextSeg function complies with RFC 6675 point 3
 };
 
 std::ostream & operator<< (std::ostream & os, TcpTxBuffer const & tcpTxBuf);


You have also it in the repo (but you should reset the branch and re-download it entirely)
Comment 11 natale.patriciello 2016-11-17 11:29:05 UTC
With the patch and the SACK true, the failing tests are the following:

    src/traffic-control/examples/adaptive-red-tests --testNumber=14
    src/traffic-control/examples/adaptive-red-tests --testNumber=15
    src/traffic-control/examples/red-vs-ared --queueDiscType=RED
Comment 12 Mohit 2016-11-23 23:39:29 UTC
Hi Natale,

Thanks for the description.

Sorry for the delay. I'll go over your proposed changes and get back by this weekend.

I also plan to address the concerns about dependency on TCP performance.
Comment 13 Artem Krasilov 2016-11-24 10:42:13 UTC
Created attachment 2683 [details]
proposed patch

I have tested TCP SACK code with an application which rarely sends short packets (e.g., telnet application) and found that TCP sender performs unnecessary retransmissions. Specifically, in Slow Start phase the application write 800 bytes in the socket. In pcap file I see that the TCP sender two times sends the same packet of size 800 bytes (i.e., do unnecessary retransmission). 

The problem is that NextSeg function two times returns the same sequence number to transmit: (i) it sends pending data, (ii) it performs retransmission according to rule 3 from RFC 6675. However, rule 3  from RFC 6675 (which describes Loss Recovery Algorithm) is only applicable in Fast Recovery phase, but not in Slow Start or CA phase.

The proposed patch which fix the problem is attached.
Comment 14 natale.patriciello 2016-11-24 11:11:14 UTC
(In reply to Artem Krasilov from comment #13)
> I have tested TCP SACK code with an application which rarely sends short
> packets (e.g., telnet application) and found that TCP sender performs
> unnecessary retransmissions. Specifically, in Slow Start phase the
> application write 800 bytes in the socket. In pcap file I see that the TCP
> sender two times sends the same packet of size 800 bytes (i.e., do
> unnecessary retransmission). 

Hi Artem,

it was the exact reason why some of Mohit tests failed. I then inserted the attribute to enable / disable this behaviour.

> 
> The problem is that NextSeg function two times returns the same sequence
> number to transmit: (i) it sends pending data, (ii) it performs
> retransmission according to rule 3 from RFC 6675. However, rule 3  from RFC
> 6675 (which describes Loss Recovery Algorithm) is only applicable in Fast
> Recovery phase, but not in Slow Start or CA phase.

Did you refer to point (3) of NextSeg function ? Because I didn't found any correlation with Fast Recovery. The text is the following:

"""
      (3) If the conditions for rules (1) and (2) fail, but there exists
          an unSACKed sequence number 'S3' that meets the criteria for
          detecting loss given in steps (1.a) and (1.b) above
          (specifically excluding step (1.c)), then one segment of up to
          SMSS octets starting with S3 SHOULD be returned.
"""

Thank you!
Comment 15 Artem Krasilov 2016-11-24 15:13:08 UTC
Hi Natale,

> Did you refer to point (3) of NextSeg function ? Because I didn't found any
> correlation with Fast Recovery. The text is the following:
> 
> """
>       (3) If the conditions for rules (1) and (2) fail, but there exists
>           an unSACKed sequence number 'S3' that meets the criteria for
>           detecting loss given in steps (1.a) and (1.b) above
>           (specifically excluding step (1.c)), then one segment of up to
>           SMSS octets starting with S3 SHOULD be returned.
> """
> 
> Thank you!

Rule (3) is a part of RFC 6675 which describes only(!) behaviour of TCP sender during Fast Recovery (this comes from its title  "A Conservative Loss Recovery Algorithm..." and Intro: "The algorithm outlined in this document, heavily based on the algorithm detailed in [FF96], is a conservative replacement of the fast recovery algorithm [Jac90][RFC5681]"). 

During Slow Start and Congestion Avoidance phases the TCP sender can only send new data segments.

NextSeg function in your code is called in all phases (Slow Start, Congestion Avoidance, Fast Recovery), so it should have different behaviour depending on the current phase. That is why in the proposed patch a have added a flag which indicate whether its a  Fast Recovery phase or not.

Regards,
Artem
Comment 16 Mohit 2016-12-03 13:51:25 UTC
Hi Natale, Artem,

I did some more testing on Natale's latest code on tcp-sack tree, and also by applying Artem's patch:

Case 1. SACK: disabled, m_NextSeg_rfc6675: false

- All tests pass (as expected).

Case 2. SACK: enabled, m_NextSeg_rfc6675: false

Following tests fail:
    src/traffic-control/examples/adaptive-red-tests --testNumber=14
    src/traffic-control/examples/adaptive-red-tests --testNumber=15
    src/traffic-control/examples/red-vs-ared --queueDiscType=RED

Case 3. SACK: enabled, m_NextSeg_rfc6675: true

Following tests fail:
    src/traffic-control/examples/adaptive-red-tests --testNumber=14
    src/traffic-control/examples/adaptive-red-tests --testNumber=15
    src/traffic-control/examples/red-vs-ared --queueDiscType=RED

I'm still working on these failed examples, and hope to resolve them shortly.

Although src/test/adaptive-red-queue-disc-test-suite.cc does not fail in any of the cases, I've modified it to be independent of the internet model. 

Please let me know your opinion on the patch uploaded here: https://codereview.appspot.com/311370043/

Thank you very much.
Comment 17 natale.patriciello 2016-12-08 05:24:07 UTC
I like the patch added by Artem; I've added it to my repository. When it would be safe to merge SACK ? I really like to push it, since it would relax a lot my list of bugs.
Comment 18 Tom Henderson 2016-12-08 10:45:19 UTC
(In reply to natale.patriciello from comment #17)
> I like the patch added by Artem; I've added it to my repository. When it
> would be safe to merge SACK ? I really like to push it, since it would relax
> a lot my list of bugs.

If you think it is now ready, let's last call (one week) a final patch proposal and merge next week if no issues.

Can you update and maintain this issue or another issue while we review?

https://codereview.appspot.com/283880043/

Or, can we review it on github?  I just would like a place to enter inline comments.
Comment 19 Tom Henderson 2016-12-08 10:48:01 UTC
.
> 
> Although src/test/adaptive-red-queue-disc-test-suite.cc does not fail in any
> of the cases, I've modified it to be independent of the internet model. 
> 
> Please let me know your opinion on the patch uploaded here:
> https://codereview.appspot.com/311370043/
> 
> Thank you very much.


I think we should merge the above if Stefano agrees.
Comment 20 Mohit 2016-12-08 13:09:06 UTC
Hi Nat,

I have updated the ARED examples which were failing. 

Since these are examples, I think it may not be a good idea to make them independent of TCP. So I have tried relaxing the constraints in these examples such that the dependency on TCP is minimized, and only the minimum functionality of qdisc is verified.

The patch is uploaded here for review: https://codereview.appspot.com/311390043/

Please let me know your views on this patch.

While testing the SACK code with above patch, also apply this patch: https://codereview.appspot.com/311370043/ which makes adaptive-red-queue-disc-test-suite.cc independent of the internet module.

Along with these patches, I tested your latest code on tcp-sack tree, and see that all tests pass successfully in both cases: SACK disabled and SACK enabled. 

P.S.:

Currently, the build fails for tcp-sack tree as Artem's patch modifies the signature of NextSeg method, but the same is not reflected in tcp-tx-buffer-test.cc

Hope it helps.
Comment 21 natale.patriciello 2016-12-08 15:25:43 UTC
(In reply to Tom Henderson from comment #18)
> (In reply to natale.patriciello from comment #17)
> > I like the patch added by Artem; I've added it to my repository. When it
> > would be safe to merge SACK ? I really like to push it, since it would relax
> > a lot my list of bugs.
> 
> If you think it is now ready, let's last call (one week) a final patch
> proposal and merge next week if no issues.
> 
> Can you update and maintain this issue or another issue while we review?
> 
> https://codereview.appspot.com/283880043/
> 
> Or, can we review it on github?  I just would like a place to enter inline
> comments.

Issue updated.

Mohit a last question .. why do you use std::cout instead of NS_LOG statement?
Thank you!
Comment 22 Mohit 2016-12-10 09:25:20 UTC
Tom Sir suggested once that it is better to use std::cout if we always want to print out, because NS_LOG_UNCOND does not display in optimized builds.

Thanks!
Comment 23 natale.patriciello 2017-02-03 08:05:36 UTC
Merged in the series 12619:7dd5042ebb2e - 12651:4fe9e782a0f6. Thank you all for the time spent reviewing.
Comment 24 natale.patriciello 2017-02-14 11:24:36 UTC
(In reply to Artem Krasilov from comment #13)
> Created attachment 2683 [details]
> proposed patch
> 
> I have tested TCP SACK code with an application which rarely sends short
> packets (e.g., telnet application) and found that TCP sender performs
> unnecessary retransmissions. Specifically, in Slow Start phase the
> application write 800 bytes in the socket. In pcap file I see that the TCP
> sender two times sends the same packet of size 800 bytes (i.e., do
> unnecessary retransmission). 
> 
> The problem is that NextSeg function two times returns the same sequence
> number to transmit: (i) it sends pending data, (ii) it performs
> retransmission according to rule 3 from RFC 6675. However, rule 3  from RFC
> 6675 (which describes Loss Recovery Algorithm) is only applicable in Fast
> Recovery phase, but not in Slow Start or CA phase.
> 
> The proposed patch which fix the problem is attached.

Hi Artem,

a question: do you think that the rule 3 applies also in CA_DISORDER and CA_LOSS state? I had some tests in which at the end of the transmission, in CA_DISORDER, some data could be transmitted but since the DISORDER state the rule 3 is not invoked.

thanks
Comment 25 Artem Krasilov 2017-02-22 09:30:12 UTC
> Hi Artem,
> 
> a question: do you think that the rule 3 applies also in CA_DISORDER and
> CA_LOSS state? I had some tests in which at the end of the transmission, in
> CA_DISORDER, some data could be transmitted but since the DISORDER state the
> rule 3 is not invoked.
> 
> thanks

Hi Natale,

My understanding is that NextSeg () function should be invoked only in in the loss recovery phase (i.e., in CA_RECOVERY state after reception of DupThresh duplicate ACKs). In particular, RFC 6675 says: "specifically NextSeg () is inappropriate during loss recovery after an RTO", i.e., inappropriate in CA_LOSS state. 

Regards,
Artem