Bug 818

Summary:     TCP Socket implementation does not set ACK flag on retransmits
Product:     ns-3
Component:   internet
Version:     pre-release
Hardware:    All
OS:          All
Status:      RESOLVED FIXED
Severity:    critical
Priority:    P2
Reporter:    Antti Mäkelä <antti.makela>
Assignee:    Josh Pelkey <jpelkey>
CC:          jpelkey, ns-bugs, tomh
Attachments: hack patch
             Fix by adding new event
             Packet capture which shows retransmit failing for FIN

Description Antti Mäkelä 2010-02-17 06:46:05 UTC
If I have an ESTABLISHED TCP session and the path breaks down, TCP times out and starts retransmitting packets, but the ACK flag is not set on the retransmissions. This is visible in Wireshark. However, what's odd is that the peer seems to handle these broken flags just fine and eventually sends an ACK as well.

Only after the first ACK is received from the peer do the flags return to correct operation.

The broken line seems to be in tcp-socket-impl.cc, void TcpSocketImpl::Retransmit (),

  uint8_t flags = TcpHeader::NONE;

I'd think this should be TcpHeader::ACK, since the SYN_SENT state is handled separately, as is FIN_WAIT_1, and neither uses the variable 'flags'. This also works for the m_closeOnEmpty case, where ACK should be set.
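
To make the suggestion concrete, the one-line change I have in mind would be the following (a sketch only, untested; it sits in TcpSocketImpl::Retransmit () in tcp-socket-impl.cc):

-  uint8_t flags = TcpHeader::NONE;
+  uint8_t flags = TcpHeader::ACK;   // data retransmissions should carry the ACK flag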
Comment 1 Josh Pelkey 2010-02-17 12:21:19 UTC
(In reply to comment #0)
> If I have a path that breaks down, an ESTABLISHED TCP session and TCP times out
> and starts retransmitting packets, ACK flag is not set on retransmissions. This
> is shown on Wireshark. However, what's odd is that the peer seems to be able to
> handle these broken flags just fine and eventually sends an ACK as well.
> 
> Only after first ACK is received from the peer, the flags return to correct
> operation.
> 
> The broken line seems to be in tcp-socket-impl.cc, void
> TcpSocketImpl::Retransmit (),
> 
>   uint8_t flags = TcpHeader::NONE;
> 
> I'd think this should be TcpHeader::ACK, since SYN_SENT state is handled
> separately as is FIN_WAIT_1 and doesn't use the variable 'flags'. This also
> works for the m_closeonEmpty case where ACK should be set.

There was an old bug that was supposed to address a very similar problem, bug 499.  I guess maybe the retrans case wasn't tested. Unfortunately, simply changing the line in Retransmit to TcpHeader::ACK is causing the tcp test suite to fail.  So, we'll have to figure out what's going on there first.
Comment 2 Antti Mäkelä 2010-02-23 03:09:19 UTC
(In reply to comment #1)
> There was an old bug that was supposed to address a very similar problem, bug
> 499.  I guess maybe the retrans case wasn't tested. Unfortunately, simply
> changing the line in Retransmit to TcpHeader::ACK is causing the tcp test suite
> to fail.  So, we'll have to figure out what's going on there first.

  Have to say that if the TCP test suite didn't detect this behavior on retransmits before, then something should be done about the suite as well :)
Comment 3 Josh Pelkey 2010-02-23 15:14:21 UTC
Some more info on this one.  As stated above, after changing TcpHeader::NONE to TcpHeader::ACK in TcpSocketImpl::Retransmit, the tcp test suite fails.  Specifically, the test that fails in src/internet-stack/tcp-test.cc is the last one: AddTestCase (new TcpTestCase (100000, 100, 50, 100, 20)); This isn't surprising because the other two don't cause any retransmissions according to the logs.

Here are the last few lines of logging before the failure:
...
0s MapScheduler:RemoveNext(0x12b87b0)
0s MapScheduler:RemoveNext(): 0x12b87b00x140b57006708
0s DefaultSimulatorImpl:ProcessOneEvent(): handle 0
Command ['/home/jpelkey/repos/ns-3-dev/build/debug/utils/test-runner', '--basedir=/home/jpelkey/repos/ns-3-dev', '--suite=tcp'] exited with code -11

Now a backtrace from gdb should hopefully help us more:
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff733ecf1 in ns3::Callback<void, ns3::Ptr<ns3::Packet>, ns3::Ipv4Address, ns3::Ipv4Address, unsigned short, ns3::empty, ns3::empty, ns3::empty, ns3::empty, ns3::empty>::operator() (this=0x6b2a98, a1=..., a2=..., a3=..., a4=50000) at debug/ns3/callback.h:419
#2  0x00007ffff733e27a in ns3::Ipv4EndPoint::DoForwardUp (this=0x6b2a80, p=..., saddr=..., daddr=..., sport=50000)
    at ../src/internet-stack/ipv4-end-point.cc:119
#3  0x00007ffff733ea5f in Notify (this=0x7c09b0) at debug/ns3/make-event.h:197
#4  0x00007ffff716f460 in ns3::EventImpl::Invoke (this=0x7c09b0) at ../src/simulator/event-impl.cc:37
#5  0x00007ffff718c350 in ns3::DefaultSimulatorImpl::ProcessOneEvent (this=0x6246e0) at ../src/simulator/default-simulator-impl.cc:121
#6  0x00007ffff718c4f2 in ns3::DefaultSimulatorImpl::Run (this=0x6246e0) at ../src/simulator/default-simulator-impl.cc:151
#7  0x00007ffff716ff41 in ns3::Simulator::Run () at ../src/simulator/simulator.cc:173
#8  0x00007ffff730a0ab in ns3::TcpTestCase::DoRun (this=0x61e000) at ../src/internet-stack/tcp-test.cc:146
#9  0x00007ffff705c691 in ns3::TestCase::Run (this=0x61e000) at ../src/core/test.cc:152
#10 0x00007ffff705e8d1 in ns3::TestSuite::DoRun (this=0x7ffff7ddb9e0) at ../src/core/test.cc:684
#11 0x00007ffff705dfd5 in ns3::TestSuite::Run (this=0x7ffff7ddb9e0) at ../src/core/test.cc:459
#12 0x00000000004026ec in main (argc=3, argv=0x7fffffffe4b8) at ../utils/test-runner.cc:263

You can see the progression from ProcessOneEvent (last line in the above log) to the eventual failure in the backtrace.  Unfortunately, it looks like this goes through a callback, and callbacks still scare me :)  So I'm stuck for now, but I'll keep looking.
Comment 4 Antti Mäkelä 2010-02-24 11:16:38 UTC
(In reply to comment #3)
> You can see the progression from ProcessOneEvent (last line in the above log)
> to the eventual failure from the backtrace.  Unfortunately, it looks like this
> goes to a callback, which still scare me :)  So I'm stuck for now, but I'll
> keep looking.

  The callbacks are set just a few lines earlier, at these lines:

  server->SetAcceptCallback (MakeNullCallback<bool, Ptr< Socket >, const Addres$
                             MakeCallback(&TcpTestCase::ServerHandleConnectionC$

  source->SetRecvCallback (MakeCallback(&TcpTestCase::SourceHandleRecv, this));
  source->SetSendCallback (MakeCallback (&TcpTestCase::SourceHandleSend, this));

  So I'd try setting breakpoints in those routines (lines tcp-test.cc:197 and tcp-test.cc:217).

  However, it seems as if one of those callbacks has been set to zero, so eventually the execution attempts to call a function through a null pointer, which is kinda weird.
Comment 5 Josh Pelkey 2010-03-21 16:36:12 UTC
I stared at this some more today.  Based on the logs, it looks like the connection is getting torn down (endpoint is deallocated) but some more packets (retransmissions?) are coming in after it.  When they try to get forwarded up, things go wrong.  I can't believe that simply setting the ACK flag during packet retransmit is causing all this, but that's my best guess so far.  So possibly something is incorrect with the state machine or the connection teardown process.  I also tried only setting the ACK flag when in the ESTABLISHED state, but I get the same result.
Comment 6 Josh Pelkey 2010-03-23 14:15:34 UTC
After looking at this some more today, I think I understand why changing Retransmit to set the ACK flag causes some strange things to happen.  It looks like when the tcp connection is in one of the close states (ex: last ack), it handles packets differently depending on whether they have no flags or the ACK flag.  So, before, when a retransmitted packet came in with no flags, it was handled by the NEW_SEQ_RX action.  However, when the ACK flag is set, it thinks it's the last ack and closes the connection.  So I believe this has just uncovered another bug, which is that either TCP connection termination or retransmission is handled incorrectly.  I don't think that a tcp socket should be getting data retransmissions when in some of the close states (ex: last ack).  I need to look at this more and relearn the tcp termination behavior.  Mostly, I just typed this up for my reference later.
Comment 7 Antti Mäkelä 2010-03-24 06:41:29 UTC
(In reply to comment #6)
> be getting data retransmissions when in some of the close states (ex: last

  Well, the states where you can still receive data are the ones where the peer may be in CLOSE_WAIT (i.e. the peer's app can still send stuff until it calls Close()). So this includes FIN_WAIT_1 and FIN_WAIT_2, but none of the others (CLOSING, TIME_WAIT and LAST_ACK).
Comment 8 Josh Pelkey 2010-03-24 10:56:23 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > be getting data retransmissions when in some of the close states (ex: last
> 
>   Well, states where you can still receive data are ones where the peer is in
> CLOSE_WAIT (e.g. the peer's app can still send stuff until it calls Close()).
> So this includes FIN_WAIT_1 and 2, but none of the others (CLOSING, TIME_WAIT
> and LAST_ACK).

That helps.  Let me ask this also.  Imagine both sockets in ESTABLISHED state and the initiator is done, so it tries to close.  If there is no pending data left to send, then it will just send a FIN and move to FIN_WAIT_1.  What happens if there is still pending data?  Does it, a) just transmit all pending data and then send FIN, or b) transmit all pending data and wait to make sure it is all acked before sending FIN?

The reason I ask is because I think we do (a) now.  For example, on close, all pending data is sent, then FIN is sent, so the server moves to CLOSE_WAIT.  Now when the server is ready to close, it moves to LAST_ACK.  To me, it seems like this is a problem because unacked data from the initiator could still be out there.
Comment 9 Josh Pelkey 2010-03-24 11:11:26 UTC
> That helps.  Let me ask this also.  Imagine both sockets in ESTABLISHED state
> and the initiator is done, so it tries to close.  If there is no pending data
> left to send, then it will just send a FIN and move to FIN_WAIT_1.  What
> happens if there is still pending data?  Does it, a) just transmit all pending
> data and then send FIN, or b) transmit all pending data and wait to make sure
> it is all acked before sending FIN?
> 
> The reason I ask is because I think we do (a) now.  For example, on close, all
> pending data is sent, then FIN is sent, so the server moves to CLOSE_WAIT.  Now
> when the server is ready to close, it moves to LAST_ACK.  To me, it seems like
> this is a problem because unacked data from the initiator could still be out
> there.

I think I just found the answer to my question in the RFC:

Case 1:  Local user initiates the close

    In this case, a FIN segment can be constructed and placed on the
    outgoing segment queue.  No further SENDs from the user will be
    accepted by the TCP, and it enters the FIN-WAIT-1 state.  RECEIVEs
    are allowed in this state.  All segments preceding and including FIN
    will be retransmitted until acknowledged

So I guess (a) is right.  Perhaps the server app is closing before all of the data can get retransmitted, and we don't handle this case correctly.  Kind of feel like I'm running in circles here :/  Will continue to think about this.
Comment 10 Antti Mäkelä 2010-03-24 12:40:27 UTC
(In reply to comment #8)
>The reason I ask is because I think we do (a) now.  For example, on close, all
>pending data is sent, then FIN is sent, so the server moves to CLOSE_WAIT.  Now
>when the server is ready to close, it moves to LAST_ACK.  To me, it seems like
>this is a problem because unacked data from the initiator could still be out
>there.

  Yes, (a) is correct. And no, this shouldn't be a problem - the answer is right there in the RFC: from the sender's perspective, treat the FIN packet as just a flag on the last packet at the end of the send queue.

  Suppose you send five packets, all of which fit in a window, and call close(). Thus the fifth packet carries the FIN. What happens at the peer is:

  If all packets come in, you simply respond with ACKs as appropriate and, on reception of the FIN, move to CLOSE_WAIT.

  If you miss packet #3 and it doesn't arrive within the timeout, you respond with an ACK up to packet #2 and ignore packet #4 and the FIN (and no, I'm not sure what happens if Selective ACK is in use, but we don't have that right now anyway). Then the sender retransmits packets #3 and #4 and the FIN. Proceed as above.

  If packets come out of order, maybe even the FIN first, you proceed as you would in the standard situation and buffer them - if they all arrive within the timeout, all is golden.

>So I guess (a) is right.  Perhaps the server app is closing before all of the
>data can get retransmitted, and we don't handle this case correctly.  Kind of
>feel like I'm running in circles here :/  Will continue to think about this.

Retransmit() seems to handle both the case of a lone, empty FIN getting lost, with

  if (!m_pendingData)
    {
      if (m_state == FIN_WAIT_1)
        { // Must have lost FIN, re-send
          SendEmptyPacket (TcpHeader::FIN);
        }
      return;
    }

and the case of the FIN flag set on the last data segment:

  if (m_closeOnEmpty && remainingData == 0)
    { // Add the FIN flag
      flags = flags | TcpHeader::FIN;
    }

Anyway, I can't for the life of me find where in tcp-socket-impl.cc you actually check for a RECEIVED FIN flag. How do m_pendingClose and ProcessPacketAction (PEER_CLOSE) actually work?
Comment 11 Josh Pelkey 2010-03-24 13:12:26 UTC
> Anyway, I can't for the life of me find where in tcp-socket-impl.cc you
> actually check for RECEIVED FIN flag. How do m_pendingClose and
> ProcessPacketAction (PEER_CLOSE) actually work?

Maybe this doesn't answer your question directly, but this is how I understand the tcp code to work:  

1) TcpSocketImpl::ForwardUp is called when a packet is received.
2) The event is determined from the flags of the packet; this uses the event definitions in tcp-l4-protocol.cc.
3) ProcessEvent is called and, based on the event and the current state (m_state), the state machine is consulted to see which state to transition to and which action to take.
4) ProcessEvent changes the state and returns the action, and ProcessPacketAction is called with that action.

So for the case of checking for a received FIN flag, SimulationSingleton<TcpStateMachine>::Get ()->FlagsEvent (tcpHeader.GetFlags ()) is called to see whether the FIN_RX or FIN_ACK_RX event applies.  Based on this event, ProcessEvent is called and SimulationSingleton<TcpStateMachine>::Get ()->Lookup (m_state,e) is used to find the next state and the action to take (PEER_CLOSE if in ESTABLISHED).  FYI, the state machine definition is in tcp-l4-protocol.cc.
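
For illustration, that state machine is basically a table of entries of the form aT[state][event] = SA (nextState, action), so a FIN received while in ESTABLISHED would hit an entry roughly like this (paraphrased for illustration, not copied verbatim from tcp-l4-protocol.cc):

  aT[ESTABLISHED][FIN_RX] = SA (CLOSE_WAIT, PEER_CLOSE);  // peer sent a FIN: take the PEER_CLOSE action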

Maybe the issue is that even if the FIN packet is received out of order, by the time we've entered ProcessPacketAction (PEER_CLOSE), the state machine has already been updated in ProcessEvent.

As you can tell, I'm still trying to figure all this out myself as well.
Comment 12 Josh Pelkey 2010-03-25 11:12:02 UTC
Created attachment 800 [details]
hack patch

Ok, so here is a patch that enables ACK on retransmit and also passes the tcp test suite.  It is pretty much a hack, and I'm still thinking about what to do.  Basically, when in the LAST_ACK or FIN_WAIT_1 state, I check that an incoming ACK really does correspond to the FIN we sent, by comparing its sequence number to m_nextRxSequence.  If they are equal, then I know this ACK is for the FIN.  Therefore I set a flag to show this, and when we go to process the ACK event, if we are in LAST_ACK or FIN_WAIT_1 and the flag is set, then we can update the state.  If the flag is not set, then it is an ACK for something else, and I don't update the state.
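
The core of the check is along these lines (a simplified sketch rather than the patch itself; the member flag name is just illustrative):

  // On receiving a segment while we are waiting for the ACK of our FIN:
  if (m_state == FIN_WAIT_1 || m_state == LAST_ACK)
    {
      // Treat this ACK as the ACK of our FIN only if its sequence number
      // matches what we expect next from the peer (m_nextRxSequence).
      if (tcpHeader.GetSequenceNumber () == m_nextRxSequence)
        {
          m_finIsAcked = true;  // illustrative flag name, not necessarily the patch's
        }
    }
  // Later, when handling the ACK event in FIN_WAIT_1 or LAST_ACK, advance the
  // state machine only if the flag is set; otherwise leave the state alone.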

So, there still seems to be an issue to me, if I understand the state transitions correctly (and I might not): in the LAST_ACK state, we can receive an ACK for something other than the FIN.  Based on the above posts, it seems like once we are in LAST_ACK, we shouldn't get anything but the ACK of the FIN.  However, this is not the case in the last tcp test case in tcp-test.cc.
Comment 13 Tom Henderson 2010-04-19 10:34:26 UTC
I think the main problem is this line (line 176) in the state machine:

  aT[FIN_WAIT_1][ACK_RX]      = SA (FIN_WAIT_2, NEW_ACK);

which says to transition from FIN_WAIT_1 to FIN_WAIT_2 upon receipt of an ACK.

I agree with Josh that the problem is that the mere act of receiving an ACK (i.e. a segment with the ACK flag turned on) is not sufficient to move to FIN_WAIT_2; the event that triggers the transition should be that my FIN was acked.

So, one way to fix it is to work outside the L4 state machine to detect this (Josh's patch).   However, it may be cleaner to introduce a new event in tcp-typedefs.h Events_t such as "FIN_ACKED" and detect this event in the TcpSocketImpl class, and fix the state machine in TcpL4Protocol so that the transitions from state FIN_WAIT_1 to FIN_WAIT_2 and from CLOSING to TIME_WAIT are driven by this event.  I think that such a patch would be close to what Josh has already posted (although the above line 176 needs to be corrected to stay in FIN_WAIT_1).
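
Concretely, such a change might look roughly like this (illustrative only; the actions shown in the new entries are guesses, not an actual patch):

  // tcp-typedefs.h: add a new entry, e.g. FIN_ACKED, to Events_t
  // tcp-l4-protocol.cc: keep a bare ACK_RX from leaving FIN_WAIT_1, and let the
  // new event drive the FIN_WAIT_1 -> FIN_WAIT_2 and CLOSING -> TIME_WAIT transitions
  aT[FIN_WAIT_1][ACK_RX]    = SA (FIN_WAIT_1, NEW_ACK);   // stay in FIN_WAIT_1 (was FIN_WAIT_2)
  aT[FIN_WAIT_1][FIN_ACKED] = SA (FIN_WAIT_2, NEW_ACK);
  aT[CLOSING][FIN_ACKED]    = SA (TIME_WAIT, NO_ACT);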
Comment 14 Josh Pelkey 2010-04-19 11:42:39 UTC
Created attachment 835 [details]
Fix by adding new event

Here is what I've come up with given Tom's suggestions.  I will still be thinking about this throughout the day though, and I welcome any comments on the patch.  Note that for the test suite that was failing (tcp), the only change that was absolutely needed for the test to pass was the one to the LAST_ACK state.

Some thoughts:  Should we really be getting anything other than an ACK of the FIN in the LAST_ACK state?  If so, should the NO_ACT action be applied, as I am doing now?  Same thought for the CLOSING state.
Comment 15 Tom Henderson 2010-04-20 02:18:24 UTC
(In reply to comment #14)
> Created an attachment (id=835) [details]
> Fix by adding new event
> 
> Here is what I've come up with given Tom's suggestions.  I will still be
> thinking about this throughout the day though, and I welcome and comments on
> the patch.  Note that for the test suite that was failing (tcp), the only
> change that was absolutely needed for the test to pass was the one to the
> LAST_ACK state.

I'm OK with this patch as long as it passes tests for you.

> 
> Some thoughts:  Should we really be getting anything other than an ACK to a FIN
> in the LAST_ACK state?  If so, should NO_ACT action be applied, as I am doing
> now?  Same thought for CLOSING state.

In general, IP networks can drop, reorder, and duplicate packets, so I would be liberal in assuming what could arrive.  In LAST_ACK or CLOSING state, I imagine that retransmissions may still be needed in some data loss cases, but if you read through the peer's FIN, then you shouldn't get new data from the peer.
Comment 16 Josh Pelkey 2010-04-20 12:46:56 UTC
changeset ca79ce283a19

Thanks all!
Comment 17 Antti Mäkelä 2010-04-29 02:55:00 UTC
Sorry, but I guess I have to reopen this :)

I ran some tests, and it seems something is still wrong...

If I send stuff (basically using tcp-large-transfer), and Close() the socket, I send a FIN.

If I get an explicit request to retransmit (three DupAcks), the FIN does NOT get re-transmitted at the end.

Attaching a pcap file. I start transmitting at 30 seconds (well, basically I'm trying to do a simplified version of HTTP: I send a request to a server daemon specifying a number of bytes, and the server sends them back).

At packet 63 I do a StopApplication on the client side, which explicitly calls Close, and the FIN and last ACK finally get sent.
Comment 18 Antti Mäkelä 2010-04-29 02:55:46 UTC
Created attachment 852 [details]
Packet capture which shows retransmit failing for FIN
Comment 19 Antti Mäkelä 2010-04-29 03:15:23 UTC
Soo...basically: if the recipient first sends three dupacks, and then, when the sender starts sending again, the recipient decides that it has received everything after all and sends an ACK - that ACK should in fact be a FIN,ACK, even though the sender's FIN,ACK was NOT retransmitted.

My "client" app has a CloseCallback which just calls socket->Close() when the peer closes (i.e. the server has sent everything). Debugging a bit more, it does not get called at ALL. So I guess that's why the client's FIN is only sent after the app shuts down.

So I guess we need to set needCloseNotify when we ACK the last bit of data, so that the application actually gets a notification...
Comment 20 Josh Pelkey 2010-04-29 10:56:29 UTC
(In reply to comment #19)
> Soo...basically: If the recipient first sends three dupacks, and then, when
> sender starts sending again, the recipient after all decides that he has
> received everything and sends an ACK - it should in fact be a FIN,ACK even if
> senders FIN,ACK was NOT retransmitted.
> 
> My "client" app has a CloseCallback which just says socket->Close() when peer
> closes (server has sent everything). Debugging a bit more, it does not get
> called at ALL. So I guess that's why client's FIN is only sent after app shuts
> down. 
> 
> So I guess we need to set needCloseNotify so that it is set when you ACK for
> the last bit of data and application actually gets a notification..

I don't have much time to look at this just yet, but it seems like this is different than the original bug.  If "TCP socket implementation does not set ACK flag on retransmits" does not describe the issue, please close this one and open a new bug with a better description.  I will have a look at this a bit later.
Comment 21 Antti Mäkelä 2010-04-29 12:52:42 UTC
(In reply to comment #20)
> I don't have much time to look at this just yet, but it seems like this is
> different than the original bug.  If "TCP socket implementation does not set
> ACK flag on retransmits" does not describe the issue, please close this one and
> open a new bug with a better description.  I will have a look at this a bit
> later.

  I was thinking this might have been a regression since I don't remember seeing anything like this before. Anyway, I'll create a new bug in the morning.
Comment 22 Josh Pelkey 2010-04-29 13:28:51 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > I don't have much time to look at this just yet, but it seems like this is
> > different than the original bug.  If "TCP socket implementation does not set
> > ACK flag on retransmits" does not describe the issue, please close this one and
> > open a new bug with a better description.  I will have a look at this a bit
> > later.
> 
>   I was thinking this might have been a regression since I don't remember
> seeing anything like this before. Anyway, I'll create a new bug in the morning.

Here is how I understand what is happening based on your description and the pcap.  Please let me know if I am incorrect.  The initiator calls Close, sends a FIN, and enters FIN-WAIT-1.  The responder sends some ACKs and a few DUPACKS, but does not yet ACK the FIN.  It does, however, enter the CLOSE-WAIT state.  It will sit here until its application is closed, which you do later in the simulation.  The initiator eventually resends the missing packet (which wasn't the FIN originally).  It does not set the FIN flag, as the FIN packet is already buffered.  Finally, when the server closes, it will send its own FIN, and the connection will be torn down after going through the last few states.  In all, this seems normal to me.
Comment 23 Antti Mäkelä 2010-04-29 14:17:22 UTC
(In reply to comment #22)
> Here is how I understand what is happening based on your description and the
> pcap.  Please let me know if I am incorrect.  The initiator calls Close, sends
> a FIN, and enters FIN-WAIT-1.  The responder sends some ACKs and a few DUPACKS,
> but does not yet ACK the FIN. 

  True so far. You can see that the client (responder in your terms) sends back dupacks with the ACK number around 44k (the total transfer is 50k), thinking that packets have been lost on the way. The server (initiator in your terms) starts re-sending those packets, but before it gets to the end, the client sends a new ACK indicating that ALL the data made it through after all.

> It does, however, enter the CLOSE-WAIT state. 

  I don't think it does. Shouldn't it call NotifyClose() anyway? I think NotifyClose() should be called at the same time as the responder sends that ACK indicating that everything has been transferred.  

> It will sit here until its application is closed, which you do later in the
> simulation. 

  Yes, and this happens because NotifyClose() never gets called and the application doesn't know that the peer terminated the session!

  What happens now:

The server (I don't have Wireshark on this computer, so sequence numbers are from memory) sends the following packets:

Data(44k), Data(46k), Data(48k), Data(50k, FIN set)

The client responds with:

Ack(44k), Ack(44k), Ack(44k)

The server notices three dupacks and starts again with:

Data(44k)

However, at this moment the client notices that the subsequent packets, INCLUDING the FIN, made it through after all, and sends:

Ack(50k) - however, NotifyClose() is NOT called at this point! So the application never finds out that the peer has sent a FIN and closed the connection from its end!

At this point the server stops sending any further data and just waits for a FIN from the peer (as it should); however, the peer APPLICATION is not aware and never learns that the server has closed...

If a TCP session does NOT send the dupacks after the server has sent its FIN, NotifyClose() gets called properly right after the FIN has been received and acknowledged, so the application can immediately call socket->Close() and terminate both ends.

I can try taking a look at the code tomorrow myself, but basically: if you receive a FIN packet, NotifyClose() SHOULD be called when that FIN packet gets acknowledged, no matter whether there were retransmissions along the way.

The unfortunate thing for debugging this is that it pretty much requires congestion. Right now I get it to appear by having 30 tcp sessions at once - and the symptoms appear in only one of them. I'm not yet that familiar with gdb - how can I set a breakpoint that only triggers for a specific instance of a class (in this case, just the single TcpSocketImpl where this issue occurs, not the 29 others)?
Comment 24 Antti Mäkelä 2010-04-30 01:39:44 UTC
Created bug #902.