Bugzilla – Bug 902
TCP: handle out-of-order packets during connection shutdown
Last modified: 2010-06-02 11:29:33 UTC
Created attachment 855 [details] TCPdump where retransmits occur and closing does not happen properly

This may be a regression of bug #818. Attaching two packet captures: one where retransmission occurs, one where it does not.

Packets in tcp-fin-retransmit.pcap: the server sends packets #48, #49, #51, #52 and #53. All contain data; #53 has the FIN flag set. The client keeps ACKing data from the server until packet #58, where there are three ACKs with number 44441. The server restarts sending data (packet #61). The client, however, did receive all the packets after all and acknowledges them (packet #62). AT THIS POINT, NotifyClose() should be called so that the client application would find out that the peer has closed the connection. It does not get called, so the connection only closes when the application is stopped.
Created attachment 856 [details] Packet capture where there are no retransmissions, closing happens properly.
Created attachment 857 [details] Pcap from recipient side

Ok, looks like I had been looking at the packet capture from the wrong end. This one is from the recipient side. As you can see, it has one packet less than the other side - the packet with sequence number 44441 in fact DOES get lost on the way. All the other packets make it through. So, when the recipient sends three dupacks for that sequence and the server responds with it, the entire transmission can then be acknowledged. NotifyClose() simply does not get called at this point. Trying to debug some more..
Ok, I think I have found the issue, took some gdbing.. Anyway, weird stuff:

When the FIN is first received, execution goes through ProcessPacketAction's case PEER_CLOSE. The branch if (tcpHeader.GetSequenceNumber () != m_nextRxSequence) gets executed and NewRx gets called to take in the data in the FIN packet. At this point, m_state is ns3::CLOSE_WAIT (!) - it shouldn't really be that yet, as we have missing data! Anyway, in NewRx the "Case 2" gets executed, since there's stuff missing in between. (Debugging this even with a breakpoint condition THIS=something is kinda hard, as it appears that functions are shared between instances...) Anyway, here's what the logs tell me.

FIN is received:

31.1946s 31.1946 [node 33] TcpSocketImpl:ProcessPacketAction(): TcpSocketImpl 0x9c6b4d8 setting pendingClose rxseq 48641 nextRxSeq 44441
31.1946s 31.1946 [node 33] TcpSocketImpl:ProcessPacketAction(): TcpSocketImpl 0x9c6b4d8 setting pendingClose rxseq 48641 nextRxSeq 44441
31.1946s 31.1946 [node 33] TcpSocketImpl:NewRx(0x9c6b4d8, 0x9dc0e38, tcpHeader, 05-06-c0:a8:01:04:20:03)
31.1946s 31.1946 [node 33] TcpSocketImpl:NewRx(): TcpSocketImpl 0x9c6b4d8 NewRx, seq 48641 ack 9 p.size is 1360
31.1946s 31.1946 [node 33] TcpSocketImpl:NewRx(): Case 2, buffering 48641

This part is ok. However, here is what happens when the new packet comes in:

32.0171s 32.0171 [node 33] TcpSocketImpl:NewRx(): Case 1, advanced nrxs to 45841

..the buffer gets emptied (since I have a HandleRead() type callback attached to NotifyDataRecv()):

32.0171s 32.0171 [node 33] TcpSocketImpl:RecvFrom(0x8f1c4d8, 4294967295, 0)
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv()
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv(): TCP 0x8f1c4d8 bufferedData.size() 4 time 32017102913ns
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv(): TCP 0x8f1c4d8 bufferedData.size() 3 time 32017102913ns
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv(): TCP 0x8f1c4d8 bufferedData.size() 2 time 32017102913ns
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv(): TCP 0x8f1c4d8 bufferedData.size() 1 time 32017102913ns
32.0171s 32.0171 [node 33] TcpSocketImpl:RecvFrom(0x8f1c4d8, 4294967295, 0)
32.0171s 32.0171 [node 33] TcpSocketImpl:Recv()

However, it seems that rxseq does NOT get set to 50001, so PeerClose cannot actually complete:

32.0171s 32.0171 [node 33] TcpSocketImpl:NewRx(): TcpSocketImpl 0x8f1c4d8 adv rxseq by 1400
32.0171s 32.0171 [node 33] TcpSocketImpl:ProcessPacketAction(0x8f1c4d8, 12, 0x8f48430, 05-06-c0:a8:01:04:20:03)
32.0171s 32.0171 [node 33] TcpSocketImpl:ProcessPacketAction(): Got Peer Close
32.0171s 32.0171 [node 33] TcpSocketImpl:ProcessPacketAction(): TcpSocketImpl 0x8f1c4d8 setting pendingClose rxseq 44441 nextRxSeq 50001

An ACK for all traffic does get sent in the end:

32.0171s 32.0171 [node 33] TcpSocketImpl:NewRx(0x8f1c4d8, 0x8f48430, tcpHeader, 05-06-c0:a8:01:04:20:03)
32.0171s 32.0171 [node 33] TcpSocketImpl:NewRx(): TcpSocketImpl 0x8f1c4d8 NewRx, seq 44441 ack 9 p.size is 1400
32.0171s 32.0171 [node 33] TcpSocketImpl:NewRx(): TCP 0x8f1c4d8 got seq 44441 expected 50001 flags
32.0171s 32.0171 [node 33] TcpSocketImpl:SendEmptyPacket(0x8f1c4d8, 16)

So, I think the crux here is that what gets passed to ProcessPacketAction (PEER_CLOSE, p, tcpHeader, fromAddress, toAddress); is the ORIGINAL packet (with seq no 44441), when it SHOULD instead get the last contiguous packet out of the buffer (with seq no 50001). After all, RxBufFinishInsert (tcpHeader.GetSequenceNumber ()); checks if everything has been received, but it still sends the original onwards...
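To condense the above, here is a tiny standalone sketch (not the real TcpSocketImpl code - just the numbers from the log excerpt) of why the existing check can never succeed once the FIN has been buffered out of order:

// Standalone illustration only: models the check described above using the
// sequence numbers from the log excerpt, not the actual ns-3 code.
#include <cstdint>
#include <iostream>

int main ()
{
  // FIN arrives out of order: seq 48641 with 1360 bytes of data, so its
  // payload ends at 50001.  The hole starts at 44441 (the lost 1400-byte packet).
  const uint32_t finSeq = 48641;
  const uint32_t finPayload = 1360;
  const uint32_t finEnd = finSeq + finPayload;      // 50001

  // After the retransmitted packet (seq 44441) fills the hole and the
  // reassembly buffer drains, the next expected sequence reaches 50001.
  const uint32_t nextRxSequence = 50001;

  // PEER_CLOSE is re-run with the header of the packet that triggered it,
  // i.e. the retransmission with seq 44441, so the existing check always fails:
  const uint32_t triggeringSeq = 44441;
  std::cout << "existing check passes: "
            << (triggeringSeq == nextRxSequence) << std::endl;   // prints 0

  // Comparing against the remembered end of the FIN would succeed instead:
  std::cout << "check against FIN position passes: "
            << (finEnd == nextRxSequence) << std::endl;          // prints 1
  return 0;
}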
The unfortunate thing is that I don't think *headers* are buffered! A simple modification I thought of was having RxBufFinishInsert return the last contiguous packet - but that is just the payload. Also, just modifying tcpHeader can't be done, it's const (and would probably break other stuff). The only thing that comes to mind right now is that I should actually create a new variable "m_finseqnumber" that gets initialized when the FIN is first received. I'll see if I can come up with a patch...
Created attachment 858 [details] Patch to fix the issue

I think this patch fixes the issue. It adds a new variable m_finSequence that gets initialized when the FIN packet is first received. When PEER_CLOSE is processed again, nextRxSequence is compared against THAT instead of the header of whatever packet happened to arrive. Note that the line numbers in the patch are probably off, as I have added some custom traces for my own purposes.
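For anyone reading along, here is a rough standalone illustration of the idea (this is not the attached diff; in particular, the way m_finSequence is computed here is an assumption made for the sketch):

// Illustrative sketch of the idea behind the patch, not the actual diff.
// Assumption: m_finSequence records the sequence number just past the FIN's
// payload the first time the FIN segment is seen.
#include <cstdint>

class FinTrackerSketch
{
public:
  // Called the first time a FIN segment is received (possibly out of order).
  void OnFinReceived (uint32_t headerSeq, uint32_t payloadSize)
  {
    if (!m_finSeen)
      {
        m_finSeen = true;
        m_finSequence = headerSeq + payloadSize;
      }
  }

  // Called whenever PEER_CLOSE is processed while a close is pending:
  // complete the close once the receive side has caught up with the FIN,
  // regardless of which packet's header triggered this pass.
  bool CloseCanComplete (uint32_t nextRxSequence) const
  {
    return m_finSeen && nextRxSequence >= m_finSequence;
  }

private:
  bool m_finSeen = false;
  uint32_t m_finSequence = 0;
};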
Thank you very much for the detailed report! I think I now see the problem, and I agree with you. I'm curious: if the FIN packet comes out of order and m_pendingClose is set, will the current code ever work? It seems like in NewRx, when ProcessPacketAction (PEER_CLOSE...) gets called, there is really no way that it could be passed the packet with the correct sequence number for PEER_CLOSE to complete, unless somehow the FIN packet itself were retransmitted, which I don't think can happen given that it has already been received successfully. I guess I'm just curious, because I would like to understand how the fix for bug 818 may have caused this. I'm also curious why an RST is sent in the packet capture with no retransmissions, but that is a bit off topic for this bug.
(In reply to comment #5)
> received successfully. I guess I'm just curious, because I would like to
> understand how the fix for bug 818 may have caused this.

Now that I actually understand what happens, I take back the idea that it might have been a regression - it probably never was - I just didn't notice it before, since the missing ACK flags were confusing things anyway.

> I'm also curious why an RST is sent in the packet capture with no
> retransmissions, but that is a bit off topic for this bug.

Oh drat. That might be something else. I guess we have stumbled on yet another bug - this time at the server end. I checked my entire pcap and noticed that in my batch of 30 TCP streams, searching in wireshark with tcp.flags.reset == 1, I get about 3 flows that end with an RST. Something is going wrong in the sense that the client does not expect the last ACK from the server in this scenario. Furthermore, in some cases the client sends yet another ACK in response to the "last ack" (yet this doesn't result in an RST from the server!). In the bulk of cases everything works. Sigh...time to file yet another bug, I guess. However, this is not *so* critical as it does not affect closing - just that an extra packet (RST or extra ACK) gets sent - but it's hardly correct operation.
Ok - take a quick look at the "no retransmissions" pcap again. Note the acknowledgement number and sequence numbers on the last ACK packet. They are 30001, when they should be 30002, since the FIN itself consumes one sequence number. It looks like the sequence number is not increased properly when FIN_WAIT_1 is entered and the first FIN gets sent.
Created attachment 860 [details] Packet capture post-patch.

...and yet another - it looks like I'm now causing an RST even in the case that I supposedly fixed in my patch - attaching yet another capture. It shows that the server is yet again sending the last ACK with a WAY off sequence number (it retreats back to 44441). So I guess a more thorough patch is needed to handle the last ACK on the server end too...
I think the resets are caused by the issue referred to in bug #426, basically in:

if (!m_bufferedData.empty())
  {
    SendRST();
    return 0;
  }

However, if you have received a FIN and all the missing packets, then the buffered incoming data SHOULD be empty by the time you get to the callback...right?
Ok, I have managed to get things a bit better by also saving the FIN sequence number on the server side when the FIN is first created. Now I'm only having an issue with the "ACK for the last ACK". It's as if the sequence number counters are not increased to account for the FIN...
Created attachment 862 [details] Patch which updates server side workings as well.

Work so far. Still running into the "duplicate FIN/ACK + ACK" issue, and occasional RSTs. Attaching one more packet capture which illustrates the remaining issues. Granted, the bug description should at this point be changed to "Bunch of issues with out-of-order packets during connection shutdown"...
Created attachment 863 [details] Packet capture illustrating remaining issues

In this case, packet #62 should either not be sent at all or be sent before the FIN/ACK sequence. Furthermore, it should have its ACK at 50001, i.e. acknowledging all data but not the FIN bit. The RST appears here too. My debug output seems to indicate that the extra ACK is triggered by ReTxTimeout. I haven't been able to stop it from appearing (I tried adding checks for m_state == LAST_ACK to all the sendpacket functions and a few other things)... However, see the next two packet captures, which are from the SAME simulation!
Created attachment 864 [details] Packet capture where RST occurs but "bonus" ack is ok

Here, the "bonus" ACK is sent before the FIN/ACK. RST still occurs.
Created attachment 865 [details] Finally, last capture where everything is working ok, and again in the same simulation!
(In reply to comment #14)
> Created an attachment (id=865) [details]
> Finally, last capture where everything is working ok, and again in the same
> simulation!

I was out for a bit. I plan to start looking at this again very soon. Thank you for the detailed reports again.
(In reply to comment #11)
> Created an attachment (id=862) [details]
> Patch which updates server side workings as well.
>
> Work so far. Still running into the "duplicate FIN/ACK + ACK" issue, and
> occasional RSTs.
>
> Attaching one more packet capture which illustrates the remaining issues.
>
> Granted, the bug description should at this point be changed to "Bunch of
> issues with out-of-order packets during connection shutdown"...

I am mostly OK with this patch. Two quick questions:

1) Why do you change m_nextTxSequence in ProcessPacketAction during ACK_TX given a FIN packet? I am concerned about modifying this member variable directly (i.e., not just incrementing it by some number of bytes).

2) In ProcessPacketAction, PEER_CLOSE, you have added an extra conditional to check the sequence number of the FIN packet against m_nextRxSequence. Rather than adding another conditional, could you not just replace the one that is already there, "if (tcpHeader.GetSequenceNumber () != m_nextRxSequence)", with the one you have created?
(In reply to comment #16)
> 1) Why do you change m_nextTxSequence in ProcessPacketAction during ACK_TX
> given a FIN packet? I am concerned about modifying this member variable
> directly (i.e., not just incrementing it by some number of bytes).

The same reason as with PEER_CLOSE, but this time at the other end - saving the sequence number of the FIN works around problems with retransmissions. See http://www.nsnam.org/bugzilla/attachment.cgi?id=860 for how things go wrong without this..

> 2) In ProcessPacketAction, PEER_CLOSE, you have added an extra conditional to
> check the sequence number of the FIN packet against m_nextRxSequence. Rather
> than adding another conditional, could you not just replace the one that is
> already there, "if (tcpHeader.GetSequenceNumber () != m_nextRxSequence)", with
> the one you have created?

The extra conditional basically checks whether m_finSequence has been initialized...and I *think* you are right about this.
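For illustration, the merged conditional being discussed might end up shaped roughly like this (a hypothetical sketch, not the actual patch; in particular, how an uninitialized m_finSequence is represented is an assumption):

// Hypothetical shape of the merged PEER_CLOSE guard discussed above.
// Not the committed patch; the sentinel used for "m_finSequence not yet
// initialized" is an assumption made for this sketch.
#include <cstdint>

static const uint32_t kFinSequenceUnset = 0;   // assumed sentinel value

bool KeepClosePending (uint32_t finSequence, uint32_t nextRxSequence)
{
  // Stay on the pending-close path until the FIN position is known and the
  // receive side has caught up with it.
  return finSequence == kFinSequenceUnset || nextRxSequence < finSequence;
}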
Created attachment 872 [details] Patch with coding style update

Ok, after finally understanding the last issue here, I am happy with the patch from Antti. I've just uploaded the same patch with some minor coding style changes. I've asked Antti if he has any more comments/thoughts about this one before I push.
(In reply to comment #18)
> Created an attachment (id=872) [details]
> Patch with coding style update
>
> Ok, after finally understanding the last issue here, I am happy with the patch
> from Antti. I've just uploaded the same patch with some minor coding style
> changes. I've asked Antti if he has any more comments/thoughts about this one
> before I push.

No further comments. The odd RSTs also happen with NSC so I don't think that's an issue.
Actually, in tcp-socket-impl.h, the comment on line 205 should probably read "//sequence number where fin was sent or received", since we are now tracking both ends :)
Created attachment 873 [details] Patch with update
change title from "TCP socket does not properly notify application if there's a retransmit during FIN_WAIT"
(In reply to comment #21)
> Created an attachment (id=873) [details]
> Patch with update

Looks good but would be nice to see a testcase to avoid regressions.

See what Craig did for the src/test/ns3tcp/ns3tcp-interop-test-suite with the pcap trace vectors, as an example of how you can hook a trace source and save just those headers you care about, and compare result vs. expected. Even pcap may be overkill, if you just want to store a vector of TCP sequence numbers received, for example.
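For concreteness, the sort of hook I have in mind might look roughly like this (an untested sketch; the trace path, device type, and header handling are assumptions that would need adapting to the actual test):

// Untested sketch of the "store a vector of TCP sequence numbers" idea.
// Assumptions: a point-to-point device at /NodeList/0/DeviceList/0 and
// roughly the current ns-3 tracing API; see ns3tcp-interop-test-suite.cc
// for the real conventions.
#include <cstdint>
#include <vector>
#include "ns3/callback.h"
#include "ns3/config.h"
#include "ns3/ipv4-header.h"
#include "ns3/packet.h"
#include "ns3/ppp-header.h"
#include "ns3/ptr.h"
#include "ns3/tcp-header.h"

using namespace ns3;

static std::vector<uint32_t> g_txSequences;

static void
TcpTxSniffer (Ptr<const Packet> packet)
{
  Ptr<Packet> copy = packet->Copy ();
  PppHeader ppp;
  copy->RemoveHeader (ppp);
  if (ppp.GetProtocol () != 0x0021)   // 0x0021 = IPv4 in PPP framing
    {
      return;
    }
  Ipv4Header ipv4;
  copy->RemoveHeader (ipv4);
  if (ipv4.GetProtocol () != 6)       // 6 = TCP
    {
      return;
    }
  TcpHeader tcp;
  copy->RemoveHeader (tcp);
  g_txSequences.push_back (tcp.GetSequenceNumber ().GetValue ());
}

// In the test's DoRun (), after the devices are created:
//   Config::ConnectWithoutContext (
//     "/NodeList/0/DeviceList/0/$ns3::PointToPointNetDevice/MacTx",
//     MakeCallback (&TcpTxSniffer));
// and after Simulator::Run (), g_txSequences can be compared entry by entry
// against a stored reference vector with NS_TEST_ASSERT_MSG_EQ.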
(In reply to comment #23)
> (In reply to comment #21)
> > Created an attachment (id=873) [details] [details]
> > Patch with update
>
> Looks good but would be nice to see a testcase to avoid regressions.
>
> See what Craig did for the src/test/ns3tcp/ns3tcp-interop-test-suite with the
> pcap trace vectors, as an example of how you can hook a trace source and save
> just those headers you care about, and compare result vs. expected.

The problem is that I ran into this more or less by accident in a bigger simulation, and I'm not really sure I can easily convert it into a simple test case. Theoretically the situation should be repeatable (the sink/client sending dupacks right after the FIN has been sent by the server, causing an identical situation) by setting up a ton of tcp-large-transfer-esque applications so that the TCP streams collide due to congestion, and hoping that at least one of the streams then hits this issue. I only noticed it when I was running 10 streams side by side and only one of them had this happen...but there was also lots of other stuff going on, so I'm not sure if just setting up two nodes will cut it.
I think there is a simple test case that could capture the behavior, at least from the sender perspective. Right now, in the second test case in src/test/ns3tcp/ns3tcp-loss-test-suite.cc, the data packet immediately before the FIN is dropped. This should cause the behavior seen (i.e., the FIN gets buffered, but when the previous packet comes in it won't PEER_CLOSE correctly). I haven't looked at it in detail, but I should be able to turn this into a test case as Tom has suggested, by capturing the TCP headers during this run and making a test vector out of them.
While starting on a test case and going through the pcaps, I've found another issue after applying this patch. For example, say we are in a state where first FIN has been sent already but a previous data packet has been lost. When the data packet gets retransmitted and received, NewRx will eventually get called. NewRx will eventually call ProcessPacketAction(PEER_CLOSE...). The problem is both NewRx and the call to ProcessPacketAction(PEER_CLOSE...) will eventually make a call to SendEmptyPacket(TcpHeader::ACK). So after the re-transmitted data packet is received, the receiver will send 2 ACKs, one good one and one that will show up as a dup ack. I haven't thought of a good way to stop this and ensure that it still works for all other scenarios. For instance, I don't think we can just return after the call to ProcessPacketAction(PEER_CLOSE...) in NewRx. Then again, maybe we can. I need to think about it more. Open to thoughts/suggestions.
(In reply to comment #26)
> While starting on a test case and going through the pcaps, I've found another
> issue after applying this patch. For example, say we are in a state where
> first FIN has been sent already but a previous data packet has been lost. When
> the data packet gets retransmitted and received, NewRx will eventually get
> called. NewRx will eventually call ProcessPacketAction(PEER_CLOSE...). The

Ok, this explains the bonus acks I myself observed in there (see comment #12). I wasn't sure what caused it.
(In reply to comment #27)
> (In reply to comment #26)
> > While starting on a test case and going through the pcaps, I've found another
> > issue after applying this patch. For example, say we are in a state where
> > first FIN has been sent already but a previous data packet has been lost. When
> > the data packet gets retransmitted and received, NewRx will eventually get
> > called. NewRx will eventually call ProcessPacketAction(PEER_CLOSE...). The
>
> Ok, this explains the bonus acks I myself observed in there (see comment
> #12). I wasn't sure what caused it.

FYI, my understanding of the problem wasn't exactly right. It looks like the issue had to do with this conditional in PEER_CLOSE:

if (p->GetSize () != 0)
  {
    NewRx (p, tcpHeader, fromAddress, toAddress);
  }

Specifically, this was a problem when the FIN has been received out of sequence and the retransmitted data comes in, which will call NewRx and then PEER_CLOSE again. It goes through PEER_CLOSE and will call NewRx again because of the above conditional. We don't want to call NewRx again in this case. It looks like we should be able to stop this if we just check to see if m_finSequence has been initialized:

if (p->GetSize () != 0)
  {
    if (m_finSequence != m_nextRxSequence)
      {
        NewRx (p, tcpHeader, fromAddress, toAddress);
      }
  }

I will add this to the patch, along with the test case that I hope to finish up tomorrow.
Created attachment 877 [details] Patch with dup ack fix

Here is the final (I hope) patch. Test case coming up.
Created attachment 878 [details] Test case

I just modified the two test cases in src/test/ns3tcp/ns3tcp-loss-test-suite.cc to use the saved-header method, following the example from ns3tcp-interop-test-suite.cc. The second test case (the one which drops the data packet, causing the FIN to arrive out of order) will fail without the above patch applied. This also assumes that you have run the test cases at least once with WRITE_VECTORS set to true, which I will of course do before committing.
changeset f03f4d82daaf and b08fe828b686 (test case)