Bugzilla – Bug 131
TCP implementation needs finite send buffer and callback handling
Last modified: 2008-06-09 11:49:01 UTC
In my simulation, I connect a TCP socket and then try to send data as fast as TCP allows. I register a "send callback", and in the callback I send more data. This causes TCP to call me again, recursively:

#3321 0x00000000004135f0 in BenchNode::TcpClientSendCallback (this=0x62dfd0, socket=@0x7fffea70c140, dataSent=576) at ../utils/bench-olsr.cc:308
#3322 0x00000000004111ca in ns3::MemPtrCallbackImpl<BenchNode*, void (BenchNode::*)(ns3::Ptr<ns3::Socket>, unsigned int), void, ns3::Ptr<ns3::Socket>, unsigned int, ns3::empty, ns3::empty, ns3::empty, ns3::empty>::operator() (this=0x6344b0, a1=@0x7fffea70c190, a2=576) at debug/ns3/callback.h:205
#3323 0x00002b26c08d9874 in ns3::Callback<void, ns3::Ptr<ns3::Socket>, unsigned int, ns3::empty, ns3::empty, ns3::empty, ns3::empty>::operator() (this=0x6ba4a0, a1=@0x7fffea70c1e0, a2=576) at debug/ns3/callback.h:314
#3324 0x00002b26c08d81e1 in ns3::Socket::NotifyDataSent (this=0x6ba450, size=576) at ../src/node/socket.cc:198
#3325 0x00002b26c0918bef in ns3::TcpSocket::SendPendingData (this=0x6ba450, withAck=false) at ../src/internet-node/tcp-socket.cc:740
#3326 0x00002b26c091d1fe in ns3::TcpSocket::ProcessAction (this=0x6ba450, a=ns3::TX_DATA) at ../src/internet-node/tcp-socket.cc:525
#3327 0x00002b26c091f38a in ns3::TcpSocket::Send (this=0x6ba450, p=@0x7fffea70c4b0) at ../src/internet-node/tcp-socket.cc:273
#3328 0x00000000004135f0 in BenchNode::TcpClientSendCallback (this=0x62dfd0, socket=@0x7fffea70c510, dataSent=576) at ../utils/bench-olsr.cc:308

Some questions:
1. Is the "send callback" working correctly, i.e. are flow and congestion control already implemented? I would expect TCP to eventually stop calling me to send more data, at least for some time...
2. Shouldn't there be a Schedule somewhere to break this loop and avoid recursion? Even if the recursion were not "infinite", it is not good practice to abuse the call stack when it is easy to avoid via an event schedule.
If you'd like to send TCP data "as fast as possible", you just need to write all of the data into the socket, all at once. Since flow and congestion control are implemented, the socket will buffer the data you write to it and will send it appropriately according to its flow/congestion algorithms. See examples/tcp-large-transfer.cc.

It seems to me that if you want to send data one chunk at a time with your approach, it should be up to your code to check whether you are out of data and, if so, not call Send. Doesn't this break the loop of which you speak?

Quote: I would expect TCP to eventually stop calling me to send more data, for some time...

To clarify the implementation a little: the registered callback gets called whenever data is actually sent down the stack. The semantics are more like a notification of sent data, not necessarily a request for more data.
(In reply to comment #1)
> If you'd like to send TCP data "as fast as possible" you just need write all of
> the data into the socket, all at once. Since flow and congestion control are
> implemented, the socket will buffer the data you write to the socket, and will
> send it appropriately according to its flow/congestion algorithms. See
> examples/tcp-large-transfer.cc
>
> It seems to me that that if you wanted to send data one chunk at a time with
> your approach, that it should be up to your code to check to see if you are out
> of data, and if so, don't call Send. Doesn't this break the loop of which you
> speak?
>
> Quote: I would expect TCP to eventually stop calling me
> to send more data, for some time...
>
> To clarify some about the implementation, the registered callback gets called
> whenever data is actually sent down the stack...the semantics are more like a
> notification of sent data, not necessarily a request for more data.

*sigh* I _really_ think we are already deviating too much from the BSD sockets interface. Because we are "simplifying", we end up with completely different socket semantics, and the new semantics are not as well tested in the real world as the old ones.

To clarify, what I want to do is to send as much data as possible. There is no limit to the data to send; I just want to saturate the network with TCP traffic. In plain old C sockets I would do something like:

int s = socket(...);  /* create and connect the TCP socket */
struct pollfd fds[1] = {{s, POLLOUT, 0}};
while (1)
  {
    poll(fds, 1, -1);
    if (fds[0].revents & POLLOUT) {
      send(s, buf, SIZE, 0);
    }
  }

This code will send as much traffic as possible, with implicit flow control. The poll() is used just to illustrate that this could as easily become asynchronous and callback-based, just like NS-3, with little additional complexity.
From what you say, to do this in NS-3 I'd have to keep track of the number of bytes "in transit" and send more data only when the number of bytes in transit is below a certain threshold. Conclusion:
1. I have to keep track of in-transit bytes;
2. I have to _know_ which threshold is adequate before deciding to send more data.
Clearly the NS-3 solution is an order of magnitude more complicated than real life!
> *sigh* I _really_ think we are deviating too much already from the BSD sockets
> interface. Because we are "simplifying" we end up with completely different
> socket semantics, and the new semantics are not as well tested in real world as
> the old ones.

The semantics are different because our API is an _asynchronous_ API, that is all. All the complexity you refer to comes from this. What you seem to be confused by is that you want a _synchronous_ API and, yes, this is desirable but it is not there yet.

> To clarify, what I want to do is to send as much data as possible. There is no
> limit to the data to send, I just want to saturate the network with TCP
> traffic. In plain old C sockets I would do something like:
>
> int s = socket(...) # create and connec the TCP socket
> struct pollfd fds[1] = {{s, POLLOUT, 0}};
> while (1)
> {
>   poll(fds, 1, -1);
>   if (fds.revents & POLLOUT) {
>     send(s, buf, SIZE, 0);
>   }
> }
>
> This code will send as much traffic as possible, with implicit flow control.
> The poll() is used just to illustrate that this could as easily become
> asynchronous, callback-based just like NS-3, with little additional complexity.

Sure; however, to implement this you would need a thread library to implement the blocking poll call, and so far this has not been done.

> From what you say, to do this in NS-3 I'd have to keep track of the number of
> bytes "in transit", and send more data only when the number of bytes in transit
> is below a certain threshold. Conclusion:
> 1. I have to keep track of in transit bytes;
> 2. I have to _know_ which threshold is adequate before deciding to send more
> data;

No, what is missing is a way for a user of a Socket object to be notified when there is room available in the tx buffer. More about this below.
(In reply to comment #1)
> It seems to me that that if you wanted to send data one chunk at a time with
> your approach, that it should be up to your code to check to see if you are out
> of data, and if so, don't call Send. Doesn't this break the loop of which you
> speak?
>
> Quote: I would expect TCP to eventually stop calling me
> to send more data, for some time...
>
> To clarify some about the implementation, the registered callback gets called
> whenever data is actually sent down the stack...the semantics are more like a
> notification of sent data, not necessarily a request for more data.

The semantics _I_ expect are that the sent callback should be invoked when the associated data has been _completely_ sent and, for TCP, acked, i.e. when it has been removed from the tx buffer. At this point, I believe that it is perfectly legitimate to attempt to send more data from this callback and to expect the system not to crash on you; i.e. what Gustavo is trying to do seems perfectly legitimate.
(In reply to comment #3)
> > *sigh* I _really_ think we are deviating too much already from the BSD sockets
> > interface. Because we are "simplifying" we end up with completely different
> > socket semantics, and the new semantics are not as well tested in real world as
> > the old ones.
>
> The semantics are different because our API is an _asynchronous_ API, that is
> all. All the complexity you refer to comes from this. What you seem to be
> confused by is that you want a _synchronous_ API and yes, this is desirable but
> it is not there yet.

Not at all; the difference is not asynchronous vs. synchronous. The complexity I am talking about is having to keep track of a counter of "bytes asked to be sent but not yet fully sent", and knowing a threshold, in order to be able to send as much data as TCP can handle without collapsing NS-3.

In asynchronous callback-based libraries, like GLib, you are notified when the socket buffer accepts more data, so it is dead easy to send as much data as TCP can handle.

> > To clarify, what I want to do is to send as much data as possible. There is no
> > limit to the data to send, I just want to saturate the network with TCP
> > traffic. In plain old C sockets I would do something like:
> >
> > int s = socket(...) # create and connec the TCP socket
> > struct pollfd fds[1] = {{s, POLLOUT, 0}};
> > while (1)
> > {
> >   poll(fds, 1, -1);
> >   if (fds.revents & POLLOUT) {
> >     send(s, buf, SIZE, 0);
> >   }
> > }
> >
> > This code will send as much traffic as possible, with implicit flow control.
> > The poll() is used just to illustrate that this could as easily become
> > asynchronous, callback-based just like NS-3, with little additional complexity.
>
> Sure, however, to implement this, you would need a thread library to implement
> the blocking poll call and, so far, this has not been done.
> > From what you say, to do this in NS-3 I'd have to keep track of the number of
> > bytes "in transit", and send more data only when the number of bytes in transit
> > is below a certain threshold. Conclusion:
> > 1. I have to keep track of in transit bytes;
> > 2. I have to _know_ which threshold is adequate before deciding to send more
> > data;
>
> No, what is missing is a way for a user of a Socket object to be notified when
> there is room available in the tx buffer. More about this below.
>
> (In reply to comment #1)
> > It seems to me that that if you wanted to send data one chunk at a time with
> > your approach, that it should be up to your code to check to see if you are out
> > of data, and if so, don't call Send. Doesn't this break the loop of which you
> > speak?
> >
> > Quote: I would expect TCP to eventually stop calling me
> > to send more data, for some time...
> >
> > To clarify some about the implementation, the registered callback gets called
> > whenever data is actually sent down the stack...the semantics are more like a
> > notification of sent data, not necessarily a request for more data.
>
> The semantics _I_ expect are that it (the sent callback) should be invoked when
> the associated data has been _completely_ sent and acked for TCP. i.e., when it
> has been removed from the tx buffer.

What makes you so _sure_ this is what is best? BSD sockets have been around for decades; they do not work as you describe, and they seem to have worked fine all this time. I honestly don't see any reason why it is simpler to notify that data has been sent than to notify that there is room in the send buffer for more data. In fact, what applications usually want to know is that there is room in the buffer, not that the last data was acknowledged.
If you wait for acks, especially on large-delay networks, the application then takes some time to produce more data; meanwhile the TCP socket has no data to send and just sits there idle, and the network ends up underutilized.

> At this point, I believe that it is
> perfectly legitimate to attempt to send more data from this callback and to
> expect the system not to crash on you. i.e., what gustavo is trying to do seems
> perfectly legitimate.

It actually works for _my_ case, but NS-3 sockets are not realistic, and one year from now a new issue will come up because we chose to deviate from the standard interface for no apparent benefit.

I am fine with keeping two different callbacks, though.
(In reply to comment #4)
> > The semantics are different because our API is an _asynchronous_ API, that is
> > all. All the complexity you refer to comes from this. What you seem to be
> > confused by is that you want a _synchronous_ API and yes, this is desirable but
> > it is not there yet.
>
> Not at all, the difference is not in asynchronous vs synchronous. The
> complexity I talk about refers to having to keep track of a counter "bytes
> asked to send but not yet fully sent", and knowing a threshold, in order to be
> able to send as much data as TCP can handle without collapsing NS-3.

Keeping track of this counter should not be needed. That was my point.

> In asynchronous callback based libraries, like GLib, you are notified when the
> socket buffer accepts more data, and so it's dead easy to send as much data as
> TCP can handle.

Yes. I agree that this is what you should be able to do with the ns-3 sockets, and I agree that it is a bug that you can't do this now.

> > > To clarify, what I want to do is to send as much data as possible. There is no
> > > limit to the data to send, I just want to saturate the network with TCP
> > > traffic. In plain old C sockets I would do something like:
> > >
> > > int s = socket(...) # create and connec the TCP socket
> > > struct pollfd fds[1] = {{s, POLLOUT, 0}};
> > > while (1)
> > > {
> > >   poll(fds, 1, -1);
> > >   if (fds.revents & POLLOUT) {
> > >     send(s, buf, SIZE, 0);
> > >   }
> > > }
> > >
> > > This code will send as much traffic as possible, with implicit flow control.
> > > The poll() is used just to illustrate that this could as easily become
> > > asynchronous, callback-based just like NS-3, with little additional complexity.
> >
> > Sure, however, to implement this, you would need a thread library to implement
> > the blocking poll call and, so far, this has not been done.
> > > From what you say, to do this in NS-3 I'd have to keep track of the number of
> > > bytes "in transit", and send more data only when the number of bytes in transit
> > > is below a certain threshold. Conclusion:
> > > 1. I have to keep track of in transit bytes;
> > > 2. I have to _know_ which threshold is adequate before deciding to send more
> > > data;
> >
> > No, what is missing is a way for a user of a Socket object to be notified when
> > there is room available in the tx buffer. More about this below.
> >
> > (In reply to comment #1)
> > > It seems to me that that if you wanted to send data one chunk at a time with
> > > your approach, that it should be up to your code to check to see if you are out
> > > of data, and if so, don't call Send. Doesn't this break the loop of which you
> > > speak?
> > >
> > > Quote: I would expect TCP to eventually stop calling me
> > > to send more data, for some time...
> > >
> > > To clarify some about the implementation, the registered callback gets called
> > > whenever data is actually sent down the stack...the semantics are more like a
> > > notification of sent data, not necessarily a request for more data.
> >
> > The semantics _I_ expect are that it (the sent callback) should be invoked when
> > the associated data has been _completely_ sent and acked for TCP. i.e., when it
> > has been removed from the tx buffer.
>
> What makes you so _sure_ this is what is best. BSD sockets have been around
> for decades; they do not work as you describe, and they seem to have worked
> fine all this time. I honestly don't see any reason why it is simpler to
> notify that data has been sent than to notify that there is room in the send
> buffer for more data.

I actually proposed that you are notified when data is removed from the tx buffer after it was acked, that is, when there is room in the tx buffer. So it seems that we agree.
> In fact, what applications usually want to know is that there is room in the
> buffer, not that the last data was acknowledged. If you wait for ack's,
> especially with large delay networks, the application then has to take some
> time to produce more data, and meanwhile the TCP socket will have no data to
> send and will just sit there idle, and network will end up being underutilized.

Yes; again, we agree.

> > At this point, I believe that it is
> > perfectly legitimate to attempt to send more data from this callback and to
> > expect the system not to crash on you. i.e., what gustavo is trying to do seems
> > perfectly legitimate.
>
> It actually works for _my_ case, but NS-3 sockets are not reallistic and one
> year from now a new issue will come up because we chose to deviate from the
> standard interface for no apparent benefit.

I do not think this is true, that is, that we are deviating in the interface. It seems that there is a bug in the implementation, though, and yes, we should fix it.

> I am fine with keeping two different callbacks, though.
(In reply to comment #5)
[...]
> I actually proposed that you are notified when data is removed from the tx
> buffer after it was acked, that is, when there is room in the tx buffer. So, it
> seems that we agree.

We _almost_ agree. The events "data is removed from the tx buffer" and "there is room in the tx buffer" are not completely coincident; there _is_ a difference. In real BSD sockets there is a buffer of a certain maximum size between the application and the socket. Applications can write a certain amount of traffic into the socket buffer even before any previous traffic is acknowledged and removed from the tx buffer. When a socket is connected, it enters the "there is room in the tx buffer" state immediately, even before any "data is removed from the tx buffer" event. It's a subtle difference, sure, but I can't be sure it is not significant, and it does not sound like confusing/merging the two states will simplify anything for the programmer (well, maybe it will not make life easier for Raj, but that's a different story :).
Created attachment 108 [details] Quick fix to break the recursive loop This doesn't address the deeper issues of exactly when to call the callbacks, but it does break the recursive loop (I think).
Created attachment 109 [details] patch that subsumes Raj's patch and reassigns SendCallback
Reassigns the previous sendCallback to the new DataSent callback. Now sendCallback is never invoked (because there is no finite socket buffer modeled). The DataSent callback behaves as it did before, except for Raj's previous patch to break possible infinite recursion.
changeset 2340 fixed the basic recursion problem; there is still a need for implementing a finite send buffer (will rename the bug)
Created attachment 112 [details] revised patch based on discussion This patch attempts to capture the semantics discussed on the telecon; namely, that SendCallback is always called when new space is made available in the buffer, and that it reports the available buffer space (see the Doxygen). For now, SendCallback is only implemented for Tcp (see bug 141 for Udp issues)
(In reply to comment #10)
> [...] it reports the available buffer space (see the Doxygen).

I should point out that real sockets do not report this information. Not that it hurts to have extra info, but no application was ever written to take this value into consideration...

I also think the documentation for the DataSend callback is too detailed; it should simply state something like "this callback is called when there is room in the socket transmit buffer to accept more data for transmission". The reasons why the room is made available should not be documented; otherwise they become part of the API, and changing the buffer management algorithm will essentially break the API, which should be avoided where possible.
*** Bug 170 has been marked as a duplicate of this bug. ***
The new socket finite buffers are in: http://code.nsnam.org/raj/ns-3-dev-socket-helper/ Both send and receive buffers are finite, with callback semantics as described in the doxygen for ns3::Socket.
raj/ns-3-dev-socket-helper has been merged into ns-3-dev.