|
Bugzilla – Full Text Bug Listing |
| Summary: | Newer version of tcp-2way that exhibits an 'excess bytes' problem | ||
|---|---|---|---|
| Product: | ns-3 | Reporter: | evensky |
| Component: | internet | Assignee: | ns-bugs <ns-bugs> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | raj.b |
| Priority: | P1 | ||
| Version: | pre-release | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Attachments: |
test case that shows the bug.
output with the 'bad timings' showing weird number of bytes. output with the 'good/regular timings' that runs correctly. corrected tcp-2way-2.cc even more correct version of tcp-2way-2.cc Add some asserts that show the logical inconsitency fixes the issue |
||
|
Description
evensky
2008-07-07 20:09:50 UTC
Created attachment 194 [details]
test case that shows the bug.
Run as:
./tcp-2way-2 --useDummyBytes=0 --recvSize=600 --preSchedule=1 --useBadTimings=0
to have correct results. It has output like:
...
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|381>
381:I{381} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|381|0>
Finished Packet. from handleread: cmpl_recv(0.328662,0,0,10000,1) Read 10000 of <I>
simulator::cmpl_recv(0.328662,0,0,10000,1)
....
where all 10000 bytes were read, and there is no left over bytes (<0|) in the packet, so processing on the packet ends.
In the bad case:
./tcp-2way-2 --useDummyBytes=0 --preSchedule=1 --recvSize=600 --useBadTimings=1
...
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|338>
381:I{381} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <43|338|0>
Finished Packet. from handleread: cmpl_recv(0.0221596,0,0,10000,1) Read 10000 of <I>
simulator::cmpl_recv(0.0221596,0,0,10000,1)
43:I{43} bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] Removing unexpected header.
....
Notice that there appears to be 43bytes left over, and in the rle of the PeekData in the packet there are 43 bytes of payload 'I'. What is confusing is that the program send 10000 bytes of 'I' as payload but there appears to be 10043 bytes of 'I' as payload.
Also note with the 'bad' timing we are seeing both directions being interleaved here.
Thanks,
\dae
A few more comments: It looks like differences arise on the unfortunate payload ('I' filled) at the start. For some reason the initial packet is larger than the others. There is nothing wrong with that, the recvSize is 600, but it is odd:
The without the useBadTimings we have at the start of the payload:
Before Marking|Payload (size=10000)
After Header Marking|ns3::ArchHeader ((ArchHeader=I:10000:0:0:0)) Payload (size=10000)
my_sim->cmpl_send(0.32,0,0,10000,0)
536:.'.{26}I{508} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|507(29)|9493>
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|8957>
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|8421>
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|7885>
....
showing the send of 10000 bytes and the first few chunks to be read in. Note that in this case we get 536 bytes of which 507 are payload. (the rle has an 'I' as part of the header which is why there are 508 shown in the rle dump).
If we turn on the useBadTimings, we get:
Before Marking|Payload (size=10000)
After Header Marking|ns3::ArchHeader ((ArchHeader=I:10000:0:0:0)) Payload (size=10000)
my_sim->cmpl_send(0.00296172,0,0,10000,0)
493:H{493} *bytes from 10.1.0.2 [02-06-0a:01:00:02:20:00] <0|493|0>
Finished Packet. from handleread: cmpl_recv(0.00296172,0,0,1000,0) Read 1000 of <H>
simulator::cmpl_recv(0.00296172,0,0,1000,0)
579:.'.{26}I{551} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|550(29)|9450>
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|8914>
536:I{536} *bytes from 10.1.0.1 [02-06-0a:01:00:01:01:c0] <0|536|8378>
note here we have the last bit of the 'H' filled payload interleaved between the send of the 1000 'I' filled bytes, and the recv of those bytes. Also, we have 43 more bytes in that first chunk, again still less that the 600 we allowed on the command line, but odd that it is different. Also odd that 43 is the number of bytes we seem to have in excess.
one last note tonight. I have if def'ed out the use of the SendCallback during my attempts to debug. However, the results don't seem to have when I define USE_HANDLESEND to put that code back in. ccing raj I believe I'm not even getting to the erroneous point, but getting crasher
before even getting to this point (using today's ns-3-dev):
./tcp-2way-2 --useDummyBytes=0 --recvSize=600 --preSchedule=1 --useBadTimings=0
tcpip_configure_pt2pt_channel(0,600,)
Prescheduling Packets.
ScheduleNexttime=0,tag=A,size=0
ScheduleNexttime=0.04,tag=B,size=0
ScheduleNexttime=0.08,tag=C,size=10
ScheduleNexttime=0.12,tag=D,size=10
ScheduleNexttime=0.16,tag=E,size=100
ScheduleNexttime=0.2,tag=F,size=100
ScheduleNexttime=0.24,tag=G,size=1000
ScheduleNexttime=0.28,tag=H,size=1000
ScheduleNexttime=0.32,tag=I,size=10000
ScheduleNexttime=0.36,tag=J,size=10000
subsequent ScheduleNext() REUSING time 0.36
ScheduleNext() returns false
Before Marking|
assert failed. file=/home/raj/code.nsnam.org/ns-3-dev/build/debug/ns3/buffer.h,
line=546, cond="Check (m_current)"
Trace/breakpoint trap
In gdb:
Program received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0xb6cc4720 (LWP 23707)]
ns3::Buffer::Iterator::WriteU8 (this=0xbf891e54, data=0 '\0') at
/home/raj/code.nsnam.org/ns-3-dev/build/debug/ns3/buffer.h:548
548 if (m_current < m_zeroStart)
(gdb) bt
#0 ns3::Buffer::Iterator::WriteU8 (this=0xbf891e54, data=0 '\0') at
/home/raj/code.nsnam.org/ns-3-dev/build/debug/ns3/buffer.h:548
#1 0xb7b93ed7 in ns3::Buffer::Iterator::WriteU64 (this=0xbf891e54, data=0) at
../src/common/buffer.cc:802
#2 0x0805e7bb in ns3::ArchHeader::Serialize (this=0xbf892068, i=
{m_zeroStart = 17, m_zeroEnd = 17, m_dataStart = 0, m_dataEnd = 17,
m_current = 18, m_data = 0x808a9b0 ""}) at tcp-2way-2.cc:330
#3 0xb7bc6348 in ns3::Packet::AddHeader (this=0x808a968, header=@0xbf892068)
at ../src/common/packet.cc:190
#4 0x0805bc88 in ns3::TalkerApp::DoSendPacket (this=0x8089240,
sock=@0xbf892178, fill=65 'A', payloadsize=0, msgid=0, src=0, dest=0)
at tcp-2way-2.cc:509
#5 0x0805c1e5 in ns3::TxTalkerApp::SendPacket (this=0x8089240, fill=65 'A',
payloadsize=0, msgid=0, src=0, dest=0)
at tcp-2way-2.cc:964
#6 0x08056d0b in Notify (this=0x80894c0) at
/home/raj/code.nsnam.org/ns-3-dev/build/debug/ns3/simulator.h:813
#7 0xb7b67186 in ns3::EventImpl::Invoke (this=0x80894c0) at
../src/simulator/event-impl.cc:39
#8 0xb7b6e909 in ns3::SimulatorPrivate::ProcessOneEvent (this=0x8087098) at
../src/simulator/simulator.cc:207
#9 0xb7b6e95b in ns3::SimulatorPrivate::Run (this=0x8087098) at
../src/simulator/simulator.cc:235
#10 0xb7b71286 in ns3::Simulator::Run () at ../src/simulator/simulator.cc:480
#11 0x0805f89c in main (argc=5, argv=0xbf892484) at tcp-2way-2.cc:1305
This looks to be a bug in the ArchHeader code that's blocking investigation of
the actual issue. Can you specify a revision number of ns-3-dev which exhibits
the issue?
raj, how do I find out what version/rev of ns-3-dev I'm using? raj, is this what you need: evensky:~/ns-3-dev$ hg head changeset: 3383:f21d3d5926b9 tag: tip user: Craig Dowell <craigdo@ee.washington.edu> date: Thu Jul 03 20:03:03 2008 -0700 summary: some slips of the keyboard in tutorial okay, i still get the assertion failure on x86 ubuntu, g++ (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7). looks like i'll have to debug this issue before we get to the erroneous amount of data issue. Raj, I did a hg pull/update to get the current version (can't sit still too long with ns-3-dev!) and I still see the same behavior and no crash re ArchHeader that you are seeing. I'm using a current version of the day Debian Sid but with a hand rolled kernel 2.6.24.7 and g++: g++ (GCC) 4.2.4 (Debian 4.2.4-3) on a Lenovo X61s (core2 duo). FYI, for kicks I recompiled current ns-3-dev (3397:cad55f67f3fc) and my test code with g++ (Debian 4.3.1-5) 4.3.1, but it had no effect. I was seeing maybe I could duplicate the bug that Raj is seeing on his setup. Okay, so I guess we need more context, because I'm on x86 OS X 10.4, g++ 4.0.1 (Apple Computer, Inc. build 5363), and I still get the same crasher. I dropped the script into the scratch directory, and had to remove an unused local variable. I also tried building with: g++ -DNS3_ASSERT_ENABLE -DNS3_LOG_ENABLE -DDEBUG -g -O0 -I/Users/Raj/code.nsnam.org/ns-3-dev/build/debug -I. -L/Users/Raj/code.nsnam.org/ns-3-dev/build/debug -lns3 -o tcp-2way bug248tcp-2way.cc Both give me the same crasher. Can someone download this attachment fresh and compile to get the actual error we are after? I may not be the person you are looking at to reproduce, but: evensky@waltz$ md5sum tcp-2way-2.cc 2ab7aa373022bc4d232d027fef23df11 tcp-2way-2.cc evensky@waltz$ g++ -o tcp-2way-2 tcp-2way-2.cc -g -O0 -Wall -DNS3_ASSERT_ENABLE -DNS3_LOG_ENABLE -DDEBUG -I/home/evensky/ns-3-dev/build/debug -L/home/evensky/ns-3-dev/build/debug -Wl,-rpath=/home/evensky/ns-3-dev/build/debug -lns3 tcp-2way-2.cc: In function 'void setup_pt2pt(bool, bool, bool, uint32_t, uint32_t, ns3::ApplicationContainer*)': tcp-2way-2.cc:1151: warning: unused variable 'count' evensky@waltz$ ./tcp-2way-2 --useDummyBytes=0 --preSchedule=1 --useBadTimings=1 --doLogging=1 > bad.out 2>&1 Trace/breakpoint trap evensky@waltz$ ./tcp-2way-2 --useDummyBytes=0 --preSchedule=1 --useBadTimings=0 --doLogging=1 > good.out 2>&1 Created attachment 200 [details]
output with the 'bad timings' showing weird number of bytes.
Created attachment 201 [details]
output with the 'good/regular timings' that runs correctly.
Raj, I found a live-cd with g++ (KNOPPIX_V5.3.1DVD-2008-03-26-EN.iso) and have confirmed the bug you are seeing in in ArchHeader. I don't know why my Debian sid environment doesn't show this bug. I'll rework tcp-2way-c.cc. Thank goodness! I really want to get to the bottom of your issue, BUT it is also very troublesome that we are getting this platform dependence issue. I'm wondering what is going on here... I just tried: Red Hat Enterprise Linux Server release 5.1 (Tikanga) system using g++ (GCC) 4.1.2 20071124 (Red Hat 4.1.2-38), and it works like my Debian sid system. I've also tried a 32bit Debian sid system, and that does show the behavior you've seen (ArchHeader bug). I'll do my development on that, but I wonder if the platform dependency is also a bug.... (In reply to comment #17) > I'm wondering what is going on here... > > I just tried: Red Hat Enterprise Linux Server release 5.1 (Tikanga) system > using > g++ (GCC) 4.1.2 20071124 (Red Hat 4.1.2-38), and it works like my Debian sid > system. I've also tried a 32bit Debian sid system, and that does show the > behavior > you've seen (ArchHeader bug). I'll do my development on that, but I wonder if > the > platform dependency is also a bug.... > Yes, that is precisely what I am thinking, and its really worrisome to have this kind of platform dependent behavior. Does this seem like a 32/64 bit issue? Were the machines it works on all using e.g. x86-64? Created attachment 203 [details]
corrected tcp-2way-2.cc
I had some sizeof(long) rather than sizeof(uint64_t) corresponding to WriteU64. It unluckily worked on 64 versions of linux, but failed on the Mac's gcc, and 32 bit versions of Knoppix and Debian sid.
(In reply to comment #19) > Created an attachment (id=203) [details] > corrected tcp-2way-2.cc > > I had some sizeof(long) rather than sizeof(uint64_t) corresponding to WriteU64. > It unluckily worked on 64 versions of linux, but failed on the Mac's gcc, and > 32 bit versions of Knoppix and Debian sid. I should point out that you could also write this: uint32_t ArchHeader::Deserialize (Buffer::Iterator i) { Buffer::Iterator start = i; m_size = i.ReadU32 (); m_msgid = i.ReadU64 (); m_src = i.ReadU64 (); m_dest = i.ReadU64 (); m_tagchar = i.ReadU8 (); return i.GetDistanceFrom (start); } that's much better for the # bytes! I'll roll that in.... Thx. Can you generate traces for this example? I.e. does the following work? ./tcp-2way-2 --useDummyBytes=0 --recvSize=600 --preSchedule=1 --useBadTimings=1 --createTraces=1 to document here what was reported on irc: I get a segv if createTraces is turned on. Created attachment 205 [details]
even more correct version of tcp-2way-2.cc
I fixed a few bugs:
1. made the ofstream ascii static so that it lives long enough for the run.
2. fixed bad usage of int/uint32_t that needed to do the right thing if negative
3. following lacage, made ArchHeader::Deserialize less fragile, but ArchHeader::GetSerializedSize still fragile...
Created attachment 214 [details]
Add some asserts that show the logical inconsitency
These asserts fail when useBadTimings is true, and not otherwise. So the issue is the management of the tcp rx buffer, namely, the code doesn't handle some cases of overlapping received sequences, it doesn't fragment correctly to assure reliable delivery to the application, it can incorrectly put things into the buffer without enforcing the logic of the sequence numbers of the buffer.
Fix is on the way.
Created attachment 215 [details]
fixes the issue
Added the additional logic to make this particular case work. I'm not going to push yet because the fix needs to be implemented elsewhere in the tcp code to make sure this doesn't happen again, namely these checks need to be done whenever we insert into the rx buffer. The patch is for David's immediate use though, for his patience ;-)
I've applied and tested Raj's patch and it works for me. Thx. Fixed in changeset 3493:53ec8893863d |