Bug 1336

Summary: RocketFuelTopologyReader valgrind errors
Product: ns-3 Reporter: John Abraham <john.abraham.in>
Component: topology-readAssignee: Tommaso Pecorella <tommaso.pecorella>
Status: RESOLVED INVALID    
Severity: minor CC: ns-bugs, tazaki
Priority: P5    
Version: pre-release   
Hardware: All   
OS: All   
Attachments: Bugfix ?
Possible fix
extend valgrind suppressions for this

Description John Abraham 2012-01-05 17:34:33 UTC
See full log at https://ns-buildmaster.ee.washington.edu:8010/job/Ubuntu-32-10.04/label=Ubuntu-32-10.04/3/consoleFull

brief log:
==18461== Invalid read of size 4
==18461==    at 0x616383B: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==18461==    by 0x58DCE6C: ns3::RocketfuelTopologyReader::Read() (rocketfuel-topology-reader.cc:377)
==18461==    by 0x58BB0B3: ns3::RocketfuelTopologyReaderTest::DoRun() (rocketfuel-topology-reader-test-suite.cc:63)
==18461==    by 0x5DD1846: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:207)
==18461==    by 0x5DD17EC: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:201)
==18461==    by 0x5DD4944: ns3::TestRunnerImpl::Run(int, char**) (test.cc:785)
==18461==    by 0x5DD4AF0: ns3::TestRunner::Run(int, char**) (test.cc:808)
==18461==    by 0x8048B4E: main (test-runner.cc:23)
==18461==  Address 0x635cee8 is 48 bytes inside a block of size 49 alloc'd
==18461==    at 0x4027C13: operator new(unsigned int) (vg_replace_malloc.c:282)
==18461==    by 0x6032D05: std::string::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x6033977: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned int) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x60347AC: std::string::reserve(unsigned int) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x6034A1A: std::string::append(char const*, unsigned int) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x600F33E: std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x6025B89: std::basic_istream<char, std::char_traits<char> >& std::getline<char, std::char_traits<char>, std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (in /usr/lib/libstdc++.so.6.0.13)
==18461==    by 0x58DC951: ns3::RocketfuelTopologyReader::Read() (rocketfuel-topology-reader.cc:333)
==18461==    by 0x58BB0B3: ns3::RocketfuelTopologyReaderTest::DoRun() (rocketfuel-topology-reader-test-suite.cc:63)
==18461==    by 0x5DD1846: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:207)
==18461==    by 0x5DD17EC: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:201)
==18461==    by 0x5DD4944: ns3::TestRunnerImpl::Run(int, char**) (test.cc:785)
==18461== 
==18461== Conditional jump or move depends on uninitialised value(s)
==18461==    at 0x6163849: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==18461==    by 0x58DCE6C: ns3::RocketfuelTopologyReader::Read() (rocketfuel-topology-reader.cc:377)
==18461==    by 0x58BB0B3: ns3::RocketfuelTopologyReaderTest::DoRun() (rocketfuel-topology-reader-test-suite.cc:63)
==18461==    by 0x5DD1846: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:207)
==18461==    by 0x5DD17EC: ns3::TestCase::Run(ns3::TestRunnerImpl*) (test.cc:201)
==18461==    by 0x5DD4944: ns3::TestRunnerImpl::Run(int, char**) (test.cc:785)
==18461==    by 0x5DD4AF0: ns3::TestRunner::Run(int, char**) (test.cc:808)
==18461==    by 0x8048B4E: main (test-runner.cc:23)
Comment 1 Tommaso Pecorella 2012-01-08 10:07:31 UTC
Created attachment 1298 [details]
Bugfix ?

I can't check if this will fix the "bug", as my valgrind doesn't seems to be unhappy with the actual code.
Hence, this is a best guess, but I could be totally wrong.

Anyway, the only "plausible" bug is related to the EOF reached while reading a new line and the line is still trying to be processed. The patch doesn't change anything for real in the module logic, it "just" adds a check to see if the string we're operating onto is not empty.

I don't know if it's possible to access the buildbot to check if this fixes the bug before merging it, but I'd be happier to have a confirm from someone that can actually see this bug.

Cheers,

T.
Comment 2 John Abraham 2012-01-08 10:22:25 UTC
I think we should give this bug a lower priority. As you observed different versions of gcc and valgrind give different results. So it may not be worth fixing at all if the latest versions are ok with it (Fedora is ok with it, but not Ubuntu). I think we can automate the application of a patch and running a test on a particular OS (buildslave), but we haven't tested it out yet. Until that infrastructure is ready, I would recommend deferring this bug.


(In reply to comment #1)
> Created attachment 1298 [details]
> Bugfix ?
> 
> I can't check if this will fix the "bug", as my valgrind doesn't seems to be
> unhappy with the actual code.
> Hence, this is a best guess, but I could be totally wrong.
> 
> Anyway, the only "plausible" bug is related to the EOF reached while reading a
> new line and the line is still trying to be processed. The patch doesn't change
> anything for real in the module logic, it "just" adds a check to see if the
> string we're operating onto is not empty.
> 
> I don't know if it's possible to access the buildbot to check if this fixes the
> bug before merging it, but I'd be happier to have a confirm from someone that
> can actually see this bug.
> 
> Cheers,
> 
> T.
Comment 3 Tommaso Pecorella 2012-01-08 11:14:40 UTC
Indeed.

I'm adding Hajime Tazaki to the bug since this particular code was by him.

I noticed also a couple of missing (?) regfree() and some unused regerror() - meaning that the returned error string is not used so they're useless.

Moreover there's a "strange" self-copy of a buffer:
buf = line.c_str();
[...]
line = buf;

Honestly I never used that kind of regexp functions in my c++ code so I'm a bit lost.

If Hajime can fix it it would be great.

In the meantime, I'm lowering this bug priority. It seems almost harmless anyway since the topology is read once and then "forgotten".

Cheers,

T.
Comment 4 John Abraham 2012-01-08 14:35:17 UTC
BTW , I tried the patch
https://ns-buildmaster.ee.washington.edu:8010/job/ns-3-dev-With-Valgrind-template/11/label=Ubuntu-32-10.04/consoleText

But the error persists.

I feel one set of these errors might by replacing
      int ret;
      int argc;

with
      int ret=0;
      int argc=0;


(In reply to comment #2)
> I think we should give this bug a lower priority. As you observed different
> versions of gcc and valgrind give different results. So it may not be worth
> fixing at all if the latest versions are ok with it (Fedora is ok with it, but
> not Ubuntu). I think we can automate the application of a patch and running a
> test on a particular OS (buildslave), but we haven't tested it out yet. Until
> that infrastructure is ready, I would recommend deferring this bug.
> 
> 
> (In reply to comment #1)
> > Created attachment 1298 [details]
> > Bugfix ?
> > 
> > I can't check if this will fix the "bug", as my valgrind doesn't seems to be
> > unhappy with the actual code.
> > Hence, this is a best guess, but I could be totally wrong.
> > 
> > Anyway, the only "plausible" bug is related to the EOF reached while reading a
> > new line and the line is still trying to be processed. The patch doesn't change
> > anything for real in the module logic, it "just" adds a check to see if the
> > string we're operating onto is not empty.
> > 
> > I don't know if it's possible to access the buildbot to check if this fixes the
> > bug before merging it, but I'd be happier to have a confirm from someone that
> > can actually see this bug.
> > 
> > Cheers,
> > 
> > T.
Comment 5 Hajime Tazaki 2012-01-09 23:14:13 UTC
Thanks for notifying this bug. I'll check it and fix as soon as possible.
Comment 6 Tommaso Pecorella 2012-02-23 19:20:27 UTC
Created attachment 1338 [details]
Possible fix

I still get valgrind "errors", but they seems the usual ones from STL late memory free.

I can't log into the buildmaster right now. Can you check them so I can close this ?

Cheers,

T.
Comment 7 Tommaso Pecorella 2012-02-27 14:51:06 UTC
I just ran rocketfuel-topology-reader valgrind test on Fedora (the system I'm using to test for real Vs bogus valgrind errors, as John's suggested).

It does pass all the tests.

I'd say to re-enable it and close the bug. The problem might have been somewhere else and it was fixed "by mistake" (or by luck).

T.



[pecos@ns-3-leaks ns-3-dev]$ ./test.py -v -g -s rocketfuel-topology-reader
NS3_ENABLED_MODULES == ['ns3-aodv', 'ns3-applications', 'ns3-bridge', 'ns3-config-store', 'ns3-core', 'ns3-csma', 'ns3-csma-layout', 'ns3-dsdv', 'ns3-emu', 'ns3-energy', 'ns3-flow-monitor', 'ns3-internet', 'ns3-lte', 'ns3-mesh', 'ns3-mobility', 'ns3-mpi', 'ns3-netanim', 'ns3-network', 'ns3-nix-vector-routing', 'ns3-olsr', 'ns3-point-to-point', 'ns3-point-to-point-layout', 'ns3-propagation', 'ns3-spectrum', 'ns3-stats', 'ns3-tap-bridge', 'ns3-test', 'ns3-tools', 'ns3-topology-read', 'ns3-uan', 'ns3-virtual-net-device', 'ns3-wifi', 'ns3-wimax']
NS3_MODULE_PATH == ['/usr/lib/gcc/x86_64-redhat-linux/4.6.2', '/home/pecos/workspace/ns-3-dev/build']
NSC_ENABLED == False
ENABLE_REAL_TIME == True
ENABLE_THREADING == True
ENABLE_EXAMPLES == True
ENABLE_TESTS == True
EXAMPLE_DIRECTORIES == ['matrix-topology', 'socket', 'ipv6', 'udp', 'error-model', 'tutorial', 'stats', 'naming', 'wireless', 'tcp', 'mobility', 'energy', 'udp-client-server', 'routing', 'realtime']
ENABLE_PYTHON_BINDINGS == False
ENABLE_CLICK == False
ENABLE_OPENFLOW == False
APPNAME == ns
BUILD_PROFILE == debug
VERSION == 3-dev
PYTHON == ['/usr/bin/python']
Building: ./waf
Waf: Entering directory `/home/pecos/workspace/ns-3-dev/build'
Waf: Leaving directory `/home/pecos/workspace/ns-3-dev/build'
'build' finished successfully (2.243s)

Modules built:
aodv                      applications              bridge                   
config-store              core                      csma                     
csma-layout               dsdv                      emu                      
energy                    flow-monitor              internet                 
lte                       mesh                      mobility                 
mpi                       netanim                   network                  
nix-vector-routing        olsr                      point-to-point           
point-to-point-layout     propagation               spectrum                 
stats                     tap-bridge                test                     
tools                     topology-read             uan                      
virtual-net-device        wifi                      wimax                    

Modules not built:
click                     openflow                  visualizer               

os.environ["PYTHONPATH"] == /home/pecos/workspace/ns-3-dev/build/bindings/python
os.environ["LD_LIBRARY_PATH"] == :/usr/lib/gcc/x86_64-redhat-linux/4.6.2:/home/pecos/workspace/ns-3-dev/build
Synchronously execute /home/pecos/workspace/ns-3-dev/build/utils/ns3-dev-test-runner-debug --print-test-name-list
Return code =  0
stderr =  
Queue rocketfuel-topology-reader
Launch utils/ns3-dev-test-runner-debug --test-name=rocketfuel-topology-reader --stop-on-failure
Synchronously execute valgrind --suppressions=/home/pecos/workspace/ns-3-dev/testpy.supp --leak-check=full --show-reachable=yes --error-exitcode=2 /home/pecos/workspace/ns-3-dev/build/utils/ns3-dev-test-runner-debug --test-name=rocketfuel-topology-reader --stop-on-failure --xml --tempdir=testpy-output/2012-02-27-19-48-11-CUT --out=testpy-output/2012-02-27-19-48-11-CUT/rocketfuel-topology-reader.xml 
Return code =  0
stderr =  ==12856== Memcheck, a memory error detector
==12856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==12856== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==12856== Command: /home/pecos/workspace/ns-3-dev/build/utils/ns3-dev-test-runner-debug --test-name=rocketfuel-topology-reader --stop-on-failure --xml --tempdir=testpy-output/2012-02-27-19-48-11-CUT --out=testpy-output/2012-02-27-19-48-11-CUT/rocketfuel-topology-reader.xml
==12856== 
==12856== 
==12856== HEAP SUMMARY:
==12856==     in use at exit: 0 bytes in 0 blocks
==12856==   total heap usage: 585,177 allocs, 585,177 frees, 52,126,057 bytes allocated
==12856== 
==12856== All heap blocks were freed -- no leaks are possible
==12856== 
==12856== For counts of detected and suppressed errors, rerun with: -v
==12856== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

returncode = 0
---------- begin standard out ----------

---------- begin standard err ----------
==12856== Memcheck, a memory error detector
==12856== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==12856== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==12856== Command: /home/pecos/workspace/ns-3-dev/build/utils/ns3-dev-test-runner-debug --test-name=rocketfuel-topology-reader --stop-on-failure --xml --tempdir=testpy-output/2012-02-27-19-48-11-CUT --out=testpy-output/2012-02-27-19-48-11-CUT/rocketfuel-topology-reader.xml
==12856== 
==12856== 
==12856== HEAP SUMMARY:
==12856==     in use at exit: 0 bytes in 0 blocks
==12856==   total heap usage: 585,177 allocs, 585,177 frees, 52,126,057 bytes allocated
==12856== 
==12856== All heap blocks were freed -- no leaks are possible
==12856== 
==12856== For counts of detected and suppressed errors, rerun with: -v
==12856== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

---------- end standard err ----------
PASS: TestSuite rocketfuel-topology-reader
1 of 1 tests passed (1 passed, 0 skipped, 0 failed, 0 crashed, 0 valgrind errors)
[pecos@ns-3-leaks ns-3-dev]$
Comment 8 Tommaso Pecorella 2012-03-09 14:39:10 UTC
Can someone confirm that this bug is gone on the test machines ?

Ty.
Comment 9 John Abraham 2012-03-14 13:48:59 UTC
Will confirm today
(In reply to comment #8)
> Can someone confirm that this bug is gone on the test machines ?
> 
> Ty.
Comment 10 John Abraham 2012-03-14 15:29:52 UTC
Unfortunately the problem persists.
https://ns-buildmaster.ee.washington.edu:8010/view/Daily/job/DailyWithValgrind/label=Ubuntu-32-10.04/61/console

 at least the fedora systems are ok with it.

If you want to change it WONTFIX thats ok with me. I don't know if Tom still wants to keep track of it.

Because we have to address the broader issue of valgrind for ubuntu in a separate forum. There are lot more valgrind errors on ubuntu anyway.



(In reply to comment #9)
> Will confirm today
> (In reply to comment #8)
> > Can someone confirm that this bug is gone on the test machines ?
> > 
> > Ty.
Comment 11 Tommaso Pecorella 2012-03-14 20:06:59 UTC
I think the problem is in the grep libraries used by the module, and not in ns-3.

I have very similar issues in other modules due to late-release memory from stl and such. Until we find a reliable way to disable the late-memory release valgrind will keep nagging. Unfortunately I am unable to apply the "fix" suggested in the valgrind forums. So I'm kinda stuck with it.

I'll leave the bug open 'til we agree on a general consensus about how to discriminate ns-3 memory leaks and those coming from external libraries (and how to deal with them).

Cheers,

T.
Comment 12 Tom Henderson 2012-03-15 02:06:00 UTC
Created attachment 1357 [details]
extend valgrind suppressions for this

This patch to testpy.supp suppresses the warnings for valgrind-3.7.0 on ubuntu-10.04-lts 32-bit.
Comment 13 Tommaso Pecorella 2012-03-25 12:40:49 UTC
Whitout Tom's patch, but with gtk disabled:

Linux VM (Ubuntu 11.10 - 64 bit):

==16955== Memcheck, a memory error detector
==16955== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==16955== Using Valgrind-3.6.1-Debian and LibVEX; rerun with -h for copyright info
==16955== Command: /home/pecos/workspace/ns-3-dev/build/utils/ns3-dev-test-runner-debug --test-name=rocketfuel-topology-reader --stop-on-failure --xml --tempdir=testpy-output/2012-03-25-16-32-09-CUT --out=testpy-output/2012-03-25-16-32-09-CUT/rocketfuel-topology-reader.xml
==16955== 
==16955== 
==16955== HEAP SUMMARY:
==16955==     in use at exit: 0 bytes in 0 blocks
==16955==   total heap usage: 581,407 allocs, 581,407 frees, 32,044,721 bytes allocated
==16955== 
==16955== All heap blocks were freed -- no leaks are possible
==16955== 
==16955== For counts of detected and suppressed errors, rerun with: -v
==16955== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 141 from 8)

Thanks Gustavo for the waf option.

I'll check it later on other platforms using Jenkins, just for fun.
Comment 14 Tommaso Pecorella 2013-07-09 17:15:07 UTC
Marked as invalid as the latest valgrind tests doesn't show this behaviour anymore.