Bug 790 - Memory leak in TestSuite routing-aodv-regression
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: routing
Version: ns-3-dev
Hardware: All All
Importance: P1 blocker
Assigned To: ns-bugs
Depends on:
Blocks:
Reported: 2010-01-12 18:38 UTC by Craig Dowell
Modified: 2010-01-18 04:03 UTC (History)

See Also:


Attachments
commented ping (751 bytes, patch)
2010-01-15 03:13 UTC, Kirill Andreev
Proposed fix (597 bytes, patch)
2010-01-15 03:57 UTC, Kirill Andreev
Clear the list of sockets in ipv4-l3-protocol.cc (352 bytes, patch)
2010-01-15 06:49 UTC, Faker Moatamri

Description Craig Dowell 2010-01-12 18:38:27 UTC
Fails valgrind on ns-regression

VALGR: TestSuite routing-aodv-regression

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 8.04.3 LTS
Release:        8.04
Codename:       hardy

> gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with
-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/in
clude/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-mpfr --enable-checki
ng=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)

[ns-regression] ~/repos/ns-3-allinone-dev/ns-3-dev > ./test.py -g -v -s routing-aodv-regression
Building: ./waf
Waf: Entering directory `/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build'
Waf: Leaving directory `/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build'
'build' finished successfully (1.661s)
NS3_ACTIVE_VARIANT == debug
NS3_BUILDDIR == /home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build
NS3_MODULE_PATH == ['/usr/lib/gcc/x86_64-linux-gnu/4.2.4', '/home/craigdo/repos/ns-3-allinone-dev/nsc/linux-2.6.18', '/home/crai
gdo/repos/ns-3-allinone-dev/nsc/linux-2.6.26', '/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug']
ENABLE_NSC == False
ENABLE_REAL_TIME == True
ENABLE_EXAMPLES == True
os.environ["LD_LIBRARY_PATH"] == /usr/lib/gcc/x86_64-linux-gnu/4.2.4:/home/craigdo/repos/ns-3-allinone-dev/nsc/linux-2.6.18:/hom
e/craigdo/repos/ns-3-allinone-dev/nsc/linux-2.6.26:/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug:/usr/lib/gcc/x86_6
4-linux-gnu/4.2.4:/home/craigdo/repos/ns-3-allinone-dev/nsc/linux-2.6.18:/home/craigdo/repos/ns-3-allinone-dev/nsc/linux-2.6.26:
/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug
Queue routing-aodv-regression
Launch utils/test-runner --suite=routing-aodv-regression
Synchronously execute valgrind --suppressions=/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/testpy.supp --leak-check=full --err
or-exitcode=2 /home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug/utils/test-runner --suite=routing-aodv-regression --bas
edir=/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev --tempdir=testpy-output/2010-01-12-23-35-45-CUT --out=testpy-output/2010-01-
12-23-35-45-CUT/routing-aodv-regression.xml
Return code =  2
stderr =  ==27520== Memcheck, a memory error detector
==27520== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==27520== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==27520== Command: /home/craigdo/repos/ns-3-allinone-dev/ns-3-dev/build/debug/utils/test-runner --suite=routing-aodv-regression
--basedir=/home/craigdo/repos/ns-3-allinone-dev/ns-3-dev --tempdir=testpy-output/2010-01-12-23-35-45-CUT --out=testpy-output/201
0-01-12-23-35-45-CUT/routing-aodv-regression.xml
==27520==
==27520==
==27520== HEAP SUMMARY:
==27520==     in use at exit: 17,680 bytes in 156 blocks
==27520==   total heap usage: 25,471 allocs, 25,315 frees, 1,612,731 bytes allocated
==27520==
==27520== 17,480 (224 direct, 17,256 indirect) bytes in 2 blocks are definitely lost in loss record 70 of 70
==27520==    at 0x4C2397E: operator new(unsigned long) (vg_replace_malloc.c:220)
==27520==    by 0x574AE40: ns3::Ptr<ns3::Node> ns3::CreateObject<ns3::Node>() (object.h:515)
==27520==    by 0x5B2DE79: ns3::NodeContainer::Create(unsigned int) (node-container.cc:96)
==27520==    by 0x5A071C7: ns3::aodv::ChainRegressionTest::CreateNodes() (aodv-regression.cc:112)
==27520==    by 0x5A08344: ns3::aodv::ChainRegressionTest::DoRun() (aodv-regression.cc:90)
==27520==    by 0x54BFEFF: ns3::TestCase::Run() (test.cc:152)
==27520==    by 0x54C069B: ns3::TestSuite::DoRun() (test.cc:684)
==27520==    by 0x54BFBE1: ns3::TestSuite::Run() (test.cc:459)
==27520==    by 0x4026A8: main (test-runner.cc:263)

==27520== LEAK SUMMARY:
==27520==    definitely lost: 224 bytes in 2 blocks
==27520==    indirectly lost: 17,256 bytes in 150 blocks
==27520==      possibly lost: 0 bytes in 0 blocks
==27520==    still reachable: 200 bytes in 4 blocks
==27520==         suppressed: 0 bytes in 0 blocks
==27520== Reachable blocks (those to which a pointer was found) are not shown.
==27520== To see them, rerun with: --leak-check=full --show-reachable=yes
==27520==
==27520== For counts of detected and suppressed errors, rerun with: -v
==27520== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 3 from 3)

...
Comment 1 Faker Moatamri 2010-01-14 11:46:34 UTC
Actually, this is not limited to routing-aodv-regression:

utils/test-runner --suite=routing-aodv-regression
still reachable: 936 bytes in 7 blocks

utils/test-runner --suite=routing-aodv
still reachable: 280 bytes in 6 blocks

utils/test-runner --suite=routing-olsr-regression
still reachable: 200 bytes in 4 blocks

utils/test-runner --suite=routing-olsr-header
definitely lost: 224 bytes in 2 blocks.
indirectly lost: 17,256 bytes in 150 blocks.
possibly lost: 0 bytes in 0 blocks.
still reachable: 200 bytes in 4 blocks.

utils/test-runner --suite=ipv6-protocol
still reachable: 200 bytes in 4 blocks

utils/test-runner --suite=packetbb-test-suite
still reachable: 5,816 bytes in 42 blocks

utils/test-runner --suite=drop-tail-queue
still reachable: 720 bytes in 16 blocks

utils/test-runner --suite=packet-metadata
still reachable: 936 bytes in 7 blocks

utils/test-runner --suite=buffer
200 bytes in 4 blocks

utils/test-runner --suite=object-name-service
still reachable: 200 bytes in 4 blocks

examples/stats/wifi-example-sim
still reachable: 200 bytes in 4 blocks

examples/tcp/star
still reachable: 200 bytes in 4 blocks

There are others like these. In addition, valgrind returns 0 in these cases, which doesn't allow us to detect the errors. I think this is more than just an error in a test program. Any thoughts?
Comment 2 Craig Dowell 2010-01-14 14:16:33 UTC
> examples/tcp/star
> still reachable: 200 bytes in 4 blocks
>
> and there is some others like those plus valgrind is returning 0 
> which doesn't allow us to detect the errors. I think this is more
> than just an error in a test program. Any thoughts?

This is expected behavior.  It has been like this since ns-3.1 and is due to the fact that valgrind doesn't consider still-reachable an "important" error since "such blocks don't need direct fixing by the programmer."
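
Since valgrind does not count still-reachable blocks as errors, a harness that wants to flag them anyway has to read the leak summary itself. The following is only a sketch of that idea: LeakBytes is a hypothetical helper, not part of ns-3, test.py, or valgrind, and it assumes the "category: N bytes in M blocks" line format shown in the logs above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ns-3 or valgrind): return the byte count
// that a valgrind LEAK SUMMARY reports for one category, for example
// "still reachable" or "definitely lost". Returns -1 if the category does
// not appear in the log text.
long long LeakBytes(const std::string& log, const std::string& category)
{
    std::istringstream in(log);
    std::string line;
    const std::string key = category + ":";
    while (std::getline(in, line)) {
        const std::size_t pos = line.find(key);
        if (pos == std::string::npos) {
            continue;
        }
        long long bytes = 0;
        bool sawDigit = false;
        // Read the number after the key, skipping leading spaces and the
        // thousands separators valgrind prints (e.g. "17,256").
        for (std::size_t i = pos + key.size(); i < line.size(); ++i) {
            const char c = line[i];
            if (c >= '0' && c <= '9') {
                bytes = bytes * 10 + (c - '0');
                sawDigit = true;
            } else if ((c == ',' && sawDigit) || (c == ' ' && !sawDigit)) {
                continue;
            } else {
                break;
            }
        }
        return sawDigit ? bytes : -1;
    }
    return -1;
}
```

A caller could treat LeakBytes(log, "still reachable") > 0 as a warning while leaving valgrind's own exit code to report the definitely-lost case.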
Comment 3 Kirill Andreev 2010-01-15 03:13:30 UTC
Created attachment 723 [details]
commented ping

If I comment out the ping installation, valgrind is happy.
Comment 4 Kirill Andreev 2010-01-15 03:57:48 UTC
Created attachment 724 [details]
Proposed fix

Just stop the ping a nanosecond before the regression test ends.
Comment 5 Mathieu Lacage 2010-01-15 04:33:40 UTC
(In reply to comment #4)
> Created an attachment (id=724) [details]
> Proposed fix
> 
> Just stop the ping a nanosecond before end regression test ends

Wow. Why does this fix the leak? Is it because the Socket::Close function is doing something special? If so, what?
Comment 6 Kirill Andreev 2010-01-15 06:18:16 UTC
Without the fix, StopApplication is never called, so the RawSocket is not closed.
Comment 7 Kirill Andreev 2010-01-15 06:20:28 UTC
The stop event for an application is scheduled like this:
 m_stopEvent = Simulator::Schedule (m_stopTime, &Application::StopApplication, this);
and the Simulator may already be destroyed by the time it is due.
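
The effect can be reproduced with a toy event queue. This is a sketch only: ToySimulator and StopHandlerRan are hypothetical names, not the ns-3 Simulator API, and the model assumes that events with equal timestamps run in insertion order and that events still pending when the run ends are discarded.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Toy event queue, NOT the ns-3 Simulator. Events are ordered by
// (time, insertion order); Run() returns as soon as the stop event fires.
class ToySimulator
{
public:
    void Schedule(std::uint64_t timeNs, std::function<void()> fn)
    {
        m_events.emplace(std::make_pair(timeNs, m_nextUid++), std::move(fn));
    }

    // Schedule the event that ends the run.
    void Stop(std::uint64_t timeNs)
    {
        Schedule(timeNs, [this] { m_stop = true; });
    }

    void Run()
    {
        while (!m_events.empty() && !m_stop) {
            auto it = m_events.begin();
            std::function<void()> fn = std::move(it->second);
            m_events.erase(it);
            fn();
        }
    }

    // Drop whatever is still pending; those handlers never run.
    void Destroy() { m_events.clear(); }

private:
    std::map<std::pair<std::uint64_t, std::uint64_t>, std::function<void()>> m_events;
    std::uint64_t m_nextUid = 0;
    bool m_stop = false;
};

// Returns true if the application's stop handler (which would close the
// socket) ran before the simulator was torn down.
bool StopHandlerRan(std::uint64_t appStopNs, std::uint64_t simStopNs)
{
    ToySimulator sim;
    bool closed = false;
    sim.Stop(simStopNs);
    sim.Schedule(appStopNs, [&closed] { closed = true; });
    sim.Run();
    sim.Destroy();
    return closed;
}
```

In this model, a stop handler scheduled for the same instant as the end of the run is dropped, while one scheduled a nanosecond earlier runs, which matches what the proposed fix relies on.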
Comment 8 Faker Moatamri 2010-01-15 06:49:58 UTC
Created attachment 725 [details]
Clear the list of sockets in ipv4-l3-protocol.cc

This fix clears the memory leak. In ipv4-l3-protocol.cc, the DoDispose function doesn't clear the list of sockets it holds, and that is what causes the leak. Here is a patch that fixes it.
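
A minimal sketch of why an uncleared socket list can keep a whole node alive, under the assumption that each socket holds a reference back to its node. It uses std::shared_ptr in place of ns3::Ptr, and the types Node, L3Protocol and Socket below are simplified stand-ins, not the ns-3 classes.

```cpp
#include <memory>
#include <vector>

// Counts stand-in objects that are currently alive.
static int g_liveObjects = 0;

struct Node;

struct Socket {
    std::shared_ptr<Node> node;  // assumed back-reference to the owning node
    Socket() { ++g_liveObjects; }
    ~Socket() { --g_liveObjects; }
};

struct L3Protocol {
    std::vector<std::shared_ptr<Socket>> sockets;
    L3Protocol() { ++g_liveObjects; }
    ~L3Protocol() { --g_liveObjects; }
    // With clearSockets == false this mimics a DoDispose that forgets the list.
    void DoDispose(bool clearSockets)
    {
        if (clearSockets) {
            sockets.clear();
        }
    }
};

struct Node {
    std::shared_ptr<L3Protocol> protocol;
    Node() { ++g_liveObjects; }
    ~Node() { --g_liveObjects; }
    void Dispose(bool clearSockets)
    {
        if (protocol) {
            protocol->DoDispose(clearSockets);
        }
    }
};

// Builds node -> protocol -> socket -> node, disposes the node, drops all
// outside references, and reports how many objects are still alive.
// Calling it with clearSockets == false deliberately leaks three objects.
int LiveObjectsAfterTeardown(bool clearSockets)
{
    g_liveObjects = 0;
    {
        auto node = std::make_shared<Node>();
        node->protocol = std::make_shared<L3Protocol>();
        auto socket = std::make_shared<Socket>();
        socket->node = node;
        node->protocol->sockets.push_back(socket);
        node->Dispose(clearSockets);
    }
    return g_liveObjects;
}
```

When the list is not cleared, the node, the protocol and the socket reference each other in a cycle, so none of their reference counts reaches zero; clearing the list in DoDispose breaks the cycle and everything is freed.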
Comment 9 Craig Dowell 2010-01-15 18:11:23 UTC
I verified that clearing m_sockets cleans up the following errors:

==7504==    definitely lost: 224 bytes in 2 blocks
==7504==    indirectly lost: 17,256 bytes in 150 blocks

Another error remains after this patch is applied:

==7504==    still reachable: 200 bytes in 4 blocks

I have filed a separate bug on these, so it seems to me that applying the patch above should close this particular bug.

I'm a little troubled that something as blatant as this would not appear somewhere else, though; so maybe this is all related.
Comment 10 Faker Moatamri 2010-01-18 04:03:03 UTC
Changeset: 8f94a0ca3964