Bug 1549 - test whether NS_LOG="*" works
test whether NS_LOG="*" works
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: core
ns-3-dev
PC All
: P1 blocker
Assigned To: Mitch Watrous
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-12-11 23:16 UTC by Alex Afanasyev
Modified: 2013-01-14 12:42 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Afanasyev 2012-12-11 23:16:43 UTC
I'm not sure if people are using NS_LOG=* option very often (I tend to use it when I'm not sure which log module/modules I need), but in the current code, that would immediately cause segfault.

The main (and the only) reason for that is that the logging routines are enabled in simulator implementation (e.g., DefaultSimulatorImpl/default-simulator-impl.cc).  What is happening is that by default log is printed with time context, which calles Simulator::Now (), which calls NS_LOG_FUNCTION, which calls Simualtor::Now () again, and so on.

Unless some special handling is done, I would suggest to remove all logging from all simulator implementations.  At least for the release version of the simulator.
Comment 1 Daniel L. 2012-12-11 23:36:02 UTC
I also encountered the same problem with NS_LOG=*
Comment 2 Tom Henderson 2012-12-11 23:58:57 UTC
It might suffice to disable logging in implementations of Simulator::Now() and all that it recurses into (mainly, TimeStep).

Another alternative might be to substitute a special logging macro for these files, rather than remove all logging (which is useful for debugging).
Comment 3 Mathieu Lacage 2012-12-12 04:47:23 UTC
That segfault was fixed in changeset 8934b7c0c1cb
Comment 4 Tom Henderson 2012-12-12 13:01:38 UTC
stack is still overflowing; to reproduce:

NS_LOG="*" ./waf --command-template="valgrind %s" --run first
Comment 5 Tom Henderson 2012-12-12 13:16:22 UTC
The current stack overflows are due to overzealous logging of various constructors introduced in recent logging patches to core and network modules.

e.g. this is one necessary change:

diff -r 8ef8d8bae350 src/network/utils/ipv6-address.cc
--- a/src/network/utils/ipv6-address.cc	Wed Dec 12 04:50:46 2012 -0500
+++ b/src/network/utils/ipv6-address.cc	Wed Dec 12 10:12:25 2012 -0800
@@ -258,13 +258,17 @@
 
 Ipv6Address::Ipv6Address (Ipv6Address const& addr)
 {
+/*
   NS_LOG_FUNCTION (this << addr);
+*/
   memcpy (m_address, addr.m_address, 16);
 }
 
 Ipv6Address::Ipv6Address (Ipv6Address const* addr)
 {
+/*
   NS_LOG_FUNCTION (this << addr);
+*/
   memcpy (m_address, addr->m_address, 16);
 }

but there are others in the typeId and attribute system now.

Mathieu removed logging in the time class to resolve the overflow originally reported.
Comment 6 Mathieu Lacage 2012-12-12 14:06:34 UTC
(In reply to comment #5)
> The current stack overflows are due to overzealous logging of various
> constructors introduced in recent logging patches to core and network
> modules.

the style police won a battle :)

When I debugged this and looked at the logging patch, I thought: "but why do they want to make the debug builds slower than they are to log things that no one will ever want to see". I did not feel like trying to undo the most obviously overzealous changes so I suspect you have 2 options:

  - revert the original logging patch

  - remove the log statements that make ns-3 crash until all examples and tests pass with NS_LOG="*"
Comment 7 Tom Henderson 2012-12-12 14:19:12 UTC
I recommend these steps to resolve this:

1) remove all log statements leading to stack overflow.  The TypeId/Attribute logging is most suspect right now, but others may pop up later.

NS_LOG="*" ./waf --command-template="valgrind %s" --run first

Comment out similar to this changeset:  http://code.nsnam.org/ns-3-dev/rev/7c80515eab0c

2) once the "first" program is running cleanly, test our other examples (manually) that run under test.py by running similar to above.  Not sure if these will uncover more problems.

Once these steps are done, we can remove the blocker status, but there is still more work to do for long term.

3) The initialization output is currently very noisy:

NS_LOG="*" ./waf --run sample-simulator
NS_LOG="*" ./waf --run first

Are there certain log statements that are just filling up the output with lots of noise?  For instance, all of the Rng output in sample-simulator...

Let's try to disable some additional logging statements that are just adding to the noise.

4) We need an automated way to run a test of how users run a program, with environment variables.  At a minimum, I'd like to add a test that would e.g. run this program

NS_LOG="*" ./waf --run first

in a process and check that it doesn't segfault.  For example, if there were a wrapper program such as

  check-first.py

that imported subprocess module and did something like:

subprocess.call(['NS_LOG="*" ./waf --run first'])

and checked for success.  Can such a program be added as a test (e.g. to src/test), not as an example?

Beyond that, we could possibly enable a way for test.py to run all tests with NS_LOG  (or other environment variables).   Another use of this would be to run the test suite with different global variables for RngRun and RngSeed.

(before doing this, please test whether test.py inherits this and applies environment variables already -- it may just already be supported by passing things to existing test.py)
Comment 8 Tom Henderson 2012-12-12 14:21:04 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > The current stack overflows are due to overzealous logging of various
> > constructors introduced in recent logging patches to core and network
> > modules.
> 
> the style police won a battle :)
> 
> When I debugged this and looked at the logging patch, I thought: "but why do
> they want to make the debug builds slower than they are to log things that
> no one will ever want to see". I did not feel like trying to undo the most
> obviously overzealous changes so I suspect you have 2 options:
> 
>   - revert the original logging patch
> 
>   - remove the log statements that make ns-3 crash until all examples and
> tests pass with NS_LOG="*"

I'm interested in the second option.  Finding the sweet spot is going to be subjective, but I think it will help in the long run to have more logging in the core of ns-3 for future debugging, so I'd like to trim back rather than just revert.
Comment 9 Tom Henderson 2012-12-13 01:45:03 UTC
changeset f094300690db allows all of the tutorial programs to run successfully with NS_LOG="*"

More work is needed to audit for further log statements that should be removed, because there are likely additional cases that are not being exercised by the tutorial code.  In general, logging should be avoided for operators, copy constructors, and very high use functions.
Comment 10 Tom Henderson 2012-12-15 14:35:38 UTC
leaving open (at lower priority) for completing a test to catch future occurrences of this (see comment #7).

other issues of segfault are addressed as of changeset a420dcf7dceb
Comment 11 Mitch Watrous 2012-12-17 13:53:55 UTC
Here is what test.py is doing with regard to logging:

    #
    # If lots of logging is enabled, we can crash Python when it tries to 
    # save all of the text.  We just don't allow logging to be turned on when
    # test.py runs.  If you want to see logging output from your tests, you
    # have to run them using the test-runner directly.
    #
    os.environ["NS_LOG"] = ""

So, it will not be possible to have test.py run all tests with NS_LOG="*".

I will work on an automated test to run an example with all logging turned on.
Comment 12 Mitch Watrous 2012-12-17 14:24:17 UTC
Would you want the test suite to start a process from C++ to execute the Python script?

Do we really want this to run as a test suite that dumps out copious log messages, could take a long time, and could segfault?

What if this was a Python example in src/test/examples?
Comment 13 Mitch Watrous 2013-01-14 12:42:53 UTC
A test script has been added that runs all of ns-3's C++ examples with NS-LOG="*":

    utils/run-examples-with-full-logging.sh

There are still 5 examples that give errors when run with NS-LOG="*" :
      
    emu-ping
    emu-udp-echo
    lena-dual-stripe --epc=1 --fadingTrace=../../src/lte/model/fading-traces/fading_trace_EPA_3kmph.fad --simTime=0.01
    tap-wifi-dumbbell
    wimax-simple

Bug closed.

ns-3-dev changeset: fd38b9505eb9