Bugzilla – Bug 1549
test whether NS_LOG="*" works
Last modified: 2013-01-14 12:42:53 UTC
I'm not sure if people are using NS_LOG=* option very often (I tend to use it when I'm not sure which log module/modules I need), but in the current code, that would immediately cause segfault. The main (and the only) reason for that is that the logging routines are enabled in simulator implementation (e.g., DefaultSimulatorImpl/default-simulator-impl.cc). What is happening is that by default log is printed with time context, which calles Simulator::Now (), which calls NS_LOG_FUNCTION, which calls Simualtor::Now () again, and so on. Unless some special handling is done, I would suggest to remove all logging from all simulator implementations. At least for the release version of the simulator.
I also encountered the same problem with NS_LOG=*
It might suffice to disable logging in implementations of Simulator::Now() and all that it recurses into (mainly, TimeStep). Another alternative might be to substitute a special logging macro for these files, rather than remove all logging (which is useful for debugging).
That segfault was fixed in changeset 8934b7c0c1cb
stack is still overflowing; to reproduce: NS_LOG="*" ./waf --command-template="valgrind %s" --run first
The current stack overflows are due to overzealous logging of various constructors introduced in recent logging patches to core and network modules. e.g. this is one necessary change: diff -r 8ef8d8bae350 src/network/utils/ipv6-address.cc --- a/src/network/utils/ipv6-address.cc Wed Dec 12 04:50:46 2012 -0500 +++ b/src/network/utils/ipv6-address.cc Wed Dec 12 10:12:25 2012 -0800 @@ -258,13 +258,17 @@ Ipv6Address::Ipv6Address (Ipv6Address const& addr) { +/* NS_LOG_FUNCTION (this << addr); +*/ memcpy (m_address, addr.m_address, 16); } Ipv6Address::Ipv6Address (Ipv6Address const* addr) { +/* NS_LOG_FUNCTION (this << addr); +*/ memcpy (m_address, addr->m_address, 16); } but there are others in the typeId and attribute system now. Mathieu removed logging in the time class to resolve the overflow originally reported.
(In reply to comment #5) > The current stack overflows are due to overzealous logging of various > constructors introduced in recent logging patches to core and network > modules. the style police won a battle :) When I debugged this and looked at the logging patch, I thought: "but why do they want to make the debug builds slower than they are to log things that no one will ever want to see". I did not feel like trying to undo the most obviously overzealous changes so I suspect you have 2 options: - revert the original logging patch - remove the log statements that make ns-3 crash until all examples and tests pass with NS_LOG="*"
I recommend these steps to resolve this: 1) remove all log statements leading to stack overflow. The TypeId/Attribute logging is most suspect right now, but others may pop up later. NS_LOG="*" ./waf --command-template="valgrind %s" --run first Comment out similar to this changeset: http://code.nsnam.org/ns-3-dev/rev/7c80515eab0c 2) once the "first" program is running cleanly, test our other examples (manually) that run under test.py by running similar to above. Not sure if these will uncover more problems. Once these steps are done, we can remove the blocker status, but there is still more work to do for long term. 3) The initialization output is currently very noisy: NS_LOG="*" ./waf --run sample-simulator NS_LOG="*" ./waf --run first Are there certain log statements that are just filling up the output with lots of noise? For instance, all of the Rng output in sample-simulator... Let's try to disable some additional logging statements that are just adding to the noise. 4) We need an automated way to run a test of how users run a program, with environment variables. At a minimum, I'd like to add a test that would e.g. run this program NS_LOG="*" ./waf --run first in a process and check that it doesn't segfault. For example, if there were a wrapper program such as check-first.py that imported subprocess module and did something like: subprocess.call(['NS_LOG="*" ./waf --run first']) and checked for success. Can such a program be added as a test (e.g. to src/test), not as an example? Beyond that, we could possibly enable a way for test.py to run all tests with NS_LOG (or other environment variables). Another use of this would be to run the test suite with different global variables for RngRun and RngSeed. (before doing this, please test whether test.py inherits this and applies environment variables already -- it may just already be supported by passing things to existing test.py)
(In reply to comment #6) > (In reply to comment #5) > > The current stack overflows are due to overzealous logging of various > > constructors introduced in recent logging patches to core and network > > modules. > > the style police won a battle :) > > When I debugged this and looked at the logging patch, I thought: "but why do > they want to make the debug builds slower than they are to log things that > no one will ever want to see". I did not feel like trying to undo the most > obviously overzealous changes so I suspect you have 2 options: > > - revert the original logging patch > > - remove the log statements that make ns-3 crash until all examples and > tests pass with NS_LOG="*" I'm interested in the second option. Finding the sweet spot is going to be subjective, but I think it will help in the long run to have more logging in the core of ns-3 for future debugging, so I'd like to trim back rather than just revert.
changeset f094300690db allows all of the tutorial programs to run successfully with NS_LOG="*" More work is needed to audit for further log statements that should be removed, because there are likely additional cases that are not being exercised by the tutorial code. In general, logging should be avoided for operators, copy constructors, and very high use functions.
leaving open (at lower priority) for completing a test to catch future occurrences of this (see comment #7). other issues of segfault are addressed as of changeset a420dcf7dceb
Here is what test.py is doing with regard to logging: # # If lots of logging is enabled, we can crash Python when it tries to # save all of the text. We just don't allow logging to be turned on when # test.py runs. If you want to see logging output from your tests, you # have to run them using the test-runner directly. # os.environ["NS_LOG"] = "" So, it will not be possible to have test.py run all tests with NS_LOG="*". I will work on an automated test to run an example with all logging turned on.
Would you want the test suite to start a process from C++ to execute the Python script? Do we really want this to run as a test suite that dumps out copious log messages, could take a long time, and could segfault? What if this was a Python example in src/test/examples?
A test script has been added that runs all of ns-3's C++ examples with NS-LOG="*": utils/run-examples-with-full-logging.sh There are still 5 examples that give errors when run with NS-LOG="*" : emu-ping emu-udp-echo lena-dual-stripe --epc=1 --fadingTrace=../../src/lte/model/fading-traces/fading_trace_EPA_3kmph.fad --simTime=0.01 tap-wifi-dumbbell wimax-simple Bug closed. ns-3-dev changeset: fd38b9505eb9