Bug 677 - gcc cxxflags plays (multiple questions)
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: build system
Version: ns-3-dev
Hardware: All All
Importance: P4 enhancement
Assigned To: ns-bugs
Reported: 2009-09-17 11:19 UTC by Andrey Mazo
Modified: 2009-10-23 11:48 UTC (History)
Attachments
Add "release" profile (see == First ==) (323 bytes, patch)
2009-09-17 11:21 UTC, Andrey Mazo
Add option "--enable-strip" (see == First ==) (1.64 KB, patch)
2009-09-17 11:23 UTC, Andrey Mazo
Respect CXXFLAGS_EXTRA (see == Second ==) (803 bytes, patch)
2009-09-17 11:24 UTC, Andrey Mazo
gcc -pipe (see == Third ==) (474 bytes, patch)
2009-09-17 11:24 UTC, Andrey Mazo
gcc -fomit-frame-pointer (see == Fourth ==) (453 bytes, patch)
2009-09-17 11:27 UTC, Andrey Mazo

Description Andrey Mazo 2009-09-17 11:19:52 UTC
I have several questions/proposals about compiler flags.
I've decided not to create several bugs and keep everything in one place.

== First ==
The gcc -g option makes executables 5 times larger in the static optimized build.

What is the reason for passing the -g option to gcc in the optimized profile?
Is it required for valgrind checks?
Of course, it doesn't affect the memory footprint and should not significantly change computation speed.
But this debugging information is useless during long production simulations and only consumes hard drive space.
I see 2 ways:
1) add a new profile like "release" without debugging info at all
2) add a new configure option "--enable-strip" to run gcc with -s flag (or equivalent flag for other compilers).


== Second ==
Make wscript pick up environment variables like CXXFLAGS_EXTRA and append them to the automatically assembled CXXFLAGS (this may be useful for some compilation-related experiments).

== Third ==
Add the gcc "-pipe" option to avoid creating temporary files (this will slightly speed up compilation; it should be supported even under cygwin).

== Fourth ==
Add gcc "-fomit-frame-pointer" option.
This will slightly speed up execution (I got about a 1-3% speedup) and reduce code size, but it makes debugging impossible on several architectures.
So I think it is acceptable under the "release" profile (from the First proposal).

== Fifth ==
Add gcc "-march=native" option.
Though it doesn't give any significant performance gain (I think because of intensive memory operations and bad code/data locality), it may be valuable for future models with intensive calculations.
Comment 1 Andrey Mazo 2009-09-17 11:21:32 UTC
Created attachment 586 [details]
Add "release" profile (see == First ==)
Comment 2 Andrey Mazo 2009-09-17 11:23:13 UTC
Created attachment 587 [details]
Add option "--enable-strip" (see == First ==)
Comment 3 Andrey Mazo 2009-09-17 11:24:12 UTC
Created attachment 588 [details]
Respect CXXFLAGS_EXTRA (see == Second ==)
Comment 4 Andrey Mazo 2009-09-17 11:24:57 UTC
Created attachment 589 [details]
gcc -pipe (see == Third ==)
Comment 5 Andrey Mazo 2009-09-17 11:27:22 UTC
Created attachment 590 [details]
gcc -fomit-frame-pointer (see == Fourth ==)
Comment 6 Gustavo J. A. M. Carneiro 2009-09-17 13:32:08 UTC
(In reply to comment #0)
> I have several questions/proposals about compiler flags.
> I've decided not to create several bugs and keep everything in one place.
> 
> == First ==
> gcc -g option makes executables 5 times larger in static optimized version.
> 
> What's the reason in passing -g option to gcc in optimized profile?
> Is it required for valgrind checks?

For profiling purposes (cachegrind or memprof, for example), you want your code to be both optimized and also to contain full debugging information.

> Of course, it doesn't influence on memory footprint and must not significantly
> change computation speed.
> But anyway this debugging information is absolutely useless during long
> production simulations and only consumes hard drive space.
> I see 2 ways:
> 1) add a new profile like "release" without debugging info at all
> 2) add a new configure option "--enable-strip" to run gcc with -s flag (or
> equivalent flag for other compilers).

The two ways are mutually exclusive, right?  I don't see the point of stripping debug symbols; might as well not produce them in the first place!

But I am +1 on your proposed 'release' profile.

> 
> 
> == Second ==
> Make wscript pick up environmental variables like CXXFLAGS_EXTRA to append them
> to automatically assembled CXXFLAGS (this may be useful for some compilation
> related experiments).

Meh... is this really that useful?  What is wrong with using the CXXFLAGS env var to completely override the default flags?

> 
> == Third ==
> Add gcc "-pipe" option not to create temporary files (this will slightly speed
> up compilation) (must be supported even under cygwin)

gcc "-pipe", does it really make much difference?  If it's always such a good option, why doesn't gcc use it by default?

> 
> == Fourth ==
> Add gcc "-fomit-frame-pointer" option.
> This will slightly speed up execution (I've got about 1-3% speed up) and reduce
> code size, but make debugging impossible on several architectures.
> So I think it is acceptable under "release" profile (from First proposal).

+0 (I abstain)

> 
> == Fifth ==
> Add gcc "-march=native" option.
> Though it doesn't have any significant performance gains (I think due to
> intensive memory operations and bad code/data locality), it may be valuable for
> future models with intensive calculations.
> 

-march=native appears to be a good idea considering that ns-3 is never installed or packaged for distributions.
Comment 7 Andrey Mazo 2009-09-17 18:29:46 UTC
(In reply to comment #6)

Thank you for your quick reply!

> > What's the reason in passing -g option to gcc in optimized profile?
> > Is it required for valgrind checks?
> 
> For profiling purposes (cachegrind or memprof, for example), you want your code
> to be both optimized and also to contain full debugging information.
Thank you, I understand now.

> > I see 2 ways:
> > 1) add a new profile like "release" without debugging info at all
> > 2) add a new configure option "--enable-strip" to run gcc with -s flag (or
> > equivalent flag for other compilers).
> 
> The two ways are mutually exclusive, right?  I don't see the point of stripping
> debug symbols; might as well not produce them in the first place!
Yes, I see no reason to implement both.
I also prefer the first way, because it allows adding other flags that may conflict with profiling, for example.
The second way is just an alternative.
And I know of some packages that behave that way (compile with -g3 -ggdb, but then strip the final executable).

> But I am +1 on your proposed 'release' profile.
Good!
Waiting for one more +1 from another maintainer?


> > == Second ==
> > Make wscript pick up environmental variables like CXXFLAGS_EXTRA to append them
> > to automatically assembled CXXFLAGS (this may be useful for some compilation
> > related experiments).
> 
> Meh.. is this really so much useful?  What is wrong about using CXXFLAGS env
> var. to completely override the default flags?
Well, I'm not sure this will be very useful for end users, because it's a rather special use case.
But it can make it easier to experiment with compiler flags, temporary defines and the like.
CXXFLAGS are carefully assembled throughout the whole of configure() in wscript, so it would be very unwise to drop them.
Blindly overriding LINKFLAGS may be disastrous.

> > == Third ==
> > Add gcc "-pipe" option not to create temporary files (this will slightly speed
> > up compilation) (must be supported even under cygwin)
> 
> gcc "-pipe", does it really much difference?  If it's always so good an option,
> why doesn't gcc use it by default?
Gcc doesn't enable many good options by default. :)
I don't think the difference is really measurable, because of filesystem buffers and caches.
The improvement may be more noticeable with many small C files than with several large C++ ones.
But why should we incur the additional overhead?

> > == Fourth ==
> > Add gcc "-fomit-frame-pointer" option.
> > This will slightly speed up execution (I've got about 1-3% speed up) and reduce
> > code size, but make debugging impossible on several architectures.
> > So I think it is acceptable under "release" profile (from First proposal).
> 
> +0 (I abstain)
Any unbiased reasons?
Debugging under the "release" profile will already be hard to impossible.

> > == Fifth ==
> > Add gcc "-march=native" option.
> > Though it doesn't have any significant performance gains (I think due to
> > intensive memory operations and bad code/data locality), it may be valuable for
> > future models with intensive calculations.
>
> -march=native appears to be a good idea considering that ns-3 is not ever
> installed or packaged for distributions.
Good!
Again, waiting for another +1?
Comment 8 Andrey Mazo 2009-09-18 09:28:42 UTC
(In reply to comment #7)
> > gcc "-pipe", does it really much difference?  If it's always so good an option,
> > why doesn't gcc use it by default?
> Gcc doesn't enable many good options by default.:)
> I don't think, the difference is really measurable because of filesystem
> buffers and caches.
> There may be more noticeable improvement in case of many small C files, not
> several large C++ ones.
> But why should we produce additional overhead?

I've made a special synthetic benchmark.
In short, -pipe reduces system CPU time (as reported by time(1)) by about 20%; the reduction in NS-3 compilation time is less than 1%.
In detail, the benchmark is as follows:
1) large simple preprocessed C file (actually an XPM image of about 2.6M)
2) 1000 iterations with /bin/false to measure shell forking and looping overhead
3) "gcc -c file.i" to ensure the file is in cache
4) 1000 iterations with gcc -c file.i
5) 1000 iterations with gcc -c file.i -pipe
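
The steps above could be scripted roughly as follows. This is a hypothetical Python re-creation, not the script used for the measurements: time_command and run_benchmark are made-up names, file.i stands for the preprocessed file from step 1, and the per-iteration overhead of Python's subprocess module differs from that of a shell loop.

```python
import os
import subprocess


def time_command(cmd, iterations):
    """Run cmd repeatedly; return (real, user, sys) seconds.

    user and sys count CPU time of the child processes only.
    """
    before = os.times()
    for _ in range(iterations):
        # Exit status is ignored on purpose: the baseline command always fails.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    after = os.times()
    return (after.elapsed - before.elapsed,
            after.children_user - before.children_user,
            after.children_system - before.children_system)


def run_benchmark(source="file.i", iterations=1000):
    """Hypothetical driver for steps 2-5; needs gcc and the preprocessed file."""
    results = {}
    # Step 2: measure process-spawning and looping overhead.
    results["baseline"] = time_command(["/bin/false"], iterations)
    # Step 3: one untimed compile so the file is in the cache.
    time_command(["gcc", "-c", source], 1)
    # Steps 4 and 5: the same compile without and with -pipe.
    results["plain"] = time_command(["gcc", "-c", source], iterations)
    results["pipe"] = time_command(["gcc", "-c", source, "-pipe"], iterations)
    return results
```

Calling run_benchmark() in a directory containing file.i returns one (real, user, sys) tuple per step, comparable to the time(1) figures in this comment.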

Results (one run -- others are similar):
2) real    0m0.519s
   user    0m0.184s
   sys     0m0.324s

4) real    2m48.458s
   user    2m6.020s
   sys     0m33.138s

5) real    2m30.898s
   user    2m4.784s
   sys     0m26.182s

So I think the overall NS-3 compilation time will be reduced by less than 1% (a rough estimate confirms this).
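
For reference, the percentages can be recomputed from the timings in this comment. The sketch below only redoes the arithmetic on runs 4) and 5); to_seconds and reduction_percent are illustrative helper names, and the loop overhead from run 2) is ignored because it is well under a second.

```python
def to_seconds(minutes, seconds):
    """Convert a time(1)-style 'XmY.YYYs' reading to seconds."""
    return 60 * minutes + seconds


def reduction_percent(before, after):
    """Relative reduction from before to after, in percent."""
    return 100.0 * (before - after) / before


# Run 4): gcc -c file.i
plain = {"real": to_seconds(2, 48.458),
         "user": to_seconds(2, 6.020),
         "sys": to_seconds(0, 33.138)}
# Run 5): gcc -c file.i -pipe
pipe = {"real": to_seconds(2, 30.898),
        "user": to_seconds(2, 4.784),
        "sys": to_seconds(0, 26.182)}

for key in ("real", "user", "sys"):
    print(f"{key}: {reduction_percent(plain[key], pipe[key]):.1f}% lower with -pipe")
```

This gives about 21% for sys, about 1% for user and about 10% for real in the synthetic benchmark; the figure of less than 1% for a full NS-3 build is a separate estimate and is not derived here.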
Comment 9 Gustavo J. A. M. Carneiro 2009-09-25 07:37:37 UTC
There are some interesting patches here, but NS-3 is in feature freeze, so they are better applied after the NS-3.6 release, to avoid potentially breaking the release.
Comment 10 Andrey Mazo 2009-09-25 09:06:01 UTC
(In reply to comment #9)
> Here are some interesting patches, but NS-3 is in feature freeze, so they are
> better used after the NS-3.6 release, to avoid potentially breaking the
> release.

No problem.
I'm doing some more benchmarks, so I can make use of the time before the tree reopens. :)
Comment 11 Andrey Mazo 2009-09-28 09:34:22 UTC
== Sixth ==
(depends on Fifth)
Instruct gcc to use SSE where possible: add gcc options like "-msse{,2,3} -mfpmath=sse".

My measurements show that this doesn't give any speed improvement, but it may change precision in long simulations because of rounding differences.

The gcc manual says (about -mfpmath=sse):
"The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit."

For example, a prolonged run of the wifi-wired-bridging example compiled with and without SSE produces identical pcap traces, but the mobility traces differ in some places. The difference isn't significant (about 2ns), but it may strongly affect some special simulations.
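
As a loose analogy for why the width of intermediate results matters, the sketch below adds the same ten values twice: once rounding to 64 bits after every addition, and once with a single rounding at the end. It is not ns-3 code and does not reproduce x87 80-bit arithmetic (Python floats are always 64-bit); math.fsum is only a stand-in for "wider temporaries", and the function names are made up for illustration.

```python
import math


def accumulate_rounding_each_step(values):
    """Add values one by one; every partial sum is rounded to a 64-bit double."""
    total = 0.0
    for value in values:
        total += value
    return total


def accumulate_rounding_once(values):
    """Track the sum without intermediate rounding; round once at the end."""
    return math.fsum(values)


steps = [0.1] * 10
stepwise = accumulate_rounding_each_step(steps)
once = accumulate_rounding_once(steps)
print(stepwise == once)  # prints False: the two results differ in the last bit
```

The same inputs and the same operations give slightly different results depending only on where rounding happens, which is the kind of effect that can show up as small trace differences.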
Comment 12 Gustavo J. A. M. Carneiro 2009-10-23 08:32:23 UTC
The tree is now open for new enhancements.  If you have updated versions of the patches, time to upload them...
Comment 13 Andrey Mazo 2009-10-23 08:46:19 UTC
(In reply to comment #12)
> The tree is now open for new enhancements.  If you have updated versions of the
> patches, time to upload them...
Some of my planned enhancements didn't show any performance gains and would only add more complexity to wscript, so I don't have any brand new patches.
Comment 14 Gustavo J. A. M. Carneiro 2009-10-23 08:56:52 UTC
Comment on attachment 586 [details]
Add "release" profile (see == First ==)

Feel free to commit this patch.
Comment 15 Gustavo J. A. M. Carneiro 2009-10-23 08:58:40 UTC
Comment on attachment 587 [details]
Add option "--enable-strip" (see == First ==)

The 'release' profile replaces this patch.
Comment 16 Gustavo J. A. M. Carneiro 2009-10-23 09:00:21 UTC
Comment on attachment 588 [details]
Respect CXXFLAGS_EXTRA (see == Second ==)

Please add support for a CCFLAGS_EXTRA, besides CXXFLAGS_EXTRA.  Even if ns-3-dev does not use pure C code, I know at least Mathieu is working on pure C code in a branch.  And for consistency.  After this change, feel free to commit.
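
For illustration, appending *_EXTRA variables instead of overriding the assembled flags could look like the sketch below. This is not the attached patch and does not use the waf API: append_extra_flags is a hypothetical helper, and a plain dict stands in for the build environment that wscript's configure() fills in.

```python
import shlex


def append_extra_flags(env, environ, names=("CXXFLAGS", "CCFLAGS")):
    """Append NAME_EXTRA from environ to env[NAME], keeping existing flags.

    env     -- dict of flag lists (stand-in for the build environment)
    environ -- mapping of environment variables, e.g. os.environ
    """
    for name in names:
        extra = environ.get(name + "_EXTRA", "")
        if extra:
            # shlex.split keeps quoted arguments such as -DNAME="a b" intact.
            env.setdefault(name, []).extend(shlex.split(extra))
    return env


env = {"CXXFLAGS": ["-O3", "-g"]}
append_extra_flags(env, {"CXXFLAGS_EXTRA": "-DFOO=1 -Wextra"})
print(env["CXXFLAGS"])  # prints ['-O3', '-g', '-DFOO=1', '-Wextra']
```

The automatically assembled flags stay in place and the extra ones are added at the end, in contrast to setting CXXFLAGS itself, which replaces the defaults.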
Comment 17 Gustavo J. A. M. Carneiro 2009-10-23 09:02:21 UTC
Comment on attachment 589 [details]
gcc -pipe (see == Third ==)

I am still unconvinced about this patch.  Why bother if the difference is so small?  I say let's drop it.
Comment 18 Gustavo J. A. M. Carneiro 2009-10-23 09:13:39 UTC
-fomit-frame-pointer and -march=native, ok, I guess, but only for the 'release' profile.
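
One way to picture the resulting split between profiles is a table of flags per profile. This is an illustrative sketch only, not the committed wscript change: PROFILE_FLAGS and flags_for are hypothetical names and the -O3 level is an assumption. The thread itself only establishes that "optimized" keeps -g, and that "release" drops debugging info and adds -fomit-frame-pointer and -march=native.

```python
# Hypothetical per-profile flag table; the -O3 level is an assumption.
PROFILE_FLAGS = {
    # Optimized but still profilable (cachegrind, memprof): keeps -g.
    "optimized": ["-O3", "-g"],
    # Production runs: no debugging info, plus the release-only flags.
    "release": ["-O3", "-fomit-frame-pointer", "-march=native"],
}


def flags_for(profile):
    """Return a copy of the flag list for a profile, or raise ValueError."""
    try:
        return list(PROFILE_FLAGS[profile])
    except KeyError:
        raise ValueError(f"unknown build profile: {profile!r}") from None


print(flags_for("release"))
```

Keeping the release-only flags out of "optimized" preserves that profile for profiling, which was the reason given for passing -g there.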
Comment 19 Andrey Mazo 2009-10-23 09:36:48 UTC
(In reply to comment #14)
> (From update of attachment 586 [details])
> Feel free to commit this patch.
changeset 15524c57a627
Comment 20 Andrey Mazo 2009-10-23 09:37:16 UTC
(In reply to comment #16)
> (From update of attachment 588 [details])
> Please add support for a CCFLAGS_EXTRA, besides CXXFLAGS_EXTRA.  Even if
> ns-3-dev does not use pure C code, I know at least Mathieu is working on pure C
> code in a branch.  And for consistency.  After this change, feel free to
> commit.
changeset 6db6a279dfff
Comment 21 Andrey Mazo 2009-10-23 09:37:50 UTC
(In reply to comment #18)
> -fomit-frame-pointer and -march=native, ok, I guess, but only for the 'release'
> profile.
changeset 8878efe25b6c
Comment 22 Andrey Mazo 2009-10-23 09:41:17 UTC
(In reply to comment #17)
> (From update of attachment 589 [details])
> I am still unconvinced about this patch.  Why bother if the difference is so
> small?  I say let's drop it.
Well, it seems you're right here, since no one else finds this helpful.
Closing bug as FIXED?