Bugzilla – Full Text Bug Listing |
Summary: | Interference Helper gives negative trace value for Interference Power | ||
---|---|---|---|
Product: | ns-3 | Reporter: | Tom Henderson <tomh> |
Component: | wifi | Assignee: | sebastien.deronne |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | ns-bugs, pdbarnes |
Priority: | P3 | ||
Version: | ns-3.27 | ||
Hardware: | All | ||
OS: | All | ||
See Also: | https://www.nsnam.org/bugzilla/show_bug.cgi?id=2847 | ||
Attachments: |
proposed solution
proposed solution Suggested revised patch Suggested revised patch v2 updated patch latest solution with replacement of vector by multimap final patch Revised patch, keeps the first zero power noise event benchmark program |
Description
Tom Henderson
2017-10-05 10:09:42 UTC
Can I get a script to reproduce the issue? Update from Peter Barnes: The Wifi InterferenceHelper, as well as the Spectrum model SpectrumInterference class, are vulnerable to roundoff, resulting in negative computed interference (Ix), and hence negative SINR, which triggers an assert. There have been two recent reports on the users' list: https://groups.google.com/forum/#!topic/ns-3-users/abB_fM7C4rs https://groups.google.com/forum/#!topic/ns-3-users/t6L35V_eZ8M It's not clear either user has a repeatable example :( The second thread traces it back to wifi/model/interference-helper.cc:243 InterferenceHelper::CalculateSnr(). It looks like Ix sources are being tracked in a list of deltas, stored as NiChange events. Look at 220:AppendEvent() for example: - it adds (or modifies, lines 230-231) a new NiChange: 235: NiChange (event->GetStartTime (), event->GetRxPowerW (), event) - it adds a decrement: 237: NiChange (event->GetEndTime (), -event->GetRxPowerW (), event) The callers of CalculateSnr() walk the list of NiChange, adding in the delta power stored in each NiChange, to compute the total Ix power. When the relevant span includes both ends (start and end) of an Ix event, the design of storing (startTime, +power) then (endTime, -power) leads to adding and subtracting the same value. In between these addition and subtraction the absolute value of Ix being calculated can have changed significantly larger, leading to roundoff errors in the subtraction, ultimately resulting in negative values. A quick look suggests SpectrumInterference has the same approach. To prove that this is the problem a couple of things might help: 1. Instrument to record the high water mark as the Ix is being computed. This should be significantly larger than the final (negative) Ix value, which lends credence to loss of precision hypothesis. 2. Check the end result Ix value, if it's negative set it to m_firstPower (it's initial value). This should be a simple two line patch which either user could test for us. #2 is just a hack, not the real fix. We shouldn't settle for this hack a. because it could mask other more serious issues in the future, and b. because there's a better fix. The real fix is to refactor NiChange from (time, delta) to (time, power). To add a new Ix record, 1. add two NiChange events for start time and and end time of the new Ix source. 2. For the end time NiChange, set the power to the power of the previous entry (in time). 3. For all events in the interval [start, end) (including the new start event, but not the new end event) set power to the power of the entry before the new event. Step 3 has the effect of adding the new Ix power to the old value, starting at the start time. Step 2 has the effect of restoring the power to what it would have been, in the absence of the new Ix source, at the end time. Notice that this algorithm *does not* require any subtraction, only addition. *** Bug 2845 has been marked as a duplicate of this bug. *** Peter, thanks for the information, I like the idea to avoid subtractions. In any case, a reproduction script is required to test the solution. Sorry for duplicate post. Cranky hotel network is partly to blame. Since the same code is used in wifi and spectrum, Since the same code is used in wifi and spectrum (and bugzilla can't tag two components in the same bug report) I DID intend to create 2847 as a placeholder against spectrum. This patch can reproduce the condition: diff -r 71da905a6eb9 src/wifi/model/interference-helper.cc --- a/src/wifi/model/interference-helper.cc Wed Jan 10 19:29:50 2018 +0100 +++ b/src/wifi/model/interference-helper.cc Wed Jan 10 15:31:45 2018 -0800 @@ -287,6 +287,7 @@ } ni->insert (ni->begin (), NiChange (event->GetStartTime (), noiseInterference, event)); ni->push_back (NiChange (event->GetEndTime (), 0, event)); + NS_ABORT_MSG_IF (noiseInterference < 0, "Noise interference < 0 :" << noiseInterference); return noiseInterference; } ./waf --run 'power-adaptation-interference --simuTime=5' aborted. cond="noiseInterference < 0", msg="Noise interference < 0 :-2.47387e-27", file=../src/wifi/model/interference-helper.cc, line=290 terminate called without an active exception It seems that it might be benign in most of our use cases, but in the LAA code, if it arises, there is a later attempt to take the log of it which causes a nan which then causes downstream issues. Peter, Tom, Since we are 3 people looking at it right now, who is going to work on a patch? I have some time this weekend to work on a solution based on Peter's proposal. @Tom: I don't think it's benign. Just try to calculate a SINR then use an error model... @Sebastien: I won't be able to work on this, so charge ahead. The only diagnostic bit I can add is, following up on Tom's reproducible example, I added some logging to trace the evolution of m_firstPower: +1.000922702s 1 InterferenceHelper:AppendEvent(): [DEBUG] firstPower: 0, delta: 1.346294318435293e-10, time: +1000662702.0ns +1.000922702s 1 InterferenceHelper:AppendEvent(): [DEBUG] firstPower: 1.346294318435293e-10, delta: 1.346294318435293e-13, time: +1000662703.0ns +1.000922702s 1 InterferenceHelper:AppendEvent(): [DEBUG] firstPower: 1.347640612753728e-10, delta: -1.346294318435293e-10, time: +1000774702.0ns +1.000922702s 1 InterferenceHelper:AppendEvent(): [DEBUG] firstPower: 1.346294318435268e-13, delta: -1.346294318435293e-13, time: +1000906703.0ns aborted. cond="m_firstPower < 0", msg="+1.000922702sfirstPower: -2.473867798773093e-27", file=../../src/wifi/model/interference-helper.cc, line=231 Simplified, the evolution is m_firstPower delta 0 1.346294e-10 1.346294e-10 1.346294e-13 1.347640e-10 -1.346294e-10 1.346294e-13 -1.346294e-13 Notice that we: add A, add B, sub A, sub B, where B = A/1000. We lose 3 digits of precision from B in the addition when it's normalized to A's exponent, but not on the subtraction; we are not subtracting the same number as added, and in this case we end up negative. (Side note: curious that the digits in A and B are exactly the same, only the exponent is different. I haven't tried to understand this part at all, but seems weird.) Rereading my suggestion, I see I left off one important piece: folding in the new power: 3. For all events in the interval [start, end) (including the new start event, but not the new end event) set power to the power of the entry before the new event + the new Ix power Peter, thanks for your very interesting feedback that will for sure help in fixing this issue. I will try to work on a patch this weekend. (In reply to Peter Barnes from comment #9) > @Tom: I don't think it's benign. Just try to calculate a SINR then use an > error model... Sorry, poor word choice-- I meant that we are not seeing the effect in our existing use cases in our tests and examples (i.e. it is not triggering any faults which is why it has been hidden for a while)-- not that it isn't possibly causing some harm. I agree with trying your refactoring. I am facing one question: how should we handle the case where multiple events exists for a same time position? This occurs if two packets arrive at the same moment. I should post a solution this week. (In reply to sebastien.deronne from comment #13) > I am facing one question: how should we handle the case where multiple > events exists for a same time position? This occurs if two packets arrive at > the same moment. It doesn't matter to the algorithm, since the Ix events arrive (are inserted) one after the other. In other words, assume when about to insert the first event A you have Ix0 Ix1 .. Ixn Ix(n+1) where Ix0 stands for the aggregate interference power beginning at the time of event Ix0. Assume A starts between Ix0 and Ix1, and ends between Ixn and Ix(n+1): Ix0 A Ix1' .. Ixn' Ix(n+1) where 'ed values are equal to the un-primed value + the additional power of A, and A = Ix power at Ix0 + new Ix power from A. Now we need to insert Ix event B, which ties with A. It doesn't matter which one lands first: Ix0 A B' Ix1'' .. , where B' = Ix power at A + new Ix power from B Ix1'' = Ix1' + new Ix power from B Ix0 B A' Ix1'' .. , where B = Ix power at Ix0 + new Ix power from B A' = Ix power previously at A + new Ix power from B Ix1'' = Ix1' + new Ix power from B Notice that in either case Ix1'' = Ix1 + A + B. For extra credit one could contemplate merging tied events, but there's something in the code which searches for specific events (I don't recall why) which would probably struggle with merged events. I do not think we can merge, I had that in mind but I saw a lot of issues there. I am finalizing the patch right now. Created attachment 3006 [details]
proposed solution
I took some time this week to implement a solution based on Peter Barnes's proposal.
No regression test is impacted.
It would be nice if AddNiChangeEvent() returned the iterator of the new event, then the following loop at line 240 would be a lot simpler: auto first = AddNiChangeEvent (NiChange (event->GetStartTime (), previousPowerStart + event->GetRxPowerW (), event)); auto last = AddNiChangeEvent (NiChange (event->GetEndTime (), previousPowerEnd, event)); for (auto i = first; i < last; ++i) { i->AddPower (event->GetRxPowerW ()); } Otherwise +1 I love working with the ns-3 team! Created attachment 3007 [details]
proposed solution
Peter, I like a lot your idea, but I am facing an issue when I do:
auto last = AddNiChangeEvent (...)
At that moment, a new vector is reallocated (since its size is incremented), so the distance between last and first has no sense anymore.
A solution is to return the distance to the first element instead of an iterator.
Please find the updated patch in attachment.
Created attachment 3008 [details]
Suggested revised patch
I've got some picky questions about the patch. I'll post my version (untested) for comparison/comment.
First, the original choice of typedef std::vector <NiChange> NiChanges; is problematic, for a container which is supposed to be sorted. std::multimap would be much better. Should we make that change now? Several loops would disappear, as well as the GetNextPosition, GetPreviousPosition...
interference-helper.cc:222: I don't understand the conditional. If NiChanges is empty (on inserting the first event), GetPreviousEvent will return a begin which can't be dereferenced. I think this should stanza should be handling the empty-list case like this:
double previousPowerStart = 0; // or should this be m_firstPower?
auto itPreviousEvent = GetPreviousPosition (event->GetStartTime ());
if ( m_niChanges.size() > 0 ) {
// itPrevious points to a real event with start time <= event
previousPowerStart = itPreviousEvent->GetPower ();
}
Similar comment on the next stanza, with similar alteration.
interference-helper.cc:234: I had to stare to see that the new power wasn't being added twice. I'd prefer:
auto first = AddNiChangeEvent (NiChange (event->GetStartTime (), previousPowerStart, event));
auto last = AddNiChangeEvent (NiChange (event->GetEndTime (), previousPowerEnd, event));
for (auto i = first; i < last; i++)
{
m_niChanges.at (i).AddPower (event->GetRxPowerW ());
}
interference-helper.cc:280: this loop exits on the first iteration
for (NiChanges::const_iterator i = eventIterator + 1; i != m_niChanges.end (); ++i)
interference-helper.cc:364 and
noiseInterferenceW = j->GetPower () - powerW;
The NiChange j has the total received power, so we subtract the signal power to get the noise?
interference-helper.cc:870: simplify using GetPreviousPosition:
m_rxing = false;
//Update m_firstPower for frame capture
auto it = GetPreviousPosition (Simulator::Now ());
m_firstPower = it->GetPower ();
Peter, I see you still have GetDelta calls in your patch, is it a mistake? Also, this cannot work: auto first = AddNiChangeEvent (NiChange (event->GetStartTime (), previousPowerStart, event)); auto last = AddNiChangeEvent (NiChange (event->GetEndTime (), previousPowerEnd, event)); for (auto i = first; i < last; i++) { m_niChanges.at (i).AddPower (event->GetRxPowerW ()); } Because when we insert the event, it is reallocated so we need to use the distance instead as in my latest patch. Whoops, looks like I applied to an old version. I can't seem to ssh to/from code.nsnam.org ATM, probably our firewall. first and last are just indices into the vector, not pointer-style iterators, so first isn't invalidated by inserting last into the vector. If you don't like this, all the more reason to switch to multi_map. Multimap seems indeed appealing here :-) Can you please provide your updated patch? Normally, you should be able to pull from ns-3-dev. Created attachment 3009 [details]
Suggested revised patch v2
Try this. Still not based on current ns-3-dev, so the line numbers are probably wrong, but no more references to delta.
Uncompiled or tested.
Peter, I have plenty of issues with the new patch, I fixed compilation issues but tests are failing or crashing. I think it might be easier if you point the difference between our patches so that I can continue. (In reply to sebastien.deronne from comment #25) > Peter, I have plenty of issues with the new patch, I fixed compilation > issues but tests are failing or crashing. I think it might be easier if you > point the difference between our patches so that I can continue. I started to compare with a diff, I already found some issues at both sides. I will continue to merge and then start to replace with multimap. Created attachment 3025 [details]
updated patch
I merged our contributions and fixed issues.
I will now look at adding multimap.
Peter, do you see something like this? typedef std::multimap<Time, NiChange> NiChanges; Where time is removed from the NiChange class. I indeed see less issues with iterator when using this multimap. But can we really get a direct function in multimap to replace the GetPrevious and GetNext functions as you mentioned? > But can we really get a direct function in multimap to replace
> the GetPrevious and GetNext functions as you mentioned?
Won't --it work?
InterferenceHelper::NiChanges::iterator
InterferenceHelper::GetPreviousPosition (Time moment)
{
auto it = m_niChanges.find (moment);
// Have to handle the case where moment is the first NiChange,
// so find returns begin()
return --it;
}
Created attachment 3026 [details]
latest solution with replacement of vector by multimap
Peter, is this fitting what you have in mind? Please let me know so that we can come up to a conclusion this week :-)
This generally looks good to me. I assume it passes all tests? A few nitpicks: interference-helper.cc:245: could this be a find(), instead of a loop? auto it = m_niChanges.find (event->GetStartTime ()); for ( ; it != m_niChanges.end (); ++it) if (it->second.GetEvent () == event) break; noiseInterference = it->second.GetPower (); :296 and :324: I'd prefer to have ++j at the top of the loop :362 and :740: ditto :798 and :804: What happens if the list is empty? Won't GetNext return end() == begin(), then --it gives you an error? at least when you dereference. :808: GetPositionBeforeTime() seems redundant with GetPreviousPosition(). It looks like there are three positions being used: GetPositionBeforeTime() event before a set of tied events, GetPreviousPosition() the last tied event GetNextPosition() the first event after a set of ties Could we use just two of these? Can we insert at the beginning of a tie set, instead of the end (to eliminate GetPreviousPosition)? interference-helper.h:153: Not obvious why this can't still be a const function. Comment it? :263: ditto Created attachment 3030 [details]
final patch
Peter, I addressed your remaining comments, I think it is now ready to get pushed.
interference-helper.h:323-329: remove obsolete GetNextPosition doxy and declaration interference-helper.cc:247: shouldn't this be outside the loop? interference-helper.cc:798: GetPreviousPosition() I don't understand this implementation. The basic implementation is obviously auto next = GetNextPosition (moment); auto prev = --next; // explicit variable for clarity of discussion return prev; as you have. This is complicated by the edge cases: begin != end (NiChanges has entries) next != begin: simple case: prev will be valid. This includes next == end. next == begin: prev won't be valid. Not clear what to do in this case. begin == end (list is empty) next == begin == end: prev won't be valid. Not clear what to do in this case. One resolution to the invalid cases could be: Insert a new zero power event prior to begin, but with no associated Event. At what time? The moment query argument? Looking at where GetPreviousPosition is used, perhaps the lack of Event isn't an issue in practice: line 186 and following: references only NiChange.m_power line 209: references only NiChange.m_power (In reply to Peter Barnes from comment #33) > interference-helper.h:323-329: remove obsolete GetNextPosition doxy and > declaration Oups, thanks :-) > > interference-helper.cc:247: shouldn't this be outside the loop? Correct > > interference-helper.cc:798: GetPreviousPosition() > I don't understand this implementation. The basic implementation is > obviously > > auto next = GetNextPosition (moment); > auto prev = --next; // explicit variable for clarity of discussion > return prev; > > as you have. This is complicated by the edge cases: > > begin != end (NiChanges has entries) > next != begin: simple case: prev will be valid. This includes next == > end. > next == begin: prev won't be valid. Not clear what to do in this case. > > > begin == end (list is empty) > next == begin == end: prev won't be valid. Not clear what to do in > this case. > > One resolution to the invalid cases could be: > Insert a new zero power event prior to begin, but with no associated > Event. At what time? The moment query argument? > > Looking at where GetPreviousPosition is used, perhaps the lack of Event > isn't an issue in practice: I rolled back to the previous implementation. We actually cannot encounter issues because we do some checks at other places to dereference only if size if not 0. > > line 186 and following: references only NiChange.m_power > line 209: references only NiChange.m_power Not sure I understand what you mean. >> One resolution to the invalid cases could be:
>> Insert a new zero power event prior to begin, but with no associated
>> Event. At what time? The moment query argument?
>>
>> Looking at where GetPreviousPosition is used, perhaps the lack of Event
>> isn't an issue in practice:
>
> I rolled back to the previous implementation.
> We actually cannot encounter issues because we do some checks at other
> places to dereference only if size if not 0.
>
>> line 186 and following: references only NiChange.m_power
>> line 209: references only NiChange.m_power
>
> Not sure I understand what you mean.
Only that inserting a zero power NiChange, with no value for Event, would be ok since no callers actually reference the previous event, only previous power.
> I rolled back to the previous implementation.
> We actually cannot encounter issues because we do some checks at other
> places to dereference only if size if not 0.
Um, I don't think so:
183 InterferenceHelper::GetEnergyDuration (double energyW) const
184 {
185 Time now = Simulator::Now ();
186 auto i = GetPreviousPosition (now);
187 Time end = i->first;
If NiChanges is empty, GetPrev returns --begin, which is invalid. This is a public function, so it can be called at arbitrary times, even when the list is empty. (For example, an upper layer could use this (or a similar function) to query the noise level, in order to decide if it's low enough to bother transmitting.)
Remember this patch isn't just for Wifi, but also Spectrum, so it's worth getting a general solution here.
Give a thought to adding an NiChange as suggested; it would be clean and cover all the cases.
(In reply to Peter Barnes from comment #36) > > I rolled back to the previous implementation. > > We actually cannot encounter issues because we do some checks at other > > places to dereference only if size if not 0. > > Um, I don't think so: > > 183 InterferenceHelper::GetEnergyDuration (double energyW) const > 184 { > 185 Time now = Simulator::Now (); > 186 auto i = GetPreviousPosition (now); > 187 Time end = i->first; > > If NiChanges is empty, GetPrev returns --begin, which is invalid. This is a > public function, so it can be called at arbitrary times, even when the list > is empty. (For example, an upper layer could use this (or a similar > function) to query the noise level, in order to decide if it's low enough to > bother transmitting.) > > Remember this patch isn't just for Wifi, but also Spectrum, so it's worth > getting a general solution here. > > Give a thought to adding an NiChange as suggested; it would be clean and > cover all the cases. You're correct :-) But I am not in favor of adding a dummy NiChange, I think this would be cleaner to have a check in GetEnergyDuration: if empty or if next is equal to begin, duration returned should simply be 0. The other places using GetPrevious are safe since they include a check on the size.
> > interference-helper.cc:247: shouldn't this be outside the loop?
>
> Correct
>
I am correcting myself. Actually this was on purpose, otherwise we have to make this more complex by doing --it -> read -> ++it.
> I am correcting myself. Actually this was on purpose, otherwise we have to > make this more complex by doing --it -> read -> ++it. Ok. The suggestion in Comment 32 seems more natural to me... > But I am not in favor of adding a dummy NiChange, I think this would be > cleaner Adding a dummy NiChange means the function always does what it says, which is pretty clean, rather than have to do fix-ups elsewhere in the cases when GetPrev doesn't do what it says. > to have a check in GetEnergyDuration: > if empty or if next is equal to begin, duration returned should simply be > 0. The other places using GetPrevious are safe since they include a check > on the size. That other place would also be simpler, just double previousPowerStart = GetPreviousPosition (event->GetStartTime ())->second.GetPower (); double previousPowerEnd = GetPreviousPosition (event->GetEndTime ())->second.GetPower (); I'll leave it up to you, just my $0.02 (In reply to Peter Barnes from comment #40) > > But I am not in favor of adding a dummy NiChange, I think this would be > > cleaner > > Adding a dummy NiChange means the function always does what it says, which > is pretty clean, rather than have to do fix-ups elsewhere in the cases when > GetPrev doesn't do what it says. > > > to have a check in GetEnergyDuration: > > if empty or if next is equal to begin, duration returned should simply be > > 0. The other places using GetPrevious are safe since they include a check > > on the size. > > That other place would also be simpler, just > > double previousPowerStart = GetPreviousPosition (event->GetStartTime > ())->second.GetPower (); > double previousPowerEnd = GetPreviousPosition (event->GetEndTime > ())->second.GetPower (); > > I'll leave it up to you, just my $0.02 Peter, I agree the function should do what it tells. If it would be possible to add the dummy NiChange just at the begin of the function GetPrevious and not in other part of the code, then it could work I think and it will be the cleanest solution. I'll give a try on that. (In reply to sebastien.deronne from comment #41) > (In reply to Peter Barnes from comment #40) > > > But I am not in favor of adding a dummy NiChange, I think this would be > > > cleaner > > > > Adding a dummy NiChange means the function always does what it says, which > > is pretty clean, rather than have to do fix-ups elsewhere in the cases when > > GetPrev doesn't do what it says. > > > > > to have a check in GetEnergyDuration: > > > if empty or if next is equal to begin, duration returned should simply be > > > 0. The other places using GetPrevious are safe since they include a check > > > on the size. > > > > That other place would also be simpler, just > > > > double previousPowerStart = GetPreviousPosition (event->GetStartTime > > ())->second.GetPower (); > > double previousPowerEnd = GetPreviousPosition (event->GetEndTime > > ())->second.GetPower (); > > > > I'll leave it up to you, just my $0.02 > > Peter, I agree the function should do what it tells. > If it would be possible to add the dummy NiChange just at the begin of the > function GetPrevious and not in other part of the code, then it could work I > think and it will be the cleanest solution. I'll give a try on that. Not that obvious because I want to keep the constness... Peter, can you provide me the lines of code corresponding to your solution? I unfortunately miss time and need to focus on higher priority items before the upcoming release. Created attachment 3035 [details]
Revised patch, keeps the first zero power noise event
This patch includes Tom's assert. Tom's example passes.
(In reply to Peter Barnes from comment #43) > Created attachment 3035 [details] > Revised patch, keeps the first zero power noise event > > This patch includes Tom's assert. Tom's example passes. Peter, thanks a lot for having taken this up :-) I had removed the assert from Tom because we will never again have negative values, so this check is not needed in my opinion. I wrote a quick benchmark based on test-interference-helper to force feed lots of packets at small interarrival times. On a static optimized build, with input of 2000 frames per second (50us interarrival), and disabling the WifiPhyStateHelper fatal error that prevent new packet from transmitting while in state TX, I'm seeing about 4% slower runtime with this new patch. I didn't profile or try to optimize, and if you think it is reasonable performance, feel free to commit. If you are expecting better performance than this, let me know and we can probably work out a better benchmarking program. The old method adds two events to the list, but defers computing total Ix power until needed. This version maintains in the list the total Ix power in every interval, so it's doing more work. 4% on a micro-benchmark doesn't seem too bad. A quick look with a profiler to see there isn't any easy fixes would be nice, but not necessary IMO. @Tom: can you make your benchmark a testcase? (In reply to Peter Barnes from comment #46) > The old method adds two events to the list, but defers computing total Ix > power until needed. This version maintains in the list the total Ix power > in every interval, so it's doing more work. 4% on a micro-benchmark doesn't > seem too bad. > > A quick look with a profiler to see there isn't any easy fixes would be > nice, but not necessary IMO. > > @Tom: can you make your benchmark a testcase? Posted as an example for scratch (not as a testcase). We could consider to put something like this in as a performance test at some point. Created attachment 3057 [details]
benchmark program
Pushed in changeset 13311:a40bedf7801a What do we have to do to apply this to Spectrum? |