Bugzilla – Bug 181
Normal Variable Infinite Value & Bounds
Last modified: 2008-07-01 13:32:30 UTC
Two quick points: - NormalVariable::INFINITE_VALUE is no longer defined. NormalVariableImpl defines and uses a value, but NormalVariable declares and uses a static const INFINITE_VALUE to define some default values, without ever defining it. It should be defined, or probably more correctly, the two methods that use it should be changed such that there is no default but instead one version w/ no third parameter, and another w/ the bounds parameter, so that the underlying implementation can work with defaults however it wants. - I am not totally sure what this would do to the current algorithm, but it seems reasonable that the valid region should be mean +/- bound rather than -bound to bound. This makes it more convenient, for example, to create a normal variable w/ distribution 2, 0.1 bounded by 1--3, i.e. to jitter some periodic packet generation. Taking a glance, it seems like this should not affect the algorithm, but I am not an RNG expert. Thx
Created attachment 135 [details] Minor patch
Comment on attachment 135 [details] Minor patch This patch only addresses the INFINITE_VALUE issue. The bounds issue needs feedback as its more about expectations & usability, so it has been left alone here.
(In reply to comment #2) > (From update of attachment 135 [details] [edit]) > This patch only addresses the INFINITE_VALUE issue. The bounds issue needs > feedback as its more about expectations & usability, so it has been left alone > here. > The bounds setting API is probably non optimal. If anything, the range should extend [mean-bound,mean+bound], that way the distribution mean and mode are preserved. Asymmetric bounds causes the mean of the bounded distribution to be different from what you specified.
Created attachment 156 [details] Proposed (untested fix) Subsumes Joe's patch. Does symmetrical bounds, I'm just doing some quick histogram plots to verify.
Created attachment 157 [details] test case for above patch This is a test case which generates both unbounded and bounded versions of a Normal(10,2). Histogram plots verify that it works at bounding the distribution symmetrically about the mean. The tail of the bounded distribution is skewed however: http://www.prism.gatech.edu/~gtg037s/files/normal-variable.eps This is due to the way the code is forcing values outside of the range to be on the range boundary. Ideally, the code would drop values outside of the range and "try again" so that the boundary didn't have this behavrior. Honestly though, this API was never intended to do this. I conceived that the bound would be set somewhere like 5 standard deviations away, so that this tail effect is minimized. What do people think?
(In reply to comment #5) > Created an attachment (id=157) [edit] > test case for above patch > > This is a test case which generates both unbounded and bounded versions of a > Normal(10,2). Histogram plots verify that it works at bounding the > distribution symmetrically about the mean. The tail of the bounded > distribution is skewed however: > http://www.prism.gatech.edu/~gtg037s/files/normal-variable.eps > > This is due to the way the code is forcing values outside of the range to be on > the range boundary. Ideally, the code would drop values outside of the range > and "try again" so that the boundary didn't have this behavrior. > > Honestly though, this API was never intended to do this. I conceived that the > bound would be set somewhere like 5 standard deviations away, so that this tail > effect is minimized. What do people think? From the perspective of the user, I think that forcing values outside of the range to be on the edge of the range is actively harmful. Can't you just drop values outside of the range and retry (from within the random variable code) and return a value when it is within the range only ?
Created attachment 160 [details] updated to throw away values outside of range and "try again" This does what I described in my last comment. It throws away values outside of the range and keeps trying until it can return a valid number: http://www.prism.gatech.edu/~gtg037s/files/normal-variable.eps Same test case, same normal 10,2 distribution. I would push, but this needs some feedback from our new maintainer Michele Weigle.
(In reply to comment #7) > Created an attachment (id=160) [edit] > updated to throw away values outside of range and "try again" > > This does what I described in my last comment. It throws away values outside > of the range and keeps trying until it can return a valid number: > http://www.prism.gatech.edu/~gtg037s/files/normal-variable.eps > > Same test case, same normal 10,2 distribution. I would push, but this needs > some feedback from our new maintainer Michele Weigle. This looks like a good solution to the issue.
changeset: d0c70dbe918e
Hi all, The fix for the bounds that was applied is incorrect, or I am seriously confused. It currently, quite frequently, generates values outside the bounds. I am not sure if the original code had this behavior or not. The below snippet will bomb pretty much every run. I suspect that the readjustment of m_static_next if found out-of-bounds is incorrect, but have not investigated further. Thx bool bomb = 0; double mean = 1.0, variance = 0.2, bound = 0.75; for (int x = 0; x < 50; x++) { double x = NormalVariable::GetSingleValue(mean,variance,bound); if (x < mean-bound || x > mean+bound) { bomb = true; std::cout << "** "; } std::cout << x << std::endl; } if (bomb) std::cout << std::endl << "BOMBED" << std::endl;
(In reply to comment #10) > Hi all, > > The fix for the bounds that was applied is incorrect, or I am seriously > confused. It currently, quite frequently, generates values outside the bounds. > I am not sure if the original code had this behavior or not. The below > snippet will bomb pretty much every run. I suspect that the readjustment of > m_static_next if found out-of-bounds is incorrect, but have not investigated > further. > > Thx > > > bool bomb = 0; > double mean = 1.0, variance = 0.2, bound = 0.75; > for (int x = 0; x < 50; x++) { > double x = NormalVariable::GetSingleValue(mean,variance,bound); > if (x < mean-bound || x > mean+bound) { > bomb = true; > std::cout << "** "; > } > std::cout << x << std::endl; > } > if (bomb) > std::cout << std::endl << "BOMBED" << std::endl; The previous patch did modify GetValue but not GetSingleValue. >
Created attachment 176 [details] fixes the static generator This doesn't "BOMB" on Joe's test case, and uses the same logic that the non-static variants use to generate bounded distributions. Joe's distribution with 300000 trials: http://www.prism.gatech.edu/~gtg037s/files/joe-test-case.eps Joe please do one final review, and lets push this bad boy.
This looks good to me. For the record though, several of the other distributions do follow the previously discussed policy of simply taking the bound as the value if it is surpassed, rather than resampling (as NormalVariable now does). Exponential, Pareto, and Weibull all do that. But, I would at least include this patch. Thx Raj
Applied patch from 2008-06-19 16:14