Exclusive maximum in random functions

Started by
19 comments, last by Hodgman 9 years, 2 months ago

Hi there! I'm working on some library functions and I've hit a problem. Why do people insist on having random functions (like Random(min, max)) where the max value is excluded?

Con's:

  • It's counter-intuitive. Most rookies trip on this at least once.
  • It's less efficient in the approach I was going for.
  • It has weird states like Random(0, 1) being the same as Random(0, 0) (unless one should forbid the latter, which seems only worse). You also can't Randomize through the full positive int range [0-Int32.MaxValue] (which kinda isn't a problem, but still inelegant).

Pro's:

  • Bit simpler when you're randomizing an element out of an array (array[Random(array.Size)]). Basically fits in better in a world where 0 is the starting index.
  • Everyone does it this way, including the System.Random functions (working in C#).

I really don't like a randomization lib being modelled only for indexing arrays, but the fact that everyone does it this way is a much more convincing argument. Before I go make a decision and move on, is there something I'm not thinking about?

Advertisement
There is some information on why the exclusive upper bound formulation is the most prevalent one here: http://stackoverflow.com/questions/9795391/is-there-a-standard-for-inclusive-exclusive-ends-of-time-intervals.

Not sure why it would be less efficient for you. As for the "weird states", Random(0, 1) would certainly not be equivalent to Random(0, 0). There are a few conventions, probably the most convenient one is to assert min <= max and if min == max then just return values from min to the maximum possible integer value (or just min < max and have a separate function for the full interval, but that is less compact).

In short, [a, b) intervals are a "natural" representation of ranges for various reasons mainly stemming from zero-based indexing, and (pseudo)random generation functions follow suit. What efficiency problems are you having with this notation exactly?

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Thanks for the link! It's pretty much what I expected. I agree with that SO reply, but I'm not really sure if it applies to the interface of a random number generator. Expressing "I want a dice roll between 1-6" as Random(1, 6) just seems more natural to me.

Having Random(0, 0) mean Random(0, Int32.MaxValue-1) (or maybe Random(0, Int32.MaxValue)?) does not really feel natural either. If I don't go with a convention of my own, I'll just stick to the one in System.Random, where Random(0, 1) equals Random(0, 0).


Thanks for the link! It's pretty much what I expected. I agree with that SO reply, but I'm not really sure if it applies to the interface of a random number generator. Expressing "I want a dice roll between 1-6" as Random(1, 6) just seems more natural to me.

Ah, but the thing is that random number generators are not exclusively written to simulate dice rolls. When you are writing a random number generation library you usually want to go as generic as possible with your API in order to give the user the flexibility he needs to implement his own logic off of your random number generation routines. That can mean a generic Random(min, max) with the usual min/max convention which has effectively nothing to do with dice rolls. But you can implement dice rolls on top of that by writing a RandomDiceRoll(numSides) function that uses those Random functions, e.g. via Random(numSides) + 1. In and of itself, that Random() function has no relation to dice rolls, and indeed what you find "natural" for dice rolls will feel "unnatural" for things like selecting an array element at random, e.g. for shuffling a collection. That's because the function isn't designed for any specific task, and a min/max convention ultimately had to be picked.

In short, that the Random() function does not "favor" dice rolls is not a design flaw, it just means that you are meant to implement a dice rolling function using it instead of directly treating it as a dice-rolling function (because it isn't one).

And on that note, really, any random number generation library worth its salt should divide its API into a low-level random bit generation module (deterministic random bit generator, DRBG), a distribution API on top of that to generate different types of distributions like discrete [A, B) distributions, reals from 0 to 1, normally distributed numbers and so on, and finally perhaps (if applicable) a convenience API that performs some common tasks such as dice rolls, uniform choice from a set of values, shuffling, etc, etc. In that regard, I don't consider the System.Random class a particularly shining example of good design, but then most users are happy with a seed and a nextInt() function, so...

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Thanks for the link! It's pretty much what I expected. I agree with that SO reply, but I'm not really sure if it applies to the interface of a random number generator. Expressing "I want a dice roll between 1-6" as Random(1, 6) just seems more natural to me.

If you have a very application-specific interpretation in mind, then wrap that in an application-specific function that rolls a six-sided die for you; DieRoll(6) would be even more natural than Random(1,6).

A general-purpose random number generator is an entirely different thing and has to consider all possible uses instead of just your application-specific demands.

But you can turn that argument around and ask, why would one design the random function's range only to index stuff in arrays smoothly when it's just as likely to be used for simulating die rolls? Stated differently, I *am* designing for general use, and I get the feel that exclusive maximum is a weirdness designed for indexing arrays, a very specific problem.

IIRC, the history of [a,b) convention in computer science can probably be traced back to math.


Stated differently, I *am* designing for general use, and I get the feel that exclusive maximum is a weirdness designed for indexing arrays, a very specific problem.

The physical 6-sided dice we have been using starts at 1, but it may just as well starts at 0 and ends at 5. What if I have a die that uses icons: Sunny, Cloudy, Rainy, Thunderstorm, Foggy, and Snow? I should be able to call Random(6), to indicate that I have 6 elements to choose from. Regardless of where I start from, 0 or 1 or 13 or 10417, I still must be able to map this result that:


startingPosition + Random(6)

Should give me the correct result.

Now, instead of an interval, what if I have an offset and length. Give me 10 results starting from result #102. My notation is therefore [102, 102+10), rather than [102, 102+10-1].

But you can turn that argument around and ask, why would one design the random function's range only to index stuff in arrays smoothly when it's just as likely to be used for simulating die rolls? Stated differently, I *am* designing for general use, and I get the feel that exclusive maximum is a weirdness designed for indexing arrays, a very specific problem.

If you are designing a random generator for die rolls, then go ahead and use an inclusive range. If you design a random generator for array access, then go ahead and use an exclusive range. But a general-purpose random generator is not specifically designed for any case but to generate random numbers, it is designed neither for die rolls nor array accesses.

As Bacterius stated earlier, or at least hinted at, when you apply specific use cases to random number generators you are mixing two things; randomness and distributions. A random number generator should preferably only generate numbers, while specific code is used to shape the random numbers into specific distributions. In this case, a die roll and array indices are two distributions.

The general way to think about a die rolls that separates randomness and distribution is: I want 6 different random numbers, and once I have them I map them to the possible outcomes of a die roll. If your random function returns numbers between 0 and 5, then your die roll map becomes f(x) = x+1 (that is, insert random number 0 to 5, and get a die roll from 1 to 6), while the array index becomes f(x) = x. Similarly, if your random generator returns a value between 1 and 6, the map becomes f(x) = x for the die roll and f(x) = x-1 for array access.

None of the mapping functions are inherently better than the other. However, when you look at many different use cases, the math of 0-based exclusive ranges falls out prettier without magic numbers more often than not. You rarely have to add or subtract 1 to compensate for exclusive ranges.

None of the mapping functions are inherently better than the other. However, when you look at many different use cases, the math of 0-based exclusive ranges falls out prettier without magic numbers more often than not. You rarely have to add or subtract 1 to compensate for exclusive ranges.

This pretty much nails it. I agree that neither way (exclusive or inclusive max) is inherently better than the other. So unless I'm missing something, it boils down to my arguments above (counter-intuitive and weird states) vs usefulness for indexing arrays and convention.

Thanks for taking your time to answer, I'll go pounder this a bit more smile.png

This topic is closed to new replies.

Advertisement