Jump to content
  • Advertisement
Sign in to follow this  
Mantear

Random data set generation

This topic is 2992 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I'm looking for an efficient method of generating a large set of unique 64-bit integers (on the order of 10,000 values). I use a PRNG (a Mersenne Twister implementation to acheive a long period) to create input to an AES core with a given key. The AES output is used to populate the 64-bit data set. The actual generation of the data is fast enough. The problem (slowness) comes with guaranteeing uniqueness among all 10,000 64-bit numbers. Any ideas?

EDIT: Oops, meant to put this in the General Programming area.

Share this post


Link to post
Share on other sites
Advertisement
Generate a bunch of input numbers (more than you need).
sort the final list.
remove duplicates (they'll all be right next to one another)
and then if you need "randomness" preform a shuffle on the final data set.

Share this post


Link to post
Share on other sites
Perhaps you could use an std::set and keep generating/inserting values until its size is what you need?

std::set<unsigned long long> values;
const unsigned int N = 10000;

for(unsigned int i = 0; i < 2*N; ++i) {
values.insert(getRandomValue());
if(values.size() == N) {
break;
}
}




I've used a for-loop instead of a while-loop to prevent infinite loops, so you can decide what you want to if the loop ends and there aren't enough values.

Share this post


Link to post
Share on other sites
@Zipster: That's the path I started down with
std::set<unsigned long long> values;
while (values.size() < numValues)
{
values.insert(getRandom64Num());
}

.. until I realized std::set is sorted and std::random_shuffle can't work on a std::set. I don't think copying the contents out of the set into another container in a random fashion is an efficient/good idea.

I'm going to try what KulSeran said. Generate a std::list<> that's 10% larger than what I need. Then call std::list::sort(), std::list::unique(), and std::random_shuffle(). Handle special-case if the resulting list after calling unique() is smaller than what I need.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zipster
*** Source Snippet Removed ***
You can save a little work by generating the first 10000, then doing the loop to add the rest.

Because you are filling up such a small amount of the possible space (2^13 numbers out of 2^64 space, or roughly 0.00000000000005%) there is a very low probability of duplicates if your PRNG is configured properly.

Share this post


Link to post
Share on other sites
Quote:
Original post by frob
Because you are filling up such a small amount of the possible space (2^13 numbers out of 2^64 space, or roughly 0.00000000000005%) there is a very low probability of duplicates if your PRNG is configured properly.


Very small, yes, but even a single occurange is not acceptable.

My std::list sort(), unique(), actually ended up working rather poorly, even without a randomization happening yet. The best method I have so far is using a std::set<>, inserting into the set the randomly generated numbers until the set.size() equals the data set size I need, std::copy() from the std::set to a (pre-allocated) std::vector, then doing a std::random_shuffle on the vector.

EDIT: edits made

Share this post


Link to post
Share on other sites
Hi,

You can use a maximal length linear feedback shift register (LFSR). It can generate 2^64-1 unique values before repeating itself, and it is very fast. You might also want to look up "gold sequence".

Share this post


Link to post
Share on other sites
Quote:
Original post by Mantear
The best method I have so far is using a std::set<>, inserting into the set the randomly generated numbers until the set.size() equals the data set size I need, std::copy() from the std::set to a (pre-allocated) std::vector, then doing a std::random_shuffle on the vector.


That seems like a reasonable plan. You probably don't need to do this very often, do you?

Share this post


Link to post
Share on other sites
You could just generate 10,000 random 64-bit numbers and not check for collisions. The probability of a collision is less than .000000000003, which means you'll never see one.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!