vaneger

How to 'guess' best ?


OK, so if I have a set of data, let's say integers ranging from 0-255. The set can be any size, but probably not larger than 512 values. Also, there will only be an even number of values. The set will be reduced by half by averaging 2 adjacent numbers and only saving the averaged value. Given the averaged set data, what's the best method for having an AI figure out the original data? Before I get flamed: this is not homework, just something I'm trying to figure out.
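For concreteness, here's a minimal C sketch of the halving step being described (the function and variable names are made up, and I'm assuming the average rounds down):

#include <stddef.h>
#include <stdio.h>

/* Averages adjacent pairs: dst must have room for n/2 values, n must be even. */
void halve_by_averaging(const unsigned char *src, size_t n, unsigned char *dst)
{
    for (size_t i = 0; i < n; i += 2)
        dst[i / 2] = (unsigned char)((src[i] + src[i + 1]) / 2);
}

int main(void)
{
    unsigned char data[] = { 2, 6, 4, 6, 2, 8 };
    unsigned char avg[3];

    halve_by_averaging(data, 6, avg);
    for (size_t i = 0; i < 3; i++)
        printf("%u ", avg[i]);   /* prints: 4 5 5 */
    putchar('\n');
    return 0;
}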

You can't, if I'm understanding correctly.

What is the average of 4 and 6?
What is the average of 2 and 8?

Answer these and you will have your answer :)

There is no way to solve this problem with any form of accurate guess. Any answer you come up with is equally likely given the information. Why? Because you have N unknown numbers but only N/2 constraint equations (one per averaged pair). The system is under-determined, meaning there are a huge number of equally consistent solutions, and hence any answer you guess is as good as any other, so your guess is worthless.
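To see just how under-determined it is, here's a toy C count of how many byte pairs are consistent with a single stored average (the rounding-down assumption is mine):

#include <stdio.h>

int main(void)
{
    int target = 5, count = 0;

    /* Count every (a, b) pair of bytes whose rounded-down average is the target. */
    for (int a = 0; a <= 255; a++)
        for (int b = 0; b <= 255; b++)
            if ((a + b) / 2 == target)
                count++;

    /* Every one of these pairs is equally consistent with the stored value,
       so nothing in the data lets a guesser prefer one over another. */
    printf("%d pairs average to %d\n", count, target);
    return 0;
}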

If the purpose of this is simply to take a whole bunch of integers and squish them down in size, and performance doesn't mean anything, and you really really really want to save the extra hundred bytes (don't see why)...

Then you could take those averaged values and have a separate vector with values representing the absolute value of the difference between one of the original values and the average itself. Basically, for every unsigned int pair:

unsigned int src: 2, 6
unsigned int avg: 4
...unsigned char difference: 2

| 6 - 4 | == | 2 - 4 | == 2. That way you're saving a weeeee bit of storage space (could be more if you have a couple of hundred thousand ints, however...)
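As a rough illustration of that average-plus-difference idea in C (variable names are invented), note it only recovers the pair up to ordering and any rounding remainder:

#include <stdio.h>

int main(void)
{
    unsigned char src[2] = { 2, 6 };

    unsigned char avg  = (unsigned char)((src[0] + src[1]) / 2);            /* 4 */
    unsigned char diff = (unsigned char)(src[1] > avg ? src[1] - avg
                                                      : avg - src[1]);      /* 2 */

    /* Reconstruction: the pair only comes back as an unordered {lo, hi}. */
    unsigned char lo = (unsigned char)(avg - diff);   /* 2 */
    unsigned char hi = (unsigned char)(avg + diff);   /* 6 */

    printf("avg=%u diff=%u -> {%u, %u}\n", avg, diff, lo, hi);
    return 0;
}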

Then again, if everything is a value between 0 and 255, everything fits in a byte anyway, which makes my entire explanation of my idea before entirely, uh, nullified. Err. Useless.

What's the purpose for this anyways?

When you generate the set, make it so they're randomly generated with a normal distribution (i.e. the middle values have a much higher chance of appearing), and then you can make it so your AI tries x candidate solutions in the main part of the distribution, x-1 within one standard deviation out, x-2 for two standard deviations, etc.

Then you'll have a reasonable "guess" on the solution.
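One way to read that suggestion, sketched in C: among all byte pairs consistent with a stored average, pick the pair that is most probable under an assumed normal distribution (the mean of 127.5 and standard deviation of 40 here are invented for the example):

#include <stdio.h>

/* Log of the normal density up to a constant; enough for comparing candidates. */
static double log_normal_score(double x, double mu, double sigma)
{
    double z = (x - mu) / sigma;
    return -0.5 * z * z;
}

int main(void)
{
    int avg = 40;                     /* one stored (averaged) value */
    double mu = 127.5, sigma = 40.0;  /* assumed distribution of the original data */
    int best_a = 0, best_b = 0;
    double best_score = -1e300;

    for (int a = 0; a <= 255; a++) {
        for (int b = 0; b <= 255; b++) {
            if ((a + b) / 2 != avg)
                continue;             /* pair isn't consistent with the stored average */
            double score = log_normal_score(a, mu, sigma) +
                           log_normal_score(b, mu, sigma);
            if (score > best_score) {
                best_score = score;
                best_a = a;
                best_b = b;
            }
        }
    }
    printf("most probable pair for average %d: (%d, %d)\n", avg, best_a, best_b);
    return 0;
}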

Well, I plan on using CRC hashes or some other error-checking codes to compare guessed values against, so in theory only the correct guess would have the same hash as the original data.
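For what it's worth, that verification step could look something like this C sketch using zlib's crc32() (the sample data is invented, and a matching CRC doesn't by itself prove a candidate is the original, since different inputs can share a CRC):

/* Compile with -lz. */
#include <stdio.h>
#include <zlib.h>

int main(void)
{
    unsigned char original[]  = { 2, 6, 4, 6 };
    unsigned char candidate[] = { 3, 5, 4, 6 };   /* averages to the same 4, 5 */

    /* The CRC of the true data would be stored alongside the averaged values. */
    uLong original_crc  = crc32(0L, original, sizeof original);
    uLong candidate_crc = crc32(0L, candidate, sizeof candidate);

    printf("original  CRC: %08lx\n", (unsigned long)original_crc);
    printf("candidate CRC: %08lx\n", (unsigned long)candidate_crc);
    printf("match: %s\n", original_crc == candidate_crc ? "yes" : "no");
    return 0;
}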

What is the purpose of this? Is it to compress data, or is there another reason?

The idea is to compress data, yes, though decompressing the data might take a rather long time (which is what I'm trying to figure out a better way of doing).

Quote:
Original post by vaneger
Well, I plan on using CRC hashes or some other error-checking codes to compare guessed values against, so in theory only the correct guess would have the same hash as the original data.

Hashes don't work that way - there will *always* be collisions between hashes (i.e. two or more pieces of input data which yield the same hash). A good hashing function can minimise collisions, particularly between similar data, but it can't eliminate them entirely.

Unfortunately, the checksum/hash idea doesn't hold water. For a message 2N bytes in length, you'll have N bytes after you've averaged the data and a checksum of, say, 128 bits (16 bytes). You'll therefore have 2^(8*N) possible original messages consistent with the averaged data, but only 2^128 distinct checksum values. I'm being a bit vague about this, but the point is that the amount of information you have matters. An MD5 checksum, for instance, is believed to require an average of 2^128 plaintexts to be tested to produce a given hash. Unfortunately, you're going to have to try many, many more plaintexts than this, because for any decent kind of hash/checksum it isn't possible to isolate the effects of changes to the plaintext.
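To make that counting concrete, here's a toy C example that hashes every possible 2-byte message down to a 1-byte checksum (a deliberately simple sum-of-bytes digest, not a real CRC) and counts how many messages share each digest:

#include <stdio.h>

int main(void)
{
    unsigned long buckets[256] = { 0 };

    /* 65536 possible 2-byte messages, only 256 possible 1-byte digests. */
    for (int a = 0; a <= 255; a++)
        for (int b = 0; b <= 255; b++)
            buckets[(a + b) & 0xFF]++;

    unsigned long min = buckets[0], max = buckets[0];
    for (int i = 1; i < 256; i++) {
        if (buckets[i] < min) min = buckets[i];
        if (buckets[i] > max) max = buckets[i];
    }

    /* Every digest is shared by many messages, so matching the checksum
       cannot single out the original. */
    printf("messages per digest: min %lu, max %lu\n", min, max);
    return 0;
}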

Compression generally works in one of two ways: first, finding that some symbols (bytes, integers, whatever) occur more frequently than others and then encoding them using fewer bits (look up Huffman encoding) - however the price you pay is that you have to store the code as well. The other method is the ZIP method where strings of symbols are encoded as a single symbol if they occur more than once in your data. ZIP actually generates the symbol table on the fly and uses Huffman coding too, but that's the basic idea.

Your method is more applicable to something like audio or video encoding. Some systems work by having a method to guess what the next sample in the stream is, and then encoding the corrections (usually using Huffman coding or similar). I guess it would be similarly possible to construct something like that for compressing English text, since as I understand it English has an entropy of about 1.5 bits per character (so you should be able to get about 5:1 compression in theory).
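A rough C sketch of that predict-and-correct idea (sample values and names are invented): predict each sample as the previous one and store only the corrections, which cluster near zero for smooth signals and therefore compress well with Huffman-style coding afterwards:

#include <stdio.h>

int main(void)
{
    int samples[6] = { 100, 102, 105, 104, 101, 99 };
    int residuals[6];
    int n = 6;

    /* Encode: store the difference between each sample and the prediction
       (here simply the previous sample; the first prediction is 0). */
    int prev = 0;
    for (int i = 0; i < n; i++) {
        residuals[i] = samples[i] - prev;
        prev = samples[i];
    }

    /* Decode: apply the corrections in order to recover the samples exactly. */
    prev = 0;
    for (int i = 0; i < n; i++) {
        prev += residuals[i];
        printf("%d ", prev);   /* prints the original 100 102 105 104 101 99 */
    }
    putchar('\n');
    return 0;
}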
