
# How to 'guess' best?


## Recommended Posts

OK, so I have a set of data, let's say integers ranging from 0-255. The set can be any size, but probably not larger than 512 values, and it will always contain an even number of values. The set is reduced by half: each pair of adjacent numbers is averaged and only the averaged value is saved. Given the averaged data, what's the best method for having an AI figure out the original data? Before I get flamed: this is not homework, just something I'm trying to figure out.

##### Share on other sites
You can't, if I'm understanding correctly.

What is the average of 4 and 6?
What is the average of 2 and 8?

##### Share on other sites
There is no way to solve this problem with any form of accurate guess. Any answer you come up with is equally likely given the information. Why? Because you have N initial numbers but only N/2 constraint equations (one average per pair). The system is under-determined, meaning there are many solutions consistent with the data, so any one guess is as good as any other, and hence your guess is worthless.
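The ambiguity is easy to demonstrate: many distinct inputs collapse to the same averaged output. A minimal Python sketch (the helper name `halve` is mine, using floor division since the values are integers):

```python
def halve(values):
    """Average adjacent pairs, keeping only the averages."""
    return [(a + b) // 2 for a, b in zip(values[::2], values[1::2])]

# Two completely different originals, identical averaged output:
print(halve([4, 6, 2, 8]))    # → [5, 5]
print(halve([5, 5, 0, 10]))   # → [5, 5]
```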

##### Share on other sites
If the purpose of this is simply to take a whole bunch of integers and squish them down in size, and performance doesn't mean anything, and you really really really want to save the extra hundred bytes (don't see why)...

Then you could take those averaged values and keep a separate vector holding the absolute difference between each averaged value and one of the numbers that produced it. Basically, for every unsigned int pair:

unsigned int src: 2, 6
unsigned int avg: 4
...unsigned char difference: 2

| 6 - 4 | == | 2 - 4 | == 2. That way you're saving a weeeee bit of storage space (could be more if you have a couple of hundred thousand ints, however...)
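For what it's worth, a variant of this idea does reconstruct exactly: store the signed difference instead of its absolute value, and recover the bit lost to floor division from the difference's parity (a + b and a - b always have the same parity). A quick Python sketch, with names of my own choosing:

```python
def encode_pair(a, b):
    """Keep the floor-averaged value plus the signed difference."""
    return (a + b) // 2, a - b

def decode_pair(avg, diff):
    # a+b and a-b share parity, so the remainder bit dropped by the
    # floor division can be recovered from diff.
    total = 2 * avg + (diff % 2)
    return (total + diff) // 2, (total - diff) // 2
```

Of course the average plus a full signed difference carries about as much data as the original pair, so this is a reshuffling rather than a compression.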

Then again, if everything is a value between 0 and 255, everything fits in a byte anyways which makes my entire explanation of my idea before entirely, uh, nullified. Err. Useless.

What's the purpose for this anyways?

##### Share on other sites
When you generate the set, make the values randomly generated with a normal distribution (i.e. the middle values have a much higher chance of appearing). Then your AI can try candidate solutions in order of likelihood: most candidates drawn from the centre of the distribution, fewer from one standard deviation out, fewer still from two standard deviations, etc.

Then you'll have a reasonable "guess" on the solution.
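That "guess in order of likelihood" idea can be sketched like this: for a given averaged value, enumerate every consistent pair and rank the pairs by their probability under an assumed normal prior on the values (the mean and standard deviation here are placeholder assumptions, as are the function names):

```python
def ranked_guesses(avg, lo=0, hi=255, mean=128, sd=40):
    """Rank candidate (a, b) pairs for a given floor-average by how
    likely each pair is under an assumed normal prior on the values."""
    def logp(x):
        # Log of the normal density, dropping constant terms.
        return -((x - mean) ** 2) / (2 * sd ** 2)
    pairs = [(a, b) for a in range(lo, hi + 1)
                    for b in range(lo, hi + 1)
                    if (a + b) // 2 == avg]
    # Most probable pairs first.
    return sorted(pairs, key=lambda p: -(logp(p[0]) + logp(p[1])))

best = ranked_guesses(100)[:3]  # the guesses to try first
```

Trying candidates in this order doesn't make the problem well-posed, but it does maximise the chance that an early guess is the right one.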

##### Share on other sites
Well, I plan on using CRC hashes or some other error-checking codes to compare guess values against, so in theory only the correct guess would have the same hash as the original data.

##### Share on other sites
What is the purpose of this, is it to compress data or is there another reason?

##### Share on other sites
The idea is to compress data, yes, though decompressing the data might take a rather long time (which is what I'm trying to figure out a better way of doing).

##### Share on other sites
Quote:
 Original post by vaneger: Well, I plan on using CRC hashes or some other error-checking codes to compare guess values against, so in theory only the correct guess would have the same hash as the original data.

Hashes don't work that way - there will *always* be collisions between hashes (i.e. two or more pieces of input data which yield the same hash). A good hashing function can minimise collisions, particularly between similar inputs, but it can't eliminate them entirely.

##### Share on other sites
Unfortunately, the checksum/hash idea doesn't hold water. For a message 2N bytes in length, you'll have N bytes after you've averaged the data and a checksum of, say, 128 bits (16 bytes). You'll therefore have around 2^(8*N) possible original messages consistent with the averages, but only 2^128 possible checksum values. I'm being a bit vague about this, but the point is that the amount of information you have matters. An MD5 checksum, for instance, is believed to require an average of 2^128 plaintexts to be tested to produce a given hash. Unfortunately you're going to have to try many, many more plaintexts than this, because for any decent kind of hash/checksum it isn't possible to isolate the effects of changes to the plaintext.
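To put rough numbers on that argument, here is a back-of-the-envelope sketch (the choice of N is an assumption for illustration; on average there are roughly 256 consistent preimage pairs per averaged byte, hence about 2^(8N) candidates overall):

```python
# For an averaged message of N bytes there are roughly 2**(8*N)
# originals consistent with the averages, but only 2**128 distinct
# 128-bit checksum values to tell them apart.
N = 64                                # assumed averaged length in bytes
candidates = 2 ** (8 * N)             # originals matching the averages
hash_space = 2 ** 128                 # distinct 128-bit checksums
survivors = candidates // hash_space  # candidates left per checksum value
print(survivors == 2 ** 384)          # → True: still astronomically many
```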

Compression generally works in one of two ways: first, finding that some symbols (bytes, integers, whatever) occur more frequently than others and then encoding them using fewer bits (look up Huffman encoding) - however the price you pay is that you have to store the code as well. The other method is the ZIP method where strings of symbols are encoded as a single symbol if they occur more than once in your data. ZIP actually generates the symbol table on the fly and uses Huffman coding too, but that's the basic idea.
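As a concrete illustration of the first method, here is a minimal Huffman code builder in Python - a sketch only, which ignores the degenerate one-symbol case and leaves out serialising the code table that a real compressor would also have to store:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreak index, {symbol: code-so-far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        n2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, i, merged))
    return heap[0][2]

codes = huffman_code(b"aaaabbc")  # 'a' is most frequent...
print(codes[ord("a")])            # ...so it gets the shortest code
```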

Your approach is more applicable to something like audio or video encoding. Some systems work by having a method to guess what the next sample in the stream is, and then encoding the corrections (usually using Huffman encoding or similar). I guess it would be similarly possible to construct something like that for compressing English text, since as I understand it English has an entropy of about 1.5 bits per character (so you should be able to get about 5:1 compression in theory).
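That "predict then encode corrections" scheme can be sketched with the simplest possible predictor - each sample predicted to equal the previous one - leaving small residuals that an entropy coder like Huffman could then squeeze (function names are mine):

```python
def delta_encode(samples):
    """Predict each sample as the previous one; keep only the residuals,
    which cluster near zero for smooth signals and so compress well."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(residuals):
    """Invert delta_encode by re-accumulating the residuals."""
    prev, out = 0, []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

print(delta_encode([10, 12, 13, 13, 11]))  # → [10, 2, 1, 0, -2]
```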
