Periodic Sequence Factoring

Started by
4 comments, last by random_thinker 16 years, 6 months ago
I have a sequence of characters. The characters are from an infinite alphabet with the only available operations being equality comparison and hashing. I know that the sequence of characters is periodic. That is, it is of the form:
Seq = ABkC

A = a1 ... an
B = b1 ... bm
C = c1 ... cl
Given the sequence, how can I compute k, l, m and n quickly? (The characters are assembly instructions, and I need to factor them out using a for loop).
Advertisement
This certainly stretches my programming capability, but in an attempt to help I've found some good ideas here, if you haven't already seen it.

What you are saying, if I am not mistaken, is that these are T-periodic, of the form:

f(x+T) = f(x)

or in your case:
f(a+n) = f(a)f(b+m) = f(b)f(c+l) = f(c)

such that
Seq = f(a)*f(b)k*f(c)


and you are solving for n, m and l. In the absence of any generating function, the sequence for each would have to be subjected to 'pattern matching' or 'property matching', no? Are the character values represented by integers? It may be possible to optimally generate values for n, m and l (intelligently guess them) until the above equations are satisfied. Worst case, assuming they are indeed T-periodic, you could just start with n=2, for example, and iterate your sequence until f(a+n)=f(a). Another possibility is to take f(a)0 and search for the first re-occurence of it, for which there are excellent search algorithms available, and the distance becomes n (I have used this with pseudorandom number generators to verify/test claims about the period).

--random

[Edited by - random_thinker on October 2, 2007 4:25:16 AM]
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
Quote:Original post by ToohrVyk
I have a sequence of characters. The characters are from an infinite alphabet with the only available operations being equality comparison and hashing.

I know that the sequence of characters is periodic. That is, it is of the form:

Seq = ABkCA = a1 ... anB = b1 ... bmC = c1 ... cl


Given the sequence, how can I compute k, l, m and n quickly?

(The characters are assembly instructions, and I need to factor them out using a for loop).


I think all you need here is the ability to detect cycles. Googling around a little for "cycle detection algorithm" yielded this which looks promising. Knuth's Art of Computer Programming has a cycle detection algorithm in it, which IIRC is similar if not identical to the one described at that URL.

My free book on Direct3D: "The Direct3D Graphics Pipeline"
My blog on programming, vintage computing, music, politics, etc.: Legalize Adulthood!

Whether or not these algorithms are useful depends upon the state of the data. For example, if you have a large quantity of stored sequence data then a specialized algorithm is certainly advantageous. But if you have yet generated the data, a just as quick method must be to monitor the values as they are generated and capture when they repeat.

For integers this is a straight-forward process but for float or double values you would have to apply some sort of signal processing (smoothing) algorithm.

--random
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
Was an optimum algorithm found?

--random
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
I am interested in this thread because my own pseudo-random period detection code is providing some interesting results. I suppose that the interesting thing about a prng algorithm is that it is a periodic function by design but there should be no detectable mathematical pattern within cycles, if that makes sense.

My own detection code follows values as they are generated and locates the first to re-occur and then identifies the first three consecutive reoccurences as the beginning of the next cycle. Whether this is strictly correct I have not been able to verify. Interestingly, for std::rand() the results of ten separately seeded runs results in the following:

// One set of 10 independently seeded runs, each through one cycle...Test period of pseudorandom generator...Warning:  First duplicate value occurs at n = 1038947364, continuing...Warning:  First duplicate value occurs at n = 160672077, continuing...ID            : C++ Standard Library rand()Min Period    : 96387719Max Period    : 1732857588Ave Period    : 6.91864e+08No of cycles  : 10Elapsed mins  : 2.04767// Another set of 10 independently seeded runs, each through one cycle...Test period of pseudorandom generator...Warning:  First duplicate value occurs at n = 597879981, continuing...Warning:  First duplicate value occurs at n = 1493682969, continuing...Warning:  First duplicate value occurs at n = 133528437, continuing...Warning:  First duplicate value occurs at n = 1195410326, continuing...Warning:  First duplicate value occurs at n = 1192369109, continuing...ID            : C++ Standard Library rand()Min Period    : 115934013Max Period    : 1493682971Ave Period    : 6.46274e+08No of cycles  : 10Elapsed mins  : 1.91167


Either I am doing something wrong or this is another reason to beware using std::rand(). The rarely available literature on it indicates that it should have a period of 232-1. The actual period is all over the place, and there is a high probability that numbers will be repeated within a cycle.

--random
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.

This topic is closed to new replies.

Advertisement