Multi-Linear search problem in near constant time

Started by
24 comments, last by alvaro 14 years ago
So a coworker posed a problem to me yesterday that intrigued me. He claimed he heard it as an interview question at Google. There exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n. The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence. I was unconcerned with the answer. I was much more interested in the function f(n), and how it could be approached mathematically instead of as a brute force search. A couple of the guys who had been working on the problem throughout the day showed me their work, and they all had some variation of the brute force algorithm. Mine was the following:

static void Main(string[] args)
{
    int total = 0;
    int test = 10000000;

    foreach (var num in Enumerable.Range(0, test + 1))
    {
        total += num.ToString().ToCharArray().Count(x => x == '1');
    }

    Console.WriteLine("Answer = " + total);

    Console.ReadKey();
}

With a slight modification, it could answer the question without breaking a sweat. If I wanted to wait all day. So I decided to approach the problem from several angles. I tried a recursive algorithm that broke the number into chunks. I tried searching my brain for applicable probability or statistical formulas. Well, to tell the truth, because my day to day work centers on trying to keep management from shooting themselves in the foot rather than solving complex problems (*sigh*), my math is becoming a bit rusty. Instead, I approached the problem as a "visual" mathematical problem, and attacked it one digit at a time. The results are as follows:

static int calcResult(int n)
{
    int answer = 0;

    // Iterate over each position in the number
    for (int i = 0; i < (int)Math.Log10(n) + 1; i++)
    {
        answer += calcPosition(n, i);
    }

    return answer;
}

/// <summary>
/// Calculates the number of times a position within a number n has contained 
/// the value '1' in the range of numbers from 0 to n.
/// </summary>
/// <param name="n">Number</param>
/// <param name="pivotOffset">Digit offset from beginning of number</param>
/// <returns>The total times the position equals 1 in the range from 0 to n</returns>
static int calcPosition(int n, int pivotOffset)
{
    // The algorithm works by breaking the number into 3 components:
    // 1) The integer value of the position under consideration (the pivot)
    // 2) The integer value of the digits preceding the pivot
    // 3) The integer value of the digits remaining beyond the pivot
    int guess = 0;      

    // Helpers
    int maxExponent = (int)Math.Log10(n);                          
    int pivotPosition = maxExponent - pivotOffset;
    int pivotExponentValue = (int)Math.Pow(10, pivotPosition);
    int valueOfDigitsUpToPivot = n / pivotExponentValue;

    // Calc #1
    int valueOfPivot = valueOfDigitsUpToPivot % 10;
    // Calc #2
    int valueOfPreviousDigits = valueOfDigitsUpToPivot / 10;
    // Calc #3
    int valueOfRemainder = n - (valueOfDigitsUpToPivot * pivotExponentValue);

    // The digits previous to the pivot tell us how many sets of numbers
    // in which the pivot position has been equal to '1.' Each 'set' is equal
    // to the base 10 exponent value of the position of the pivot.
    guess += valueOfPreviousDigits * pivotExponentValue;

    // If the pivot is greater than 1, an addition 'set' has elapsed.
    if (valueOfPivot > 1)
        guess += pivotExponentValue;

    // If the pivot is '1,' it has not incremented through an entire 'set'
    // so take the current tally of the times it has incremented from 0
    // to the remainder.
    if (valueOfPivot == 1)
        guess += valueOfRemainder + 1;

    return guess;
}

Yes, I know I could clean this up, cover the case of 0, and generalize it to any digit. Right now, it is the principle that matters. Two things I like about this: 1) Complexity has been cut to Log10(n) 2) Each iteration is independent of the previous, so the iterations could be conducted in parallel instead, with a simple accumulation of the results. Now, my challenge is this: I am sure that there is a better algorithm. I want to know what the best mathematical algorithm to solve this problem is. I would like opinions on my approach, and hopefully I will learn something.
Advertisement
Quote:Original post by arbitus

There exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n.



Just so I'm clear on what is actually asked.
- We have numbers 0..n (0, 1, 2, 3, 4, 5, 6, ....)
- f(n) = number of times digit one appears in that sequence

Because then I find it hard to spend even a second of time over writing:
            int total = 0;            int test = 10000000;            for (int i = 0; i < test+1; i++)            {                int d = i;                while (d > 0) {                    if (d % 10 == 1) total++;                    d /= 10;                }            }
It completes in a fraction of a second.


But it doesn't answer f(n) == n.

This one however:
            int total = 0;            int count = 0;            while (true)             {                int d = count;                while (d > 0) {                    if (d % 10 == 1) total++;                    d /= 10;                }                if (total == count) Console.WriteLine(total);                count++;            }
Does.

Is this the sequence that is meant? A series in which each element is that for which f(n) == n? And the task is to find next element?

Quote:The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence.

Care to provide them for clarification?
Quote:Original post by Antheus
Because then I find it hard to spend even a second of time over writing:
* snip *
It completes in a fraction of a second.


That's the brute force approach. I wanted, as an exercise, to see if I could conceptualize something better than the brute force approach. This version is certainly faster than the brute approach I originally took, but it still has the same complexity.

Quote:
But it doesn't answer f(n) == n.


Right, I originally stated that I do not care about this part of the question; I only cared about looking at an alternative algorithm for computing f(n).

Quote:
Care to provide them for clarification?


I don't have the numbers on hand because I have no interest in that particular part, but if you need them I can provide them.
A byte can be in 256 different configurations, for those 256 combinations we will have as many zeroes as we have ones, so 256 * 8 / 2 = 1024 ones. The interesting thing to note here is that this also happens to be the number of 1s which will appear between 0 - 9999 9999.

EDIT: I might also just be totally off as I just took a glance and had a hunch :P
So, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.

EDIT: nvm, logic fail, going to verify next time :P
Quote:Original post by Ylisaren
So, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.


I am not exactly sure you have understood the problem correctly. Perhaps you can look at Antheus's brute force algorithm to get a quick idea.

For example if n is 1, the answer is 1, as the digit 1 appears once from 0-1.
If n is 10, the answer is 2, as 1 appears in the number 1 and the number 10 from 0-10.
If n is 11, the answer is 4, as 1 appears in the number 1, number 10, and twice in 11 from 0-11.
If n is 12, the answer is 5
If n is 20, the answer is 12
etc.

Yeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n.
Quote:Original post by Ylisaren
Yeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n.


Yeah, we found a generalized equation for all powers of 10. I was trying to use this in a "divide and conquer" recursive strategy to apply it to any 'n' but it became monstrously complex and I gave up on that approach. As I said, I am quite confident that there is a mathematical approach given the probabilistic nature of the problem. Unfortunately, I have been unable to come up with it myself (hence the completely different strategy employed by my algorithm), but it would be interesting to also exam the difference in complexities and potential optimizations such as parallel execution, etc.

Edit: I, by no means, believe that I have come up with anything more than just an interesting approach that is probably not unique in the slightest. I am simply proud that I worked it out from scratch in a couple of hours and I am now interested in the "real" answer and how it compares from an algorithmic standpoint.
Quote:Original post by arbitus

This version is certainly faster than the brute approach I originally took, but it still has the same complexity.
Except that you haven't solved the brute force problem.

"Given number n, determine smallest m (m > n), so that f(m) = m."

Your version is exactly the same as mine. It is (m-n) * log10(m). Approximately.

The brute force part is still there - you optimized the wrong problem, and even made it worse, since it has higher constant factor.


The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.

The brute force version is approximately n log n. The improved calculation doesn't change that, it merely changes how log(n) part is calculated using different log(n) algorithm.
Quote:Original post by Antheus
Except that you haven't solved the brute force problem.


I am having a hard time approaching a response to your post without a bit of defensiveness or snarkyness, as it seems you are missing the point.

I have stated twice already that I am not interested in solving f(n) = n, or finding any or part of the sequence. That is completely irrelevant to what I am wanting to discuss in this thread, other than being context in the story for a problem that I am interested in attacking, which is a more efficient algorithm for the implementation of f(n) than the brute force approach. I have not, nor ever been interested in the context of this thread in optimizing the algorithm for finding the sequence where f(n) = n.

Quote:The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.


I do, in fact, that is the only thing I care about in the context of this thread. The reason is because it is trivial to do both brute force on the computation of f(n) and the computation of the series where f(n) = n. I decided I wanted to investigate further the computation of f(n).

This topic is closed to new replies.

Advertisement