Sign in to follow this  
arbitus

Multi-Linear search problem in near constant time

Recommended Posts

So a coworker posed a problem to me yesterday that intrigued me. He claimed he heard it as an interview question at Google. There exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n. The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence. I was unconcerned with the answer. I was much more interested in the function f(n), and how it could be approached mathematically instead of as a brute force search. A couple of the guys who had been working on the problem throughout the day showed me their work, and they all had some variation of the brute force algorithm. Mine was the following:
static void Main(string[] args)
{
    int total = 0;
    int test = 10000000;

    foreach (var num in Enumerable.Range(0, test + 1))
    {
        total += num.ToString().ToCharArray().Count(x => x == '1');
    }

    Console.WriteLine("Answer = " + total);

    Console.ReadKey();
}

With a slight modification, it could answer the question without breaking a sweat. If I wanted to wait all day. So I decided to approach the problem from several angles. I tried a recursive algorithm that broke the number into chunks. I tried searching my brain for applicable probability or statistical formulas. Well, to tell the truth, because my day to day work centers on trying to keep management from shooting themselves in the foot rather than solving complex problems (*sigh*), my math is becoming a bit rusty. Instead, I approached the problem as a "visual" mathematical problem, and attacked it one digit at a time. The results are as follows:
static int calcResult(int n)
{
    int answer = 0;

    // Iterate over each position in the number
    for (int i = 0; i < (int)Math.Log10(n) + 1; i++)
    {
        answer += calcPosition(n, i);
    }

    return answer;
}

/// <summary>
/// Calculates the number of times a position within a number n has contained 
/// the value '1' in the range of numbers from 0 to n.
/// </summary>
/// <param name="n">Number</param>
/// <param name="pivotOffset">Digit offset from beginning of number</param>
/// <returns>The total times the position equals 1 in the range from 0 to n</returns>
static int calcPosition(int n, int pivotOffset)
{
    // The algorithm works by breaking the number into 3 components:
    // 1) The integer value of the position under consideration (the pivot)
    // 2) The integer value of the digits preceding the pivot
    // 3) The integer value of the digits remaining beyond the pivot
    int guess = 0;      

    // Helpers
    int maxExponent = (int)Math.Log10(n);                          
    int pivotPosition = maxExponent - pivotOffset;
    int pivotExponentValue = (int)Math.Pow(10, pivotPosition);
    int valueOfDigitsUpToPivot = n / pivotExponentValue;

    // Calc #1
    int valueOfPivot = valueOfDigitsUpToPivot % 10;
    // Calc #2
    int valueOfPreviousDigits = valueOfDigitsUpToPivot / 10;
    // Calc #3
    int valueOfRemainder = n - (valueOfDigitsUpToPivot * pivotExponentValue);

    // The digits previous to the pivot tell us how many sets of numbers
    // in which the pivot position has been equal to '1.' Each 'set' is equal
    // to the base 10 exponent value of the position of the pivot.
    guess += valueOfPreviousDigits * pivotExponentValue;

    // If the pivot is greater than 1, an addition 'set' has elapsed.
    if (valueOfPivot > 1)
        guess += pivotExponentValue;

    // If the pivot is '1,' it has not incremented through an entire 'set'
    // so take the current tally of the times it has incremented from 0
    // to the remainder.
    if (valueOfPivot == 1)
        guess += valueOfRemainder + 1;

    return guess;
}

Yes, I know I could clean this up, cover the case of 0, and generalize it to any digit. Right now, it is the principle that matters. Two things I like about this: 1) Complexity has been cut to Log10(n) 2) Each iteration is independent of the previous, so the iterations could be conducted in parallel instead, with a simple accumulation of the results. Now, my challenge is this: I am sure that there is a better algorithm. I want to know what the best mathematical algorithm to solve this problem is. I would like opinions on my approach, and hopefully I will learn something.

Share this post


Link to post
Share on other sites
Quote:
Original post by arbitus

There exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n.



Just so I'm clear on what is actually asked.
- We have numbers 0..n (0, 1, 2, 3, 4, 5, 6, ....)
- f(n) = number of times digit one appears in that sequence

Because then I find it hard to spend even a second of time over writing:
            int total = 0;
int test = 10000000;

for (int i = 0; i < test+1; i++)
{
int d = i;
while (d > 0) {
if (d % 10 == 1) total++;
d /= 10;
}
}
It completes in a fraction of a second.


But it doesn't answer f(n) == n.

This one however:
            int total = 0;
int count = 0;

while (true)
{
int d = count;
while (d > 0) {
if (d % 10 == 1) total++;
d /= 10;
}
if (total == count) Console.WriteLine(total);
count++;
}
Does.

Is this the sequence that is meant? A series in which each element is that for which f(n) == n? And the task is to find next element?

Quote:
The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence.

Care to provide them for clarification?

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
Because then I find it hard to spend even a second of time over writing:
* snip *
It completes in a fraction of a second.


That's the brute force approach. I wanted, as an exercise, to see if I could conceptualize something better than the brute force approach. This version is certainly faster than the brute approach I originally took, but it still has the same complexity.

Quote:

But it doesn't answer f(n) == n.


Right, I originally stated that I do not care about this part of the question; I only cared about looking at an alternative algorithm for computing f(n).

Quote:

Care to provide them for clarification?


I don't have the numbers on hand because I have no interest in that particular part, but if you need them I can provide them.

Share this post


Link to post
Share on other sites
A byte can be in 256 different configurations, for those 256 combinations we will have as many zeroes as we have ones, so 256 * 8 / 2 = 1024 ones. The interesting thing to note here is that this also happens to be the number of 1s which will appear between 0 - 9999 9999.

EDIT: I might also just be totally off as I just took a glance and had a hunch :P

Share this post


Link to post
Share on other sites
So, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.

EDIT: nvm, logic fail, going to verify next time :P

Share this post


Link to post
Share on other sites
Quote:
Original post by Ylisaren
So, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.


I am not exactly sure you have understood the problem correctly. Perhaps you can look at Antheus's brute force algorithm to get a quick idea.

For example if n is 1, the answer is 1, as the digit 1 appears once from 0-1.
If n is 10, the answer is 2, as 1 appears in the number 1 and the number 10 from 0-10.
If n is 11, the answer is 4, as 1 appears in the number 1, number 10, and twice in 11 from 0-11.
If n is 12, the answer is 5
If n is 20, the answer is 12
etc.

Share this post


Link to post
Share on other sites
Yeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n.

Share this post


Link to post
Share on other sites
Quote:
Original post by Ylisaren
Yeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n.


Yeah, we found a generalized equation for all powers of 10. I was trying to use this in a "divide and conquer" recursive strategy to apply it to any 'n' but it became monstrously complex and I gave up on that approach. As I said, I am quite confident that there is a mathematical approach given the probabilistic nature of the problem. Unfortunately, I have been unable to come up with it myself (hence the completely different strategy employed by my algorithm), but it would be interesting to also exam the difference in complexities and potential optimizations such as parallel execution, etc.

Edit: I, by no means, believe that I have come up with anything more than just an interesting approach that is probably not unique in the slightest. I am simply proud that I worked it out from scratch in a couple of hours and I am now interested in the "real" answer and how it compares from an algorithmic standpoint.

Share this post


Link to post
Share on other sites
Quote:
Original post by arbitus

This version is certainly faster than the brute approach I originally took, but it still has the same complexity.
Except that you haven't solved the brute force problem.

"Given number n, determine smallest m (m > n), so that f(m) = m."

Your version is exactly the same as mine. It is (m-n) * log10(m). Approximately.

The brute force part is still there - you optimized the wrong problem, and even made it worse, since it has higher constant factor.


The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.

The brute force version is approximately n log n. The improved calculation doesn't change that, it merely changes how log(n) part is calculated using different log(n) algorithm.

Share this post


Link to post
Share on other sites
Quote:
Original post by Antheus
Except that you haven't solved the brute force problem.


I am having a hard time approaching a response to your post without a bit of defensiveness or snarkyness, as it seems you are missing the point.

I have stated twice already that I am not interested in solving f(n) = n, or finding any or part of the sequence. That is completely irrelevant to what I am wanting to discuss in this thread, other than being context in the story for a problem that I am interested in attacking, which is a more efficient algorithm for the implementation of f(n) than the brute force approach. I have not, nor ever been interested in the context of this thread in optimizing the algorithm for finding the sequence where f(n) = n.

Quote:
The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.


I do, in fact, that is the only thing I care about in the context of this thread. The reason is because it is trivial to do both brute force on the computation of f(n) and the computation of the series where f(n) = n. I decided I wanted to investigate further the computation of f(n).

Share this post


Link to post
Share on other sites
Consider an arbitrary 'n' like 53941 and the sequence 0..53941. There will be a '1' in the one's place 5395 times (ceil(53941 / 10) * 1), a '1' in the ten's place 5400 times (ceil(53941 / 100) * 10), a '1' in the hundred's place 5400 times (ceil(53941 / 1000) * 100), a '1' in the thousand's place 6000 times (ceil(53941 / 10000) * 1000), and a '1' in the ten thousand's place 10000 times (ceil(53941 / 100000) * 10000). This gives us f(n) = 5395 + 5400 + 5400 + 6000 + 10000 = 32195. I confirmed that this is the correct answer using the brute force approach given by the OP, however I could have also gotten lucky and picked a value for 'n' that didn't expose any edge cases. Other than that, it's O(log10(n)) and each step is pretty darn simple.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zipster
Consider an arbitrary 'n' like 53941 and the sequence 0..53941. There will be a '1' in the one's place 5395 times (ceil(53941 / 10) * 1), a '1' in the ten's place 5400 times (ceil(53941 / 100) * 10), a '1' in the hundred's place 5400 times (ceil(53941 / 1000) * 100), a '1' in the thousand's place 6000 times (ceil(53941 / 10000) * 1000), and a '1' in the ten thousand's place 10000 times (ceil(53941 / 100000) * 10000). This gives us f(n) = 5395 + 5400 + 5400 + 6000 + 10000 = 32195. I confirmed that this is the correct answer using the brute force approach given by the OP, however I could have also gotten lucky and picked a value for 'n' that didn't expose any edge cases. Other than that, it's O(log10(n)) and each step is pretty darn simple.


This is almost identical to one of the approaches I was taking, but your algorithm only works if none of the interior or leading digits are less than or equal to 1. For instance, try it with a leading 1 instead of 5, or change the 9 to a 1 or 0. It was refining this approach that lead me to what I have in the OP. If fact, this is very close to what I am doing, only I handle these edge cases.

Share this post


Link to post
Share on other sites
your above approach works better if you take floor((num-1)/10) + 1. for the first answer. then do the same for the rest. floor((num-1)/100) + 1

Share this post


Link to post
Share on other sites
Quote:
Original post by ibebrett
your above approach works better if you take floor((num-1)/10) + 1. for the first answer. then do the same for the rest. floor((num-1)/100) + 1


I am not quite following you, can you elaborate?

Additionally, I understand that the problem is pretty trivial. But:
1) Deriving your own solution for a trivial problem can be fun
2) I want to know if there is a solution with less complexity/growth rate

Share this post


Link to post
Share on other sites
yeah it has log_10(n) complexity. just like the guy above said, there are how many 1's in the 1's place? floor((n-1)/10) + 1. what about in the 10's place floor((n-10)/100) + 1. etc...

so it pretty much has a very low complexity. you only need to divide up to log_10(n)..

Share this post


Link to post
Share on other sites
Quote:
Original post by ibebrett
yeah it has log_10(n) complexity. just like the guy above said, there are how many 1's in the 1's place? floor((n-1)/10) + 1. what about in the 10's place floor((n-10)/100) + 1. etc...

so it pretty much has a very low complexity. you only need to divide up to log_10(n)..


Your solution is still incorrect unless you are leaving something out of your post. It gives me the same solution as Zipster and is still incorrect in the case of an interior or leading 1 or 0.

Share this post


Link to post
Share on other sites

import math

def ones(n):
ret = 0
for i in range(int(math.log10(n)) + 1):
ret = ret + int(math.floor((n-1)/10)) + 1
n = n / 10

return ret




wait hold on , you might be right.

Share this post


Link to post
Share on other sites
Quote:
Original post by arbitus
This is almost identical to one of the approaches I was taking, but your algorithm only works if none of the interior or leading digits are less than or equal to 1. For instance, try it with a leading 1 instead of 5, or change the 9 to a 1 or 0. It was refining this approach that lead me to what I have in the OP. If fact, this is very close to what I am doing, only I handle these edge cases.

Ah, very true, the math does make certain assumptions. However as you mentioned you can easily special-case '0' and '1' -- I believe that instead of doing (ceil(x) / 10N) * 10N-1, you can do x - (int(x / 10N) * 10N) + '0'/'1', which is just a fancy way of truncating the '1' and everything to the left of it, adding 1 if the digit was a '1' and 0 if the digit was a '0', and using that result (so 13941 becomes 3942 when evaluating the leading digit and the math works out again; 1395 + 1400 + 1400 + 3941 + 2000 = 10137). Again, could be more edge cases I'm missing.

Share this post


Link to post
Share on other sites
sorry guys i got it wrong but i am working out the right answer as we speak. its a little more complicated but its not too bad.. im just braking the number up like say...

142 9 841

Share this post


Link to post
Share on other sites
Quote:
Original post by Zipster
Again, could be more edge cases I'm missing.


This is quickly beginning to look like my original solution (check the OP). It solves in Log10(n) time and is pretty much the same math, just broken up a bit. I might be doing an unnecessary piece of algebra, as I have not combed over it for redundant operations to preserve the steps I took to arrive at it.

Edit: I was hoping to find a way to mathematically eliminate the logical statements (if statements) and reduce it to a formulaic form. Perhaps it isn't possible?

Share this post


Link to post
Share on other sites

import math

def digit(n, k):
return n/(10**(k)) % 10

def right(n, k):
if k == 0:
return 0
return n % 10**(k)

def left(n, k):
return n / 10**(k+1)

def ones(n):
ret = 0
for i in range(0, int(math.log10(n)+1)):
pow10 = 10**i
d = digit(n, i)
r = right(n, i)
l = left(n, i)

ret = ret + l*(pow10)

if d == 1: ret = ret + r + 1
if d >= 2: ret = ret + pow10

return ret




Ok, i am pretty sure this is right. In addition it does run in log10 time, so its still fast, but your right there are a few cases that ake this ugly. it may be possible to do better, but remember the number of 1's up to a number is not a natural mathematical concept, its a side effect of our way of concieving them, so don't wory about it being too pretty. thanks for the cool problem ... if this is wrong, let me know.

EDIT: Fixed mistake

[Edited by - ibebrett on March 31, 2010 9:59:17 PM]

Share this post


Link to post
Share on other sites
Can't validate it right now, but check my algorithm in the OP and see if it matches yours. At first glance it looks very close.

Edit: Not the first brute force one, but the second one posted.

Share this post


Link to post
Share on other sites
by the way

http://www2.research.att.com/~njas/sequences/?q=1,1,1,1,1,1,1,1,1,2,4,5,6,7,8,9&sort=0&fmt=0&language=english&go=Search

check this database out
we can validate against it.

and i fixed my solution above, i am 99% its right now (which i was before of course..)

Share this post


Link to post
Share on other sites
Quote:
Original post by ibebrett
by the way

http://www2.research.att.com/~njas/sequences/?q=1,1,1,1,1,1,1,1,1,2,4,5,6,7,8,9&sort=0&fmt=0&language=english&go=Search

check this database out
we can validate against it.

and i fixed my solution above, i am 99% its right now (which i was before of course..)


Yes, your solution looks identical to my solution (2nd code block in the op). I think it is probably the best one. The reason I was thinking there might be a mathematical solution is because this problem is based on probability. However, it does seem that the probability is spatial (relates to digit position) and cannot be solved as a set. Ah well, it was indeed fun to solve.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this