arbitus 440 Report post Posted March 31, 2010 So a coworker posed a problem to me yesterday that intrigued me. He claimed he heard it as an interview question at Google. There exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n. The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence. I was unconcerned with the answer. I was much more interested in the function f(n), and how it could be approached mathematically instead of as a brute force search. A couple of the guys who had been working on the problem throughout the day showed me their work, and they all had some variation of the brute force algorithm. Mine was the following: static void Main(string[] args) { int total = 0; int test = 10000000; foreach (var num in Enumerable.Range(0, test + 1)) { total += num.ToString().ToCharArray().Count(x => x == '1'); } Console.WriteLine("Answer = " + total); Console.ReadKey(); } With a slight modification, it could answer the question without breaking a sweat. If I wanted to wait all day. So I decided to approach the problem from several angles. I tried a recursive algorithm that broke the number into chunks. I tried searching my brain for applicable probability or statistical formulas. Well, to tell the truth, because my day to day work centers on trying to keep management from shooting themselves in the foot rather than solving complex problems (*sigh*), my math is becoming a bit rusty. Instead, I approached the problem as a "visual" mathematical problem, and attacked it one digit at a time. The results are as follows: static int calcResult(int n) { int answer = 0; // Iterate over each position in the number for (int i = 0; i < (int)Math.Log10(n) + 1; i++) { answer += calcPosition(n, i); } return answer; } /// <summary> /// Calculates the number of times a position within a number n has contained /// the value '1' in the range of numbers from 0 to n. /// </summary> /// <param name="n">Number</param> /// <param name="pivotOffset">Digit offset from beginning of number</param> /// <returns>The total times the position equals 1 in the range from 0 to n</returns> static int calcPosition(int n, int pivotOffset) { // The algorithm works by breaking the number into 3 components: // 1) The integer value of the position under consideration (the pivot) // 2) The integer value of the digits preceding the pivot // 3) The integer value of the digits remaining beyond the pivot int guess = 0; // Helpers int maxExponent = (int)Math.Log10(n); int pivotPosition = maxExponent - pivotOffset; int pivotExponentValue = (int)Math.Pow(10, pivotPosition); int valueOfDigitsUpToPivot = n / pivotExponentValue; // Calc #1 int valueOfPivot = valueOfDigitsUpToPivot % 10; // Calc #2 int valueOfPreviousDigits = valueOfDigitsUpToPivot / 10; // Calc #3 int valueOfRemainder = n - (valueOfDigitsUpToPivot * pivotExponentValue); // The digits previous to the pivot tell us how many sets of numbers // in which the pivot position has been equal to '1.' Each 'set' is equal // to the base 10 exponent value of the position of the pivot. guess += valueOfPreviousDigits * pivotExponentValue; // If the pivot is greater than 1, an addition 'set' has elapsed. if (valueOfPivot > 1) guess += pivotExponentValue; // If the pivot is '1,' it has not incremented through an entire 'set' // so take the current tally of the times it has incremented from 0 // to the remainder. if (valueOfPivot == 1) guess += valueOfRemainder + 1; return guess; } Yes, I know I could clean this up, cover the case of 0, and generalize it to any digit. Right now, it is the principle that matters. Two things I like about this: 1) Complexity has been cut to Log10(n) 2) Each iteration is independent of the previous, so the iterations could be conducted in parallel instead, with a simple accumulation of the results. Now, my challenge is this: I am sure that there is a better algorithm. I want to know what the best mathematical algorithm to solve this problem is. I would like opinions on my approach, and hopefully I will learn something. 0 Share this post Link to post Share on other sites
Antheus 2409 Report post Posted March 31, 2010 Quote:Original post by arbitusThere exists a series such that f(n) = n. f(n) is the sum total of all occurrences of the digit 1 within all numbers in the range 0 to n.Just so I'm clear on what is actually asked.- We have numbers 0..n (0, 1, 2, 3, 4, 5, 6, ....)- f(n) = number of times digit one appears in that sequenceBecause then I find it hard to spend even a second of time over writing: int total = 0; int test = 10000000; for (int i = 0; i < test+1; i++) { int d = i; while (d > 0) { if (d % 10 == 1) total++; d /= 10; } }It completes in a fraction of a second.But it doesn't answer f(n) == n.This one however: int total = 0; int count = 0; while (true) { int d = count; while (d > 0) { if (d % 10 == 1) total++; d /= 10; } if (total == count) Console.WriteLine(total); count++; }Does.Is this the sequence that is meant? A series in which each element is that for which f(n) == n? And the task is to find next element?Quote:The problem then goes on to list several elements in the sequence and your job was to find the next element in the sequence.Care to provide them for clarification? 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by AntheusBecause then I find it hard to spend even a second of time over writing:* snip *It completes in a fraction of a second.That's the brute force approach. I wanted, as an exercise, to see if I could conceptualize something better than the brute force approach. This version is certainly faster than the brute approach I originally took, but it still has the same complexity.Quote:But it doesn't answer f(n) == n.Right, I originally stated that I do not care about this part of the question; I only cared about looking at an alternative algorithm for computing f(n).Quote:Care to provide them for clarification?I don't have the numbers on hand because I have no interest in that particular part, but if you need them I can provide them. 0 Share this post Link to post Share on other sites
Ylisaren 100 Report post Posted March 31, 2010 A byte can be in 256 different configurations, for those 256 combinations we will have as many zeroes as we have ones, so 256 * 8 / 2 = 1024 ones. The interesting thing to note here is that this also happens to be the number of 1s which will appear between 0 - 9999 9999.EDIT: I might also just be totally off as I just took a glance and had a hunch :P 0 Share this post Link to post Share on other sites
Ylisaren 100 Report post Posted March 31, 2010 So, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.EDIT: nvm, logic fail, going to verify next time :P 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by YlisarenSo, along the same lines I think this is a solution if I've understood the problem correctly. ( NumberOfDigitsInN^2 * NumberOfDigitsInN ) / 2 = NumberOfOnesInZeroToN.I am not exactly sure you have understood the problem correctly. Perhaps you can look at Antheus's brute force algorithm to get a quick idea.For example if n is 1, the answer is 1, as the digit 1 appears once from 0-1.If n is 10, the answer is 2, as 1 appears in the number 1 and the number 10 from 0-10.If n is 11, the answer is 4, as 1 appears in the number 1, number 10, and twice in 11 from 0-11.If n is 12, the answer is 5If n is 20, the answer is 12etc. 0 Share this post Link to post Share on other sites
Ylisaren 100 Report post Posted March 31, 2010 Yeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n. 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by YlisarenYeah, I've understood it correctly, my solution was wrong however as I was thinking in base 2. But you can use the same logic for base 10. For example, if you take the range 0 - 9999 there's 10 000 configurations. Out of those different configurations you will have 4 digits, and the chance of one of them being 1 is 1 / 10. So to solve it you then have ( pow( 10, log10( n ) ) * log10( n ) ) / 10. This will report the correct result for f( 9999 ), but naturally won't for all n.Yeah, we found a generalized equation for all powers of 10. I was trying to use this in a "divide and conquer" recursive strategy to apply it to any 'n' but it became monstrously complex and I gave up on that approach. As I said, I am quite confident that there is a mathematical approach given the probabilistic nature of the problem. Unfortunately, I have been unable to come up with it myself (hence the completely different strategy employed by my algorithm), but it would be interesting to also exam the difference in complexities and potential optimizations such as parallel execution, etc.Edit: I, by no means, believe that I have come up with anything more than just an interesting approach that is probably not unique in the slightest. I am simply proud that I worked it out from scratch in a couple of hours and I am now interested in the "real" answer and how it compares from an algorithmic standpoint. 0 Share this post Link to post Share on other sites
Antheus 2409 Report post Posted March 31, 2010 Quote:Original post by arbitusThis version is certainly faster than the brute approach I originally took, but it still has the same complexity.Except that you haven't solved the brute force problem."Given number n, determine smallest m (m > n), so that f(m) = m."Your version is exactly the same as mine. It is (m-n) * log10(m). Approximately.The brute force part is still there - you optimized the wrong problem, and even made it worse, since it has higher constant factor.The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.The brute force version is approximately n log n. The improved calculation doesn't change that, it merely changes how log(n) part is calculated using different log(n) algorithm. 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by AntheusExcept that you haven't solved the brute force problem.I am having a hard time approaching a response to your post without a bit of defensiveness or snarkyness, as it seems you are missing the point.I have stated twice already that I am not interested in solving f(n) = n, or finding any or part of the sequence. That is completely irrelevant to what I am wanting to discuss in this thread, other than being context in the story for a problem that I am interested in attacking, which is a more efficient algorithm for the implementation of f(n) than the brute force approach. I have not, nor ever been interested in the context of this thread in optimizing the algorithm for finding the sequence where f(n) = n.Quote:The trick here is to determine how to improve iteration, not how to calculate individual n, nobody cares about that.I do, in fact, that is the only thing I care about in the context of this thread. The reason is because it is trivial to do both brute force on the computation of f(n) and the computation of the series where f(n) = n. I decided I wanted to investigate further the computation of f(n). 0 Share this post Link to post Share on other sites
Zipster 2359 Report post Posted March 31, 2010 Consider an arbitrary 'n' like 53941 and the sequence 0..53941. There will be a '1' in the one's place 5395 times (ceil(53941 / 10) * 1), a '1' in the ten's place 5400 times (ceil(53941 / 100) * 10), a '1' in the hundred's place 5400 times (ceil(53941 / 1000) * 100), a '1' in the thousand's place 6000 times (ceil(53941 / 10000) * 1000), and a '1' in the ten thousand's place 10000 times (ceil(53941 / 100000) * 10000). This gives us f(n) = 5395 + 5400 + 5400 + 6000 + 10000 = 32195. I confirmed that this is the correct answer using the brute force approach given by the OP, however I could have also gotten lucky and picked a value for 'n' that didn't expose any edge cases. Other than that, it's O(log10(n)) and each step is pretty darn simple. 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by ZipsterConsider an arbitrary 'n' like 53941 and the sequence 0..53941. There will be a '1' in the one's place 5395 times (ceil(53941 / 10) * 1), a '1' in the ten's place 5400 times (ceil(53941 / 100) * 10), a '1' in the hundred's place 5400 times (ceil(53941 / 1000) * 100), a '1' in the thousand's place 6000 times (ceil(53941 / 10000) * 1000), and a '1' in the ten thousand's place 10000 times (ceil(53941 / 100000) * 10000). This gives us f(n) = 5395 + 5400 + 5400 + 6000 + 10000 = 32195. I confirmed that this is the correct answer using the brute force approach given by the OP, however I could have also gotten lucky and picked a value for 'n' that didn't expose any edge cases. Other than that, it's O(log10(n)) and each step is pretty darn simple.This is almost identical to one of the approaches I was taking, but your algorithm only works if none of the interior or leading digits are less than or equal to 1. For instance, try it with a leading 1 instead of 5, or change the 9 to a 1 or 0. It was refining this approach that lead me to what I have in the OP. If fact, this is very close to what I am doing, only I handle these edge cases. 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 your above approach works better if you take floor((num-1)/10) + 1. for the first answer. then do the same for the rest. floor((num-1)/100) + 1 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by ibebrettyour above approach works better if you take floor((num-1)/10) + 1. for the first answer. then do the same for the rest. floor((num-1)/100) + 1I am not quite following you, can you elaborate?Additionally, I understand that the problem is pretty trivial. But:1) Deriving your own solution for a trivial problem can be fun2) I want to know if there is a solution with less complexity/growth rate 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 yeah it has log_10(n) complexity. just like the guy above said, there are how many 1's in the 1's place? floor((n-1)/10) + 1. what about in the 10's place floor((n-10)/100) + 1. etc...so it pretty much has a very low complexity. you only need to divide up to log_10(n).. 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by ibebrettyeah it has log_10(n) complexity. just like the guy above said, there are how many 1's in the 1's place? floor((n-1)/10) + 1. what about in the 10's place floor((n-10)/100) + 1. etc...so it pretty much has a very low complexity. you only need to divide up to log_10(n)..Your solution is still incorrect unless you are leaving something out of your post. It gives me the same solution as Zipster and is still incorrect in the case of an interior or leading 1 or 0. 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 import mathdef ones(n): ret = 0 for i in range(int(math.log10(n)) + 1): ret = ret + int(math.floor((n-1)/10)) + 1 n = n / 10 return retwait hold on , you might be right. 0 Share this post Link to post Share on other sites
Zipster 2359 Report post Posted March 31, 2010 Quote:Original post by arbitusThis is almost identical to one of the approaches I was taking, but your algorithm only works if none of the interior or leading digits are less than or equal to 1. For instance, try it with a leading 1 instead of 5, or change the 9 to a 1 or 0. It was refining this approach that lead me to what I have in the OP. If fact, this is very close to what I am doing, only I handle these edge cases.Ah, very true, the math does make certain assumptions. However as you mentioned you can easily special-case '0' and '1' -- I believe that instead of doing (ceil(x) / 10^{N}) * 10^{N-1}, you can do x - (int(x / 10^{N}) * 10^{N}) + '0'/'1', which is just a fancy way of truncating the '1' and everything to the left of it, adding 1 if the digit was a '1' and 0 if the digit was a '0', and using that result (so 13941 becomes 3942 when evaluating the leading digit and the math works out again; 1395 + 1400 + 1400 + 3941 + 2000 = 10137). Again, could be more edge cases I'm missing. 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 sorry guys i got it wrong but i am working out the right answer as we speak. its a little more complicated but its not too bad.. im just braking the number up like say...142 9 841 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 got it, i will post the solution in after i write it up and verify (with reasoning). 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by Zipster Again, could be more edge cases I'm missing.This is quickly beginning to look like my original solution (check the OP). It solves in Log10(n) time and is pretty much the same math, just broken up a bit. I might be doing an unnecessary piece of algebra, as I have not combed over it for redundant operations to preserve the steps I took to arrive at it.Edit: I was hoping to find a way to mathematically eliminate the logical statements (if statements) and reduce it to a formulaic form. Perhaps it isn't possible? 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 import mathdef digit(n, k): return n/(10**(k)) % 10def right(n, k): if k == 0: return 0 return n % 10**(k)def left(n, k): return n / 10**(k+1) def ones(n): ret = 0 for i in range(0, int(math.log10(n)+1)): pow10 = 10**i d = digit(n, i) r = right(n, i) l = left(n, i) ret = ret + l*(pow10) if d == 1: ret = ret + r + 1 if d >= 2: ret = ret + pow10 return retOk, i am pretty sure this is right. In addition it does run in log10 time, so its still fast, but your right there are a few cases that ake this ugly. it may be possible to do better, but remember the number of 1's up to a number is not a natural mathematical concept, its a side effect of our way of concieving them, so don't wory about it being too pretty. thanks for the cool problem ... if this is wrong, let me know.EDIT: Fixed mistake[Edited by - ibebrett on March 31, 2010 9:59:17 PM] 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Can't validate it right now, but check my algorithm in the OP and see if it matches yours. At first glance it looks very close.Edit: Not the first brute force one, but the second one posted. 0 Share this post Link to post Share on other sites
ibebrett 205 Report post Posted March 31, 2010 by the wayhttp://www2.research.att.com/~njas/sequences/?q=1,1,1,1,1,1,1,1,1,2,4,5,6,7,8,9&sort=0&fmt=0&language=english&go=Searchcheck this database outwe can validate against it.and i fixed my solution above, i am 99% its right now (which i was before of course..) 0 Share this post Link to post Share on other sites
arbitus 440 Report post Posted March 31, 2010 Quote:Original post by ibebrettby the wayhttp://www2.research.att.com/~njas/sequences/?q=1,1,1,1,1,1,1,1,1,2,4,5,6,7,8,9&sort=0&fmt=0&language=english&go=Searchcheck this database outwe can validate against it.and i fixed my solution above, i am 99% its right now (which i was before of course..)Yes, your solution looks identical to my solution (2nd code block in the op). I think it is probably the best one. The reason I was thinking there might be a mathematical solution is because this problem is based on probability. However, it does seem that the probability is spatial (relates to digit position) and cannot be solved as a set. Ah well, it was indeed fun to solve. 0 Share this post Link to post Share on other sites