• 15
• 15
• 11
• 9
• 10

Programming Style, size_t

This topic is 3083 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Following I have posted a segment of C99 code, implementing a slightly optimized form of the Sieve of Eratosthenes. I am specifically looking for details on its style and means of presentation, especially from a different eye. Specifically:
• Is it readable? How long did it take to figure out what it does (and its implementation)?
• Do the comments explain what needs to be explained? Is there anything missing? Should it be rephrased?
• Is any of it at all misleading?
Also: semantically, should size_t be preferred to int, as it is the type of the result of the sizeof operator, and is an unsigned (as opposed to signed) integer? However, int allows use of -1 as a does-not-exist index, but splits the effective maximum array size in half. Comments? (Yes, I know the "%i" in printf should be "%z"; gcc doesn't support all of C99 yet)
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <stdint.h>

// Calculates and prints primes from 2 to MAX_NUM
// Finds primes with the Sieve of Eratosthenes
int main()
{
const size_t MAX_NUM = 274;
const size_t SQRT_MAX_NUM = (size_t)ceil(sqrt(MAX_NUM));
const size_t K_MEM_SIZE = (MAX_NUM - 3) / 16 + 1;

// Boolean bit array for numbers 3, 5, 7, 9, 11, 13, ...
uint8_t K[K_MEM_SIZE];

printf("MAX_NUM = %i\n", MAX_NUM);
printf("SQRT_MAX_NUM = %i\n", SQRT_MAX_NUM);
printf("K_MEM_SIZE = %i\n", K_MEM_SIZE);
printf("\n");

// Initialize K
for (size_t i = 0; i < K_MEM_SIZE; ++i)
K = 0xFF;

// Sieve!
for (size_t n = 3; n < SQRT_MAX_NUM; n += 2)
{
// Is n prime?
if ( K[(n - 3) / 16] & (0x01 << ((n - 3) / 2) % 8) )
{
for (size_t j = n * n; j <= MAX_NUM; j += n + n)
{
// Set j to not prime
K[(j - 3) / 16] &= ~( 0x01 << ((j - 3) / 2) % 8 );
}
}
}

// Print primes "2, 3, 5, 7, 11, ..."
printf("2");
for (size_t n = 3; n <= MAX_NUM; n += 2)
{
if ( K[(n - 3) / 16] & (0x01 << ((n - 3) / 2) % 8) )
printf(", %i", n);
}
printf("\n");

return EXIT_SUCCESS;
}



Share on other sites

• Depends what you refer to by "it". You said yourself what the code as a whole does; it's the Sieve of Eratosthenes. However, since I don't know the working details of the algorithm, it brings me to...

• ... the comments; they convey almost no useful information at all. In order to understand your code, you need to know how the Sieve of Eratosthenes actually works. All you say is you initialize K, but for someone who don't know the Sieve, I don't know what the initialization means in terms of behavior of the algorithm. And then you perform "Sieve!", which is expected, since it's the Sieve of Eratosthenes. A quick search for what the sieve is reveals it is a prime finding algorithm (I know that much at least) for someone who don't even know that, so second comment in the loop isn't unexpected either and also somewhat useless; "Is n a prime?". But what is the magic of finding if it is a prime of not; there is an awful lot of bit-magic there.

What I'm getting at here is; describe algorithm as a whole and deeper intent with code, not just what a particular line is actually doing in terms the algorithm itself. You need to understand the algorithm in order to understand the algorithm here. I don't understand anything of the loops, because I don't know how the algorithm works. In a year, you may have forgotten all the details, and you are in the same situation when you look back on the code.

• Given the comments, I wouldn't say things are misleading, because they don't lead me anywhere at the moment.

Nothing wrong with the code or coding style, but the comments, or documentation for the code, really doesn't bring anything.

edit: A style comment after all; someone is, sooner or later, going to comment on your use of capital names for symbols. Some people suggests they are for preprocessor definitions only. Partly agree, but you're using them for constants so personally I could approve it, but for constants only.

Share on other sites
Quote:
 Original post by Brother Bob... the comments; they convey almost no useful information at all. In order to understand your code, you need to know how the Sieve of Eratosthenes actually works. All you say is you initialize K, but for someone who don't know the Sieve, I don't know what the initialization means in terms of behavior of the algorithm. And then you perform "Sieve!", which is expected, since it's the Sieve of Eratosthenes. A quick search for what the sieve is reveals it is a prime finding algorithm (I know that much at least) for someone who don't even know that, so second comment in the loop isn't unexpected either and also somewhat useless; "Is n a prime?". But what is the magic of finding if it is a prime of not; there is an awful lot of bit-magic there.

Off-topic, but personally, I wonder if it isn't ok if you mention the name of an algorithm without explaining it? If you mention that you use an algorithm documented elsewhere (I'm sure this sieve will be well documented on the internet and many academic examples), isn't it enough to mention the name of it and assume that another user either knows it or can look it up?

Share on other sites
Quote:
Original post by Lode
Quote:
 Original post by Brother Bob... the comments; they convey almost no useful information at all. In order to understand your code, you need to know how the Sieve of Eratosthenes actually works. All you say is you initialize K, but for someone who don't know the Sieve, I don't know what the initialization means in terms of behavior of the algorithm. And then you perform "Sieve!", which is expected, since it's the Sieve of Eratosthenes. A quick search for what the sieve is reveals it is a prime finding algorithm (I know that much at least) for someone who don't even know that, so second comment in the loop isn't unexpected either and also somewhat useless; "Is n a prime?". But what is the magic of finding if it is a prime of not; there is an awful lot of bit-magic there.

Off-topic, but personally, I wonder if it isn't ok if you mention the name of an algorithm without explaining it? If you mention that you use an algorithm documented elsewhere (I'm sure this sieve will be well documented on the internet and many academic examples), isn't it enough to mention the name of it and assume that another user either knows it or can look it up?

You can, just like he did just before the main function, except possibly including a link to the documentation as well. But the bit-magic in the loops needs to be explained on a higher level (unless, of course, the linked documentation contains exactly that); why all the subtractions, shifts, divisions and masking? Those are the first questions that will surface in a year, and those are the details that makes the whole algorithm in this case.

Share on other sites
Like the other posters I feel that the comments left me none the wiser. Far too many people have completely misunderstood the point of code comments. Whenever possible, document your code with code. That is, use descriptive names, break functions up into smaller ones and give them names that indicate their purpose. In general, well-written code is much easier to read than well-commented code.

Here are a couple of articles written by the highly respected OO expert Robert C. Martin ("Uncle Bob"). Incidentally, they are about refactoring code designed to compute the sieve of Eratosthenes:

- Part I
- Part II

Note: these articles are written with Java programmers in mind, but most of the ideas translate easily to C/C++.

Share on other sites
I personally would not use size_t like you do. None of your base numbers are really the size of anything. The fact that MAX_NUM is eventually manipulated into being the size of something doesn't mean that it has to be a size_t itself.

For this toy sample the explicit bit array is overkill. An array of boolean would be much more readable. If you want to keep the bit array then Set/TestBit helper functions would probably improve things. Even with the bit array I would just start it at index 0 instead of 3 so that you don't have to have all the "j-3"'s. In a similar vein, the "/ 16"'s look wrong at first, you should definitely explain that you're only have bits for odd numbers.

Otherwise I agree with the other commenters. I find it to be generally readable, given that I'm already familar with the Sieve of Eratosthenes and bit twiddling.

Share on other sites
Quote:
 (Yes, I know the "%i" in printf should be "%z"; gcc doesn't support all of C99 yet)
Why is this not a comment?

Quote:
 int allows use of -1 as a does-not-exist index, but splits the effective maximum array size in half. Comments?
Why is this not a comment?

Quote:
 implementing a slightly optimized form
Why is this not a comment?

Quote:
 // Boolean bit array for numbers 3, 5, 7, 9, 11, 13, ... uint8_t K[K_MEM_SIZE];
This is something that would need to be documented. Perhaps:
// x86 optimization packs sieve into bits to improve utilization of cache

Quote:
 Here are a couple of articles written by the highly respected OO expert Robert C. Martin ("Uncle Bob"). Incidentally, they are about refactoring code designed to compute the sieve of Eratosthenes:

And this is how consultants earn their paycheck.

Sieve is a solved algorithm, and has been for, say, 2000 years. There is no need to refactor it in any way, or apply OO concepts.

As a matter of fact, no consultant should even think of teaching how to implement Sieve. Seriously. This is what a graduate *must* know (at very least by reading Wikipedia). And it is something no company should allow to be done on its time. It is prime example of a problem that doesn't need a solution.

Teaching this type of development is actually inviting developers to waste corporate time. Again, I'm dead serious. Primes are constant. They won't change. This means that any application that will be using them will load them from pre-computed tables, or from cryptographically secure prime number generator, or use primes indirectly through a third-party certified cryptography library. This applies to OO design and similar, not to OP.

Share on other sites
Quote:
 Original post by DinosaurDynastyYes, I know the "%i" in printf should be "%z"; gcc doesn't support all of C99 yet
GCC supports "%z" just fine, or at least it did last time I checked. I'm betting that you're using MinGW though which happens to rely on Microsoft's standard library for formatted I/O (it's a fine point but I believe in blaming the right people.)

Quote:
Original post by Antheus
Quote:
 Here are a couple of articles written by the highly respected OO expert Robert C. Martin ("Uncle Bob"). Incidentally, they are about refactoring code designed to compute the sieve of Eratosthenes:
And this is how consultants earn their paycheck.

Sieve is a solved algorithm, and has been for, say, 2000 years. There is no need to refactor it in any way, or apply OO concepts.
Did you read the "article"? I'm fairly certain that it's just a parody of all of those terrible introductions to object-orientation out there, you know the ones taught with inane toy problems just to be extra sure that no one might accidentally see any potential benefit in the method.
At least I hold out the hope that no one out there actually believes that the best way to write readable code is to split everything up into four-line helper functions.

Share on other sites
Quote:
 Original post by implicitI'm fairly certain that it's just a parody of all of those terrible introductions to object-orientation out there, you know the ones taught with inane toy problems just to be extra sure that no one might accidentally see any potential benefit in the method.

It is not a parody. Check the site, this is best practices from their consultancy.

As said, this is how money is made in enterprise world. A lot of money.

Share on other sites
implicit: As of version 4.3.3 (Ubuntu 4.3.3-5ubuntu4 to be exact) under Linux, gcc does not support %z. When I replace %i with %z, it gives me a compiler warning, and when it runs, only prints "%". Gcc 4.4 or 4.5 may currently support it, but the lasted version in the Jaunty repositories does not currently support it.*

Though, then again, the standard library might have something to do with it.*

Antheus, you hit the ball. For now on, if I would ever have to explain something about the code when posting it on GameDev (or anywhere else), it belongs in the code (good thought experiment, actually -- didn't think of it until now).

size_t, on a 32-bit machine, would allow one to look for all the primes between 2 and 2**32-1, instead of "just" 2**31-1, in its current implementation; however, this is all moot, as allocating 256 MiB from the stack just causes a segfault. The C99 standard is very vague on this, however; it just specifies that the index needs to be an "integer". size_t just seems more natural (as it deals with an index into memory; it also, I believe, gets larger with memory size, 64-bit on 64-bit machines, 128-bit on 128-bit machines, etc., which int doesn't always do...).

Quote:
 Original post by Anon MikeFor this toy sample the explicit bit array is overkill. An array of boolean would be much more readable. If you want to keep the bit array then Set/TestBit helper functions would probably improve things. Even with the bit array I would just start it at index 0 instead of 3 so that you don't have to have all the "j-3"'s. In a similar vein, the "/ 16"'s look wrong at first, you should definitely explain that you're only have bits for odd numbers.

The array of boolean is a lot more readable, I agree. The helper macros/functions are not a bad idea either. No, you can't start the array at index 0. 0 is even.

Here is a version of the source with some of your suggestions:
#include <stdlib.h>#include <stdio.h>#include <math.h>#include <stdint.h>// Calculates and prints primes from 2 to MAX_NUM// Finds primes with an optimized Sieve of Eratosthenesint main(){    const size_t MAX_NUM = 274;    const size_t SQRT_MAX_NUM = (size_t)ceil(sqrt(MAX_NUM));    const size_t K_MEM_SIZE = (MAX_NUM - 3) / 16 + 1;    // Packed boolean bit-array for odd numbers 3 to MAX_NUM    // By packing sieve into bits (instead of bytes),    // cache and memory is better utilized    uint8_t K[K_MEM_SIZE];    // if your version of gcc supports %z, use it    // instead of %i for size_t variables and constants    printf("MAX_NUM = %i\n", MAX_NUM);    printf("SQRT_MAX_NUM = %i\n", SQRT_MAX_NUM);    printf("K_MEM_SIZE = %i\n", K_MEM_SIZE);    printf("\n");    // Initialize K    for (size_t i = 0; i < K_MEM_SIZE; ++i)        K = 0xFF;    // Sieve!    for (size_t n = 3; n < SQRT_MAX_NUM; n += 2)    {        // If n has not already been marked as not prime, then it is prime        if ( K[(n - 3) / 16] & (0x01 << ((n - 3) / 2) % 8) )        {            for (size_t j = n * n; j <= MAX_NUM; j += n + n)            {                // j is not prime                K[(j - 3) / 16] &= ~( 0x01 << ((j - 3) / 2) % 8 );                        }        }    }    // Print primes "2, 3, 5, 7, 11, ..."    printf("2");    for (size_t n = 3; n <= MAX_NUM; n += 2)    {        if ( K[(n - 3) / 16] & (0x01 << ((n - 3) / 2) % 8) )            printf(", %i", n);    }    printf("\n");        return EXIT_SUCCESS;}

*EDIT: Actually gcc supports "%zi" just fine. You need "%zi", not just "%z". Thank you, snoutmate, for pointing that out.

[Edited by - DinosaurDynasty on October 14, 2009 4:23:25 PM]