Sign in to follow this  

Programming Challenge 1

This topic is 3772 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Optimize the following code and give detailed reason.
// Initialize iloveseven to an array of 7s
int i, j;
int iloveseven[1024][1024];

for(i = 0; i < 1024; i++) for(j = 0; j < 1024; j++) iloveseven[j][i] = 7;

Hints: obviously the code has no logical problem. Try to think in a remote area of computer programming. EDIT: more hints: paging, int has size of 4 bytes EDIT2: Check answer on page 2 of this thread [Edited by - TimChan on August 15, 2007 8:43:17 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Dave
You could pre-increment for a start :P. Though the compiler most likely sorts that out.

Your answer is correct. But there is another answer.

Share this post


Link to post
Share on other sites
Optimize for what — space? Performance?

First thought (probably not what you're looking for, though; technically, I guess it also doesn't function in exactly the same way in that it isn't possible to overflow):


template <typename T> class FilledContainer
{
T fill;

public:
FilledContainer(T Fill) : fill(Fill)
{}

T operator[](unsigned long)
{
return fill;
}
};

// ...

FilledContainer<int> foo(7);

Share this post


Link to post
Share on other sites
Quote:
Original post by jpetrie
iloveseven[i][j] has better cache coherency, since i changes less frequently than j.

TheUnbeliever: That's cute, I like that.

So what exactly cache coherency means? Your answer is correct and expected but I think the explanation is incomplete.

Share this post


Link to post
Share on other sites
Quote:
Original post by InsaneBoarder234
*** Source Snippet Removed ***


for(i = -1; i < 1024; ++i) for(j = -1; j < 1024; ++j) iLoveseven[i][j] = 7;

Except that means that you die painfully immediately when you try to access iLoveseven[-1][-1]

Share this post


Link to post
Share on other sites
Obviously, more constraints on what you're allowed to do to the code to optimize it would have been helpful. Converting iloveseven[1024][1024] into [1024*1024] and using one loop would have been my choice -- solves the cache coherency issue, and I like 1D arrays better anyway (but that array dimension isn't a concern here).

These sorts of things are 99% of the time obnoxious micro-optimizations. Profile, profile, profile.

Quote:

cant you just memset that entire block of memory?

You'd want to use fill_n or similar, not memset, but yes, that also achieves the solution of improving cache coherency.

Quote:

or just iloveseven[1024][1024] = {7}; //can't remember if that works how I'm thinking it does

If there are few initializer clauses in an initializer list than there are members in the aggregate, the remaining members are value-initialized (that is, zerod). It won't work here, but if you wanted to zero-initialize the array it would.

Quote:

So what exactly cache coherency means?

It means you're performing the access in a coherent fashion, in this case linearly stepping across the entire region of memory instead of jumping about in 1024-element chunks. I appreciate that you're from Hong Kong, and English may not be your native language, but the term is generally fairly sufficiently understood. Would you like me to define cache, too?

Share this post


Link to post
Share on other sites
Quote:
Original post by TimChan
So what exactly cache coherency means? Your answer is correct and expected but I think the explanation is incomplete.
You should know, you're apparently the one applying for the job and using jpetrie's answers to sleaze past the interview.

Share this post


Link to post
Share on other sites
Quote:
Original post by Ravuya
Quote:
Original post by TimChan
So what exactly cache coherency means? Your answer is correct and expected but I think the explanation is incomplete.
You should know, you're apparently the one applying for the job and using jpetrie's answers to sleaze past the interview.


hehe

Share this post


Link to post
Share on other sites
Quote:
Original post by TheUnbeliever
Except that means that you die painfully immediately when you try to access iLoveseven[-1][-1]


Eeee... I think I read a while ago that ++i incremented i before it was used by the for loop and I just took it as that, it seems I was wrong though!

Share this post


Link to post
Share on other sites
Quote:
Original post by jpetrie
iloveseven[i][j] has better cache coherency, since i changes less frequently than j.

TheUnbeliever: That's cute, I like that.


It took me a couple of looks at iloveseven[i][j] and iloveseven[j][i] to even realize you changed something. Just saw the iloveseven[?][?] and assumed iloveseven[i][j]. *hits self for being stupid and not paying attention*

Share this post


Link to post
Share on other sites
Quote:
Original post by Ravuya
Quote:
Original post by TimChan
So what exactly cache coherency means? Your answer is correct and expected but I think the explanation is incomplete.
You should know, you're apparently the one applying for the job and using jpetrie's answers to sleaze past the interview.

LOL Of course no, I'm already a paid web programmer. And I don't think such question would appear in job interview since it is about low-level stuff or OSDev...

Share this post


Link to post
Share on other sites
Quote:

Eeee... I think I read a while ago that ++i incremented i before it was used by the for loop and I just took it as that, it seems I was wrong though!

++i behaves the way you think, that's not the problem. The problem is that the loop executes the body first before the increment statement is executed. You can still preincrement the iteration index, you just still want to have the index initialized to 0 to begin with so you don't execute the bodies with i and j equal to -1.

Share this post


Link to post
Share on other sites
Quote:
Original post by jpetrie
Quote:

cant you just memset that entire block of memory?

You'd want to use fill_n or similar, not memset, but yes, that also achieves the solution of improving cache coherency.


Indeed. Optimized for style:


const int size = 1024;
// Initialize iloveseven to an array of 7s
int iloveseven[size][size];

std::fill_n(iloveseven, size * size, 7);

Share this post


Link to post
Share on other sites
iloveseven.cpp :

int iloveseven[1024][1024] =
{
{ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ..., 7, 7, }
{ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ..., 7, 7, }
{ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ..., 7, 7, }
....
{ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, ..., 7, 7, }
};

fin.

GENiloveseven.cpp :

// generate iloveseven.cpp
...

fin.

This is a fast iloveseven filling no ? May we consider that the asm code generate from iloveseven.cpp fill a 1024*1024 memory array fastly ?

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
Quote:
Original post by jpetrie
Quote:

cant you just memset that entire block of memory?

You'd want to use fill_n or similar, not memset, but yes, that also achieves the solution of improving cache coherency.


Indeed. Optimized for style:


const int size = 1024;
// Initialize iloveseven to an array of 7s
int iloveseven[size][size];

std::fill_n(iloveseven, size * size, 7);

I don't know, but fill_n could actually have been implemented as a loop (I suspect they use different strategies for different cases).
And then there are the optimizations done by the compiler. Relative small arrays may get inlined like:

array[0] = 7;
array[1] = 7;
array[2] = 7;

instead of a loop.

Share this post


Link to post
Share on other sites
Quote:

Relative small arrays may get inlined like:

Perhaps, but why worry?

Optimizing for what you think the compiler will generate in terms of machine code is fruitless. Unrolling the loop may or may not be done by the compiler, so it shouldn't be considered a reasonable "optimization" to attempt to the C++ code.

The only real sane optimization you can make is the observation that the algorithm in question suffers from poor locality (it doesn't traverse the storage linearly). Worrying about unrolling the loop or other things where you try to cajole the compiler into generating better assembly is problematic because they're not general-purpose solutions, whereas improving the algorithm is.

Besides, it's also worth nothing that if the array is small enough to be trivially manually inlined, it's probably small enough that the cache coherency issue isn't relevant. That applies mainly to large datasets.

Share this post


Link to post
Share on other sites
Quote:
Original post by jpetrie
Quote:

Relative small arrays may get inlined like:

Perhaps, but why worry?

I don't. Not in real life, at least.

It's a challenge, I have no idea how sane the solutions have to be. Nor do I know if it should be optimized for size or for speed (I suspect the latter), or whether the optimizations are allowed to be limited to a single platform. I merely indicating that the gains in performance (if taken to the extreme) can be different from what could be expected by reading the code.

Share this post


Link to post
Share on other sites

This topic is 3772 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this