cache, structs and arrays

Posted by AcidZombie24


I run into this problem a lot. I have a struct with, let's say, x, y, flags and p. I would use vector<struct> name; and then iterate over the container checking only flags, or p, or x and y (together or separately), and I suspect this is killing the cache because it loads the entire struct, checks one member, then moves on to the next struct. I would rather have a struct holding an array for each member to speed up iteration, i.e. struct { container<_T> x, y, flags, p; } /* obviously I wouldn't use the same type for flags and p, but you get the idea */. Then every time I add an element I would have to push onto each container. Is there a better way to do this? Is there a special container I can use?

In a different app I have another struct. It holds XML data, so I don't know how many types it uses. The struct contains an array of types and an array of values. Let's say one type is 'creator'; I'd have one (valid) creator tag per file, and I read in 100 XML files, so I have 100 structs each with a type and value array. How should I sort by creator name in alphabetical order, and what containers should I use?
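For concreteness, here's a minimal sketch of the two layouts I'm comparing (the names and field types are just placeholders):

#include <vector>

// Array of Structures (AoS): what vector<struct> gives me now. A pass
// that only tests 'flags' still pulls whole Particle objects through
// the cache.
struct Particle {
    float    x, y;
    unsigned flags;
    int      p;
};
std::vector<Particle> particles;   // particles[i].flags

// Structure of Arrays (SoA): what I'm proposing instead. A pass over
// 'flags' now touches only the flags container.
struct Particles {
    std::vector<float>    x, y;
    std::vector<unsigned> flags;
    std::vector<int>      p;

    // Every add has to touch all four containers to keep them in sync.
    void add(float x_, float y_, unsigned flags_, int p_) {
        x.push_back(x_);
        y.push_back(y_);
        flags.push_back(flags_);
        p.push_back(p_);
    }
};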

(Why does every person who posts a question about optimization get the typical "do you know for sure you have a bottleneck here?" reply? Couldn't we put that in the forum FAQ and be done with it, assume the OP is not a complete idiot and has already profiled his code, or is just curious, and jump straight to the real question?)

Quote:
Original post by Rattenhirn
How did you come to the conclusion that iterating over your vector "is killing the cache"?


Not the vector, but the structs (-->vector<struct>)
My thinking was: if one struct is 64 bytes and I only want 4 of them, loading that whole struct hurts my cache. Even if the other 60 bytes aren't all pulled into the cache, it's still not as good as an array or vector where all the bytes I want sit next to each other. And it makes sense to be able to iterate over just my flags or just my x,y coordinates. I don't know how I should be doing this; something like struct { container<_T> x, y, flags, p; } is my current solution.
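To put rough numbers on that worry (the 64-byte figures are assumptions, not measurements):

#include <cstdio>

int main() {
    // Assumed figures, purely illustrative.
    const int cacheLineSize = 64;  // bytes per cache line
    const int structSize    = 64;  // one whole struct per line
    const int flagSize      = 4;   // the only field a flag-checking pass reads

    // AoS: every element read drags in a full line but uses only 4 bytes of it.
    std::printf("AoS useful bytes per line: %d of %d (%.0f%%)\n",
                flagSize, cacheLineSize, 100.0 * flagSize / cacheLineSize);

    // SoA: a packed flags array fits 16 flags per line, all of them useful.
    std::printf("SoA flags per line: %d (100%% useful)\n",
                cacheLineSize / flagSize);
    return 0;
}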

Quote:
I have a struct with, let's say, x, y, flags and p. I would use vector<struct> name; and then iterate over the container checking only flags, or p, or x and y (together or separately), and I know it's killing the cache because it loads the entire struct, checks one member, then moves on to the next struct.


This won't kill the cache: for the example you've given, your structure should easily fit into one active cache line at a time, meaning old lines can be evicted cheaply. Memory bandwidth could theoretically be an issue, but you've told us you profiled a cache issue, not a bandwidth issue. You did profile, right?

Quote:
I would rather have a struct holding an array for each member to speed up iteration


This may "kill" cache, by necessitating a minimum of 1 active cache line per member in use, even if each of those members is only accessing a 4 bytes of a 64 or 128 byte cache line. Not horrible in and of itself, but it'll hurt if it pushes some of your loops out of L1 cache and into L2...

This is of course assuming std::vector or std::deque. For any container that is not at least semi-contiguous, such as std::list, your proposal is an instant loss for the cache and a bad idea.

Even with semi-contiguous containers you'd need some pretty large structures to make this kind of optimization worthwhile. The requirement to allocate each member array separately is almost certain to kill whatever performance you gain, even for fairly large structures.
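To make the "one active cache line per member" point concrete, here's a minimal sketch (member names and types are just placeholders): a pass that reads all four members of an SoA layout streams through four separate allocations at once, whereas the same pass over an AoS layout streams through one.

#include <cstddef>
#include <vector>

// SoA: four member arrays. A pass that reads flags, x, y and p keeps
// roughly four cache lines active at once (one per array) and involves
// four separate heap allocations.
struct ParticlesSoA {
    std::vector<float>    x, y;
    std::vector<unsigned> flags;
    std::vector<int>      p;
};

// AoS: the same pass streams through a single allocation, one active
// cache line at a time.
struct ParticleAoS {
    float    x, y;
    unsigned flags;
    int      p;
};

float sumSoA(const ParticlesSoA& ps) {
    float total = 0.0f;
    for (std::size_t i = 0; i < ps.x.size(); ++i)
        if (ps.flags[i] & 1u)                      // touches the flags array...
            total += ps.x[i] + ps.y[i] + ps.p[i];  // ...plus the x, y and p arrays
    return total;
}

float sumAoS(const std::vector<ParticleAoS>& ps) {
    float total = 0.0f;
    for (std::size_t i = 0; i < ps.size(); ++i)    // one stream of whole structs
        if (ps[i].flags & 1u)
            total += ps[i].x + ps[i].y + ps[i].p;
    return total;
}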

No, I didn't profile. I downloaded Intel's profiler and couldn't figure out how to track cache behaviour.

Also, this situation comes up in many of my projects; I was thinking ahead and don't have code (in a large project) to profile.

I might mean bandwidth. I'm worried that loading the extra data while iterating through a few thousand structs would thrash my cache.

I was thinking of using vector since I don't need push_front, but I may need to insert and remove a lot, so maybe I want deque?

>The requirement to allocate each member array separately is almost certain to kill whatever performance you gain, even for fairly large structures.

What did you mean by this?

Maybe you should be looking at your algorithms, not data structures.

In your original post you said you iterate over the whole container a lot, each time accessing different members.

Perhaps you can restructure your code so that it iterates over the container fewer times, accessing multiple members per pass.

Furthermore, those values you're checking have to be set somewhere. Maybe you can use that information to iterate only over the relevant portion of the container, as in the sketch below.
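For example, a minimal sketch (the names and the flag scheme are made up): whatever code sets a flag could also record the element's index, so later passes visit only those elements.

#include <cstddef>
#include <vector>

struct Particle {
    float    x, y;
    unsigned flags;
    int      p;
};

std::vector<Particle>    particles;
std::vector<std::size_t> dirty;   // indices of elements whose flag was set

void setFlag(std::size_t i, unsigned bit) {
    particles[i].flags |= bit;
    dirty.push_back(i);           // remember it for the next pass
}

void processFlagged() {
    // Visit only the elements that actually need work, instead of
    // scanning the whole container and testing every flag.
    for (std::size_t i = 0; i < dirty.size(); ++i)
        particles[dirty[i]].p += 1;   // placeholder work
    dirty.clear();
}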

And so on...

It's hard to give good advice without knowing what the actual problem is. I highly doubt that inefficient use of the cache is the primary suspect.

If you're using a multi-core processor, this situation is begging for OpenMP.
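Something along these lines, as a rough sketch (the struct and loop body are placeholders, and it assumes OpenMP is enabled with the appropriate compiler flag, e.g. -fopenmp or /openmp):

#include <vector>

struct Particle { float x, y; unsigned flags; int p; };

void update(std::vector<Particle>& particles) {
    // Each iteration is independent, so the loop can be split across cores.
    // The pragma is simply ignored if OpenMP is not enabled.
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(particles.size()); ++i) {
        if (particles[i].flags & 1u) {
            particles[i].x += 1.0f;
            particles[i].y += 1.0f;
        }
    }
}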

Secondly, for the SoA (Structure of Arrays) approach your best bet is a container that keeps its elements contiguous (or nearly so), e.g. vector, deque, or a plain old array.

I would suggest that the best compromise is to compact the struct rather than go full SoA, e.g.

struct { int x, y, flags, p; }; if you know the limits of your data you might be able to get away with the following:

struct { unsigned short x, y, flags, p; }, which reduces the overall size of the struct so that more of them fit into a single cache line.
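As a rough sketch of the difference (sizes assume typical compilers with 4-byte int and 2-byte short, and no unusual padding):

#include <cstdio>

struct Wide   { int x, y, flags, p; };            // 4 * 4 = 16 bytes
struct Packed { unsigned short x, y, flags, p; }; // 4 * 2 =  8 bytes

int main() {
    // With 64-byte cache lines, that's 4 Wide structs per line
    // versus 8 Packed structs per line.
    std::printf("sizeof(Wide)   = %zu\n", sizeof(Wide));
    std::printf("sizeof(Packed) = %zu\n", sizeof(Packed));
    return 0;
}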

Does anyone know how best to exploit set associativity? I suspect that if the data isn't being dirtied, eviction is free, so could it help to access memory that maps to the same cache set consecutively (e.g. each iteration works on, say, 4 items of data whose locations are exactly the cache size apart)? Can anyone confirm this? (Off to try a simple benchmark now...)
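A rough sketch of the kind of benchmark I mean (the cache size, stride and pass count are assumptions to tweak; timing uses the C clock for portability):

#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

int main() {
    const std::size_t cacheSize = 256 * 1024;               // assumed cache size in bytes
    const std::size_t stride    = cacheSize / sizeof(int);  // elements whose addresses are cacheSize bytes apart
    std::vector<int> data(stride * 4, 1);                   // 4 items per iteration, a full cacheSize apart

    std::clock_t start = std::clock();
    long long sum = 0;
    for (int pass = 0; pass < 1000; ++pass) {
        for (std::size_t i = 0; i < stride; ++i) {
            // Read 4 locations that map to the same cache set.
            sum += data[i] + data[i + stride] + data[i + 2 * stride] + data[i + 3 * stride];
        }
    }
    std::clock_t end = std::clock();
    std::printf("sum=%lld time=%.2fs\n", sum, double(end - start) / CLOCKS_PER_SEC);
    return 0;
}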

Quote:
Original post by AcidZombie24
I would rather have a struct holding an array for each member to speed up iteration, i.e. struct { container<_T> x, y, flags, p; } /* obviously I wouldn't use the same type for flags and p, but you get the idea */.

Then every time I add an element I would have to push onto each container. Is there a better way to do this? Is there a special container I can use?
Ah, the age-old battle of Structure of Arrays vs. Array of Structures...
AFAIK, the SoA technique can be more appropriate in some circumstances (such as when you want to iterate over every record but only inspect one field), but in C/C++ it is usually a lot messier to code than AoS.

Personally, I'd go for the standard AoS method just because it produces nicer looking code ;)
However, using C++ it should be possible to create a data-type which is allocated internally as SoA but provides an interface that looks like regular AoS code.

Quote:
Original post by janta
(Why does every person who posts a question about optimization get the typical "do you know for sure you have a bottleneck here?" reply? Couldn't we put that in the forum FAQ and be done with it, assume the OP is not a complete idiot and has already profiled his code, or is just curious, and jump straight to the real question?)
Yeah, same goes for legal discussions in every single "Help wanted" thread... "Assume good faith" should be a rule ;D

[Edited by - Hodgman on July 21, 2008 11:38:00 PM]

Here's my attempt at making SoA look like AoS:
//"Normal" C++ implementation:
namespace AoS
{
class Foo
{
public:
int& A() { return a; }
int& B() { return b; }
int& C() { return c; }
private:
int a, b, c;
};
class FooList
{
public:
FooList( unsigned int s ) : size(s), foo(new Foo[s]) {}
~FooList()
{
delete [] foo;
}
Foo& GetAt( unsigned int i )
{
if( i >= size )
throw std::out_of_range("oh noes!");
return foo;
}
private:
int size;
Foo* foo;
};
}

//Alternative implementation:
namespace SoA
{
class FooList;
class Foo
{
public:
Foo( FooList& d, unsigned int o ) : data(d), offset(o) {}
int& A();
int& B();
int& C();
private:
FooList& data;
unsigned int offset;
};
class FooList
{
public:
FooList( unsigned int s ) : size(s), a(new int[s]), b(new int[s]), c(new int[s]) {}
~FooList()
{
delete [] a;
delete [] b;
delete [] c;
}
Foo GetAt( unsigned int i )
{
if( i >= size )
throw std::out_of_range("oh noes!");
return Foo(*this,i);
}
private:
friend class Foo;
int size;
int* a;
int* b;
int* c;
};
inline int& Foo::A() { return data.a[offset]; }
inline int& Foo::B() { return data.b[offset]; }
inline int& Foo::C() { return data.c[offset]; }
}

//Usage:

int main( int argc, char **argv )
{
AoS::FooList aos(42);
AoS::Foo& aos_foo = aos.GetAt(0);
aos_foo.A() = 1337;

SoA::FooList soa(42);
SoA::Foo soa_foo = soa.GetAt(0);//N.B. - Foo is not a reference for the SoA version
soa_foo.A() = 1337;
}
