The P4 L2's cache line size is 128 bytes (good cache info). So if you have fewer than 32 floats in your structure, or your structures aren't contiguous, then from the L2's perspective you're wasting whatever part of each fetched line you don't use. Since your sizes are fixed after initialization, there's no need for all the extra indirection. This data can just be copied into contiguous arrays.
So, here is probably what I would do. Leave initialization the way it is, then use an array of structs like:
struct tehstruct{ float a, b, c; float* d; int N;};
and after initialization, sum up the element counts of all the d arrays, allocate one big block of that size, then copy each d's values into that block, filling in tehstruct's d and N as you copy the a, b, c members.
So the tehstructs are contiguous (and all the same size, so they're sortable) and all the d data is contiguous. That should give you your best L2 cache usage.
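A rough sketch of that packing step, in case it helps. The Original type (with a std::vector for d), the Packed wrapper and the pack function are names I made up for illustration; I'm only assuming your current data looks something like that:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the existing layout (one heap block per object).
struct Original {
    float a, b, c;
    std::vector<float> d;
};

// Fixed-size record from the post; d does not own its memory.
struct tehstruct {
    float a, b, c;
    float* d;  // points into Packed::values
    int N;     // number of floats at d
};

// Hypothetical owner of the two contiguous arrays.
// Moving it is fine; copying it would leave the d pointers aimed at the old buffer.
struct Packed {
    std::vector<tehstruct> records;
    std::vector<float> values;
};

Packed pack(const std::vector<Original>& src) {
    // Pass 1: total number of floats across all the d arrays.
    std::size_t total = 0;
    for (const Original& o : src) total += o.d.size();

    Packed out;
    out.values.resize(total);  // sized once, so pointers into it stay valid
    out.records.reserve(src.size());

    // Pass 2: copy each d into the big block and fill in the record.
    std::size_t offset = 0;
    for (const Original& o : src) {
        float* dst = out.values.data() + offset;
        std::copy(o.d.begin(), o.d.end(), dst);
        out.records.push_back({o.a, o.b, o.c, dst, static_cast<int>(o.d.size())});
        offset += o.d.size();
    }
    assert(offset == total);
    return out;
}
```

If you sort records afterwards, only the 24-byte-or-so structs move; the d data stays where it is, so it's only laid out in traversal order if you pack after sorting.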
Actually, I would probably even make d a 16-bit offset from the top of the allocated d buffer, and N a 16-bit count, so the whole tehstruct fits in an even 16 bytes. But you should not do that if you value your friends :)
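For the curious, the 16-byte version might look something like this. I'm assuming the offset counts floats rather than bytes, and that float is 4 bytes with 4-byte alignment (true on the P4 and most everything else); tehstruct16, make_record and d_begin are just illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct tehstruct16 {
    float a, b, c;           // 12 bytes
    std::uint16_t d_offset;  // index of the first float in the shared d buffer
    std::uint16_t N;         // number of floats
};
static_assert(sizeof(tehstruct16) == 16, "expected 12 + 2 + 2 bytes with no padding");

// Build a record, checking that the offset and count fit in 16 bits.
inline tehstruct16 make_record(float a, float b, float c,
                               std::size_t offset, std::size_t count) {
    assert(offset <= UINT16_MAX && count <= UINT16_MAX);
    return {a, b, c, static_cast<std::uint16_t>(offset),
            static_cast<std::uint16_t>(count)};
}

// Recover the pointer to this record's d data from the shared buffer.
inline const float* d_begin(const tehstruct16& s, const float* buffer) {
    return buffer + s.d_offset;
}
```

The catch is that every d has to start within the first 65536 floats of the buffer, so this only works if the total data is small enough.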
Alternatively, you could just make tehstruct an even 128-byte size:
struct tehstruct{ float a, b, c; int N; float d[28];};
making the cost of sorting higher, but giving better locality during the computation... either way, if you don't need the flexibility of std::vector, and L2 cache is really a big problem, these are some options.
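If you go the 128-byte route, a couple of compile-time checks are cheap insurance. This is a sketch under my own assumptions: 4-byte float and int, and I've added alignas(128) so each struct starts on a line boundary, which the size alone doesn't guarantee. tehstruct128 and sum_d are made-up names:

```cpp
#include <cassert>
#include <cstddef>

// One record per 128-byte line: 3 floats + 1 int + 28 floats = 32 * 4 bytes.
struct alignas(128) tehstruct128 {
    float a, b, c;
    int N;        // number of d entries actually used, at most 28
    float d[28];
};
static_assert(sizeof(tehstruct128) == 128, "record should fill exactly one line");
static_assert(alignof(tehstruct128) == 128, "record should start on a line boundary");

// Example computation touching only this record's own cache line.
inline float sum_d(const tehstruct128& s) {
    assert(s.N >= 0 && s.N <= 28);
    float total = 0.0f;
    for (int i = 0; i < s.N; ++i) total += s.d[i];
    return total;
}
```

Note that this caps each d at 28 entries, and that heap containers like std::vector only honor the over-alignment from C++17 on; static or stack arrays are fine either way.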