11 replies, and in 3 of them you post about OpenCL and SPIR-V, although that's really not relevant to the request for a cross-platform compute solution targeting Xbox and PS4. I think people don't vote you down because they dislike you; it's rather because you're off topic. I agree the down-vote is sometimes mean, because you don't know why people do it, but the point is not that you'll get used to it and continue posting. It's probably more of an advantage for you if you deduce why it happens.
I wish people would be man/woman enough to always tell you why they voted down; random punishment doesn't help or lead to anything.
Assuming they're all 32-bit floats, the Z component of a vector is 16 bytes away from its X component. This means that when you load the X component of a vector, you'll be loading its Y and Z in the same cache line (most x86 CPUs use 64-byte cache lines; some rare ARM devices use 32-byte cache lines, though).
The approach doesn't scale well to wider SIMD (e.g. AVX-512) unless the standard cache line size increases as well (which, AFAIK, it doesn't); however, it's still an improvement over the original SoA, which will always need 3 lines per Vector3.
It's SIMD4-register friendly, but not very cache friendly. A tuple of 4 vectors is 24 bytes in size, so sometimes the x/y/z components cross cache-line borders, and in that case you could just as well use pure SoA.
But I agree that using hybrids makes sense. The whole AoS vs. SoA thing is not a guide to how you have to do it. It should rather open your eyes to the fact that you can lay out data completely differently than a real-world logical view of the data would suggest. The way to start should not be "how do I lay out the data", but "how am I going to use this data", and then the "hybrid" comes into play, because you'll organize data in a way that makes sense. "Sense" doesn't mean strictly for performance.
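To make the layouts concrete, here is a minimal sketch of the hybrid ("AoSoA") idea being discussed. The struct and accessor names are mine, not from the thread; component counts assume 32-bit floats and SSE-width (4-wide) SIMD:

```cpp
#include <cstddef> // offsetof, std::size_t

// Plain AoS: x, y, z of one vector are adjacent in memory.
struct Vec3 { float x, y, z; };

// Hybrid AoSoA: blocks of 4 vectors, SoA inside each block.
// Each component lane (x[4], y[4], z[4]) is one contiguous 16-byte
// load, i.e. exactly one SSE register, and a whole 48-byte block
// fits inside a single 64-byte cache line when suitably aligned.
struct Vec3x4 {
    float x[4];
    float y[4];
    float z[4];
};

// Where element i of a large array lives in the hybrid layout.
inline float getX(const Vec3x4* blocks, std::size_t i) { return blocks[i / 4].x[i % 4]; }
inline float getY(const Vec3x4* blocks, std::size_t i) { return blocks[i / 4].y[i % 4]; }
inline float getZ(const Vec3x4* blocks, std::size_t i) { return blocks[i / 4].z[i % 4]; }
```

Note the trade-off mentioned above: the lanes are SIMD-friendly, but a block is only cache-line-friendly if the allocation is 64-byte aligned.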
-If you have some complex structures and they're not critical to performance, then it makes sense to organize them in whatever way is best for maintainability; this way you will save time that you can then spend on optimizing the critical parts.
-If you do have a piece of code that is critical, try to figure out what access pattern you will have. Try to figure out what ranges the data will cover and what precision you need. E.g. if you do all your heavy math on colors, those might not need to be floats. You could keep them in memory as 8 bits/channel or as halves. You effectively trim unneeded bits from your variables and thus become cache friendlier, memory-bandwidth friendlier, etc.
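As a small illustration of the 8-bits-per-channel idea, here is a hypothetical pair of helpers (my names, not from the thread) that pack a float color channel into a byte, quartering memory traffic when bandwidth rather than ALU throughput is the bottleneck:

```cpp
#include <algorithm> // std::min, std::max
#include <cstdint>

// Store a color channel as 8 bits instead of a 32-bit float.
inline std::uint8_t packChannel(float c) {
    c = std::min(1.0f, std::max(0.0f, c));               // clamp to [0, 1]
    return static_cast<std::uint8_t>(c * 255.0f + 0.5f); // round to nearest
}

inline float unpackChannel(std::uint8_t c) {
    return static_cast<float>(c) * (1.0f / 255.0f);
}
```

The obvious cost is precision: this is fine for final display colors, but may band badly if you accumulate many lighting contributions in 8 bits.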
And most importantly, especially if you are a beginner, don't assume what is slow and what is fast. Implement a working solution and profile it; you will be surprised how often the things you thought would be slow are not the bottleneck, and how often parts of the binary are slow that you hadn't suspected. As a next step, try to analyse why it is slow. Don't fall into a trap like "there is a division, divisions are slow"; it might be that the division first has to fetch its data from memory and stalls waiting for it, which can take far more cycles than the division itself. The same goes the other way around: some fetches from random memory might be hidden by the CPU pipeline, so don't immediately assume that's the problem; your compiler might generate weird opcodes for innocent-looking code.
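Even without a real profiler, you can start with crude wall-clock timing; a minimal sketch (helper name is mine). A proper profiler (VTune, perf, etc.) additionally tells you *why* something is slow — cache misses, stalls — but even this answers "is it actually the bottleneck?":

```cpp
#include <chrono>

// Measure how long a code block takes, in milliseconds.
template <class F>
double timeMs(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Remember to measure optimized builds and to run the candidate several times, or one cold-cache outlier will mislead you.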
And don't hesitate to ask senior programmers; you will see that each of them will tell you a different reason your code is probably slow and a different solution for it, which is simple proof that profiling is the proper way to decide... that's also the case for AoS vs. SoA vs. hybrid solutions.
thanks Tom. Sadly there isn't much information, rather opinions. I'd really appreciate it if someone could share some knowledge or references to sources.
and also how it is handled in the UK, France, Germany, or Sweden
it always makes me sad to see how some people and companies treat languages and libs like religions
I like C++, yet for WP7 and Xbox Indie I'd be forced to use one particular language, although there is no reason something else wouldn't run. Just like they tied DX10 to Vista, and a lot of morons spread the word that "it's for technical reasons, superior driver model and...", while every programmer knows that an API is just a thin interface, not really tied to a driver model or internal implementation (that's why there is an interface).
While I'm sad for you XNA guys if it's really true that it "dies", I hope people learn from it: don't stick to one language, one API, one lib, especially if it's built solely by one company and its marketing department.
Standards are the way to keep your freedom of choosing what and how you want to develop.
In general, you should prefer methods that do not cause side effects.
Do you imply that a function that is supposed to calculate something based on members has side effects, even though that's on purpose? Otherwise I'm confused about what you could possibly mean; could you make your reply more detailed?
that depends on the art and raytracing quality.
Quite often you can combine both: render forward the usual way to get a G-buffer, then use the G-buffer and raytrace all marked pixels/fragments for reflections etc.
I have to disagree; a bi-directional path tracer should have faster convergence if you have smaller light sources. It's kind of bad for light domes (IBL), though.
I mean, the noise is mostly visible in the indirectly lit areas; it shouldn't be like that. You might end up with some fireflies if you add specular surfaces, though.
Your GPU just executes one render job at a time; spreading it across multiple render targets will make it slower rather than faster, as it has to switch at least twice. Also notice that you cannot just use two threads to render with one device: with DX11 you need to create a deferred context, which merely records your commands, and once you're done, you still submit it from the main (immediate) context. So it won't make any difference unless you are really CPU-bound while creating your draw commands.
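To illustrate the record-then-submit pattern, here is a plain-C++ analogue — these are NOT the real D3D11 types (those would be `ID3D11DeviceContext::FinishCommandList` / `ExecuteCommandList`), just hypothetical stand-ins showing why the worker thread saves you nothing unless command *generation* is the bottleneck:

```cpp
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// A worker thread may only *record* commands into a list...
struct CommandList {
    std::vector<std::function<void()>> commands;
    void record(std::function<void()> cmd) { commands.push_back(std::move(cmd)); }
};

// ...execution still happens serially, on the thread owning the device.
struct ImmediateContext {
    void execute(const CommandList& list) {
        for (const auto& cmd : list.commands) cmd(); // actual GPU work here
    }
};
```

Recording in parallel helps CPU-bound command generation; the GPU-side work is serialized either way.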
I think I understand how it's done now. If all of the triangles are CW or CCW the outward pointing normals can be calculated correctly.
My terrain is rendered using triangle strips, which changes the winding of the triangles in each row. So I think that in order to calculate my normals, I need to swap the way I calculate them each time I change rows (which flips the CW/CCW ordering).
I intended to optimize my index buffer for gpu cache coherency in the future. I suppose I should look into that before I hard code how my normals are calculated. Any tips on that are appreciated =)
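A minimal sketch of the winding-aware normal calculation described above (struct and function names are mine): the cross product of two edges gives the face normal, and the `flip` flag handles the alternating winding inside a strip.

```cpp
struct V3 { float x, y, z; };

inline V3 sub(V3 a, V3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline V3 cross(V3 a, V3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Face normal of triangle (a, b, c). In a triangle strip every second
// triangle has reversed winding, so pass flip = true for those to keep
// all normals pointing to the same side.
inline V3 faceNormal(V3 a, V3 b, V3 c, bool flip) {
    V3 n = cross(sub(b, a), sub(c, a));
    if (flip) { n.x = -n.x; n.y = -n.y; n.z = -n.z; }
    return n;
}
```

For vertex normals you'd then average (and normalize) the face normals of all triangles sharing the vertex; the flip only affects which side the face normal points to, not its direction in the plane.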
Keeping the order consistent is especially important for backface culling to work; it saves you 50% of the rasterized pixels — low-hanging fruit everyone should use (except on PS2).