
Member Since 15 Aug 2009
Offline Last Active Mar 24 2015 06:40 PM

#5218946 Cross platform GPU computation for real time ray tracing rendering engine!

Posted by ProfL on 24 March 2015 - 06:37 PM

I'm sorry, but out of 11 replies, 3 are your posts about OpenCL and SPIR-V, although that's really not relevant to a request for a cross-platform compute solution targeting Xbox and PS4.
I think people don't vote you down because they dislike you; it's rather because you're off topic. I agree the down-vote is sometimes mean, because you don't know why people do it, but the point is not that you get used to it and continue posting. It's probably more to your advantage to work out why it happens.

I wish people would be man or woman enough to always say why they voted down; random punishment does not help or lead to anything.

#5216345 Array of structs vs struct of arrays, and cache friendliness

Posted by ProfL on 13 March 2015 - 04:07 PM

You can also use hybrids, particularly for position data. For example the following layout is a cache friendly SoA:


xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz


Assuming they're all 32-bit floats, the Y component of a vector is 16 bytes away from its X component and the Z component is 32 bytes away. This means when you load the X component of a vector, you'll be loading its Y & Z in the same cache line (most x86 CPUs use 64-byte cache lines; some rare ARM devices use 32-byte cache lines though).


The approach doesn't scale well to wider SIMD (e.g. AVX-512) unless the standard cache line size increases as well (which, AFAIK, it doesn't); however it's still an improvement over the original SoA, which will always need 3 lines per Vector3.

It's friendly to 4-wide SIMD registers, but not very cache friendly. A tuple of 4 vectors is 48 bytes in size (12 floats of 4 bytes each), so sometimes the x, y and z components cross a cache line boundary, and in that case you could just as well use pure SoA.
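To make the line crossing concrete, here is a small sketch of the arithmetic. It assumes 32-bit floats, 64-byte cache lines and an array that starts on a cache line boundary; the function and constant names are made up for illustration. Python is used only because the numbers are language independent.

```python
FLOAT_SIZE = 4                        # bytes per 32-bit float
LANES = 4                             # vectors per hybrid block (xxxx yyyy zzzz)
BLOCK_SIZE = 3 * LANES * FLOAT_SIZE   # 48 bytes per block
CACHE_LINE = 64                       # assumed cache line size in bytes


def lines_touched(block_index, base_address=0):
    """Return the cache line indices that one xxxx yyyy zzzz block touches."""
    first_byte = base_address + block_index * BLOCK_SIZE
    last_byte = first_byte + BLOCK_SIZE - 1
    return list(range(first_byte // CACHE_LINE, last_byte // CACHE_LINE + 1))


def straddling_blocks(block_count, base_address=0):
    """Indices of blocks whose x, y and z lanes do not fit in one cache line."""
    return [i for i in range(block_count)
            if len(lines_touched(i, base_address)) > 1]


for i in range(4):
    print("block", i, "touches cache lines", lines_touched(i))
print("straddling blocks:", straddling_blocks(4))
```

Under these assumptions the pattern repeats every 192 bytes (four blocks over three cache lines), and two of every four blocks cross a line boundary.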

But I agree that using hybrids makes sense. The whole AoS vs SoA discussion is not a guide to how you have to do it. It should rather open your eyes to the fact that you can lay out data completely differently from what a real-world logical view of the data would suggest. The starting point should not be "how do I lay out the data", but "how am I going to use this data", and that's where the "hybrid" comes into play, because you'll organize the data in a way that makes sense. "Sense" doesn't strictly mean performance.

-If you have some complex structures and they're not critical to performance, then it makes sense to organize them in the way that is easiest to maintain. That way you save time, and you can spend that time on optimizing the critical parts.

-If you have a piece of code that is critical, try to figure out what access pattern you will have. Try to figure out what range the data you use will be in and what precision you need. E.g. if you do all your heavy math on colors, those might not need to be floats. You could keep them in memory as 8 bits per channel or as halfs. You effectively trim unneeded bits from your variables and thus become friendlier to the cache, to memory bandwidth, etc.
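As a rough illustration of the second point, the sketch below packs the same RGBA color as 32-bit floats, as halfs and as 8 bits per channel, then prints how many colors fit into one cache line. The 64-byte line size, the helper names and the clamp to [0, 1] for the 8-bit case are assumptions for the example, not a recommendation for any particular engine.

```python
import struct

CACHE_LINE = 64  # assumed cache line size in bytes


def pack_float32(rgba):
    """Four 32-bit floats: 16 bytes per color."""
    return struct.pack("<4f", *rgba)


def pack_half(rgba):
    """Four 16-bit half floats: 8 bytes per color."""
    return struct.pack("<4e", *rgba)


def pack_unorm8(rgba):
    """Four 8-bit channels: 4 bytes per color, values clamped to [0, 1]."""
    quantised = [round(min(max(c, 0.0), 1.0) * 255) for c in rgba]
    return struct.pack("<4B", *quantised)


def unpack_unorm8(data):
    """Convert the 8-bit channels back to floats in [0, 1]."""
    return [b / 255 for b in struct.unpack("<4B", data)]


color = (1.0, 0.5, 0.25, 1.0)
for name, packed in (("float32", pack_float32(color)),
                     ("half", pack_half(color)),
                     ("unorm8", pack_unorm8(color))):
    print(name, len(packed), "bytes,",
          CACHE_LINE // len(packed), "colors per cache line")
```

Whether the lost precision is acceptable depends on the math you do on the values, so this is something to check against your own data ranges.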


And most importantly, especially if you are a beginner, don't assume what is slow and what is fast. Implement a working solution and profile it. You will be surprised how often the things you thought would be slow are not the bottleneck, and how often parts of the binary are slow that you never suspected. As a next step, try to analyse why it is slow. Don't fall into a trap like "there is a division, divisions are slow"; it might be that the division first has to fetch data from memory and stalls on it, which can take far more cycles than the division itself. The same goes the other way around: some fetches from random memory might be hidden by the CPU pipeline, so don't immediately assume that's the problem. Your compiler might also generate weird opcodes for innocent-looking code.
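A minimal sketch of "profile first", using Python's standard cProfile and pstats modules. The two workload functions are invented for the example: one does the divisions people tend to suspect, the other looks innocent but scans a list for every element. A native profiler would play the same role for C++ code.

```python
import cProfile
import io
import pstats


def suspected_slow(values):
    """Many divisions: the kind of code that often gets blamed first."""
    return [v / 3.0 for v in values]


def actually_slow(values):
    """Innocent-looking dedupe: 'v not in out' is a linear scan each time."""
    out = []
    for v in values:
        if v not in out:
            out.append(v)
    return out


def workload():
    values = list(range(2000))
    suspected_slow(values)
    actually_slow(values)


def profile_report(func):
    """Run func under cProfile and return the stats table as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats()
    return buffer.getvalue()


report = profile_report(workload)
print(report)
```

The interesting part is reading the table rather than guessing: the measured times, not the presence of a division, tell you where to look next.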

And don't hesitate to ask senior programmers. You will see that each of them will give you a different reason why your code is probably slow and a different solution for it, which is simple proof that profiling is the proper way to decide... that's also the case for AoS vs SoA vs hybrid solutions.

#5171625 spare time project IP

Posted by ProfL on 05 August 2014 - 08:11 AM

Thanks Tom. Sadly there isn't much information, mostly opinions. I'd really appreciate it if someone could share some knowledge or references to sources,
and also how it is handled in the UK, France, Germany or Sweden.

#5116975 Convex hulls in modern games

Posted by ProfL on 14 December 2013 - 06:55 PM

I would say most modern engines collide at the polygon level, but use various detail levels of collision geometry. That's why you can buy objects with collision proxies in the Unity Asset Store.

#4983037 Is XNA dying and MS forcing to C++?

Posted by ProfL on 23 September 2012 - 04:42 PM

It always makes me sad to see how some people and companies try to treat languages and libs like religions :(
I like C++, yet for WP7 and Xbox Indie I'd be forced into one particular language, although there is no reason something else wouldn't run. It's just like when they tied DX10 to Vista and a lot of morons spread the word "it's for technical reasons, superior driver model and...", while every programmer knows that an API is just a thin interface, not really related to a driver model or internal implementation (that's why there is an interface).
While I'm sad for you XNA guys if it's really true that it "dies", I hope people learn from it not to stick to one language, one API, one lib, especially if it's built solely by one company and its marketing department.
Standards are the way to keep your freedom to choose what and how you want to develop.

Crossing fingers for you XNA fans :)

#4957868 OOP design question

Posted by ProfL on 10 July 2012 - 08:56 PM

In general, you should prefer methods that do not cause side effects.

Do you mean that a function that is supposed to calculate something based on members has side effects, even though that's its purpose?
Otherwise I'm confused about what you could possibly mean. Could you make your reply more detailed?

#4957021 Hybrid Ray Tracing Feasability

Posted by ProfL on 08 July 2012 - 02:57 PM

That depends on the art and the ray tracing quality.
Quite often you can combine both: render the primary rays the usual way (forward rasterization) to get a G-buffer, then use the G-buffer and ray trace all marked pixels/fragments for reflections etc.

#4956138 Tedious bugs in my bidirectional path tracer.

Posted by ProfL on 05 July 2012 - 05:03 PM

I have to disagree; a bidirectional path tracer should converge faster if you have smaller light sources. They are kind of bad for light domes (IBL).
I mean, the noise is mostly visible in the indirectly lit areas, and it shouldn't be like that. You might end up with some fireflies if you add specular surfaces, though.

I'm not sure what your error is, but if the two images converge to different results, something is clearly wrong. My wild guess would be that something is wrong with your probability calculations. Check out:
and additionally

#4956130 Is rendering to a texture faster than rendering to the screen?

Posted by ProfL on 05 July 2012 - 04:49 PM

Your GPU executes just one render job at a time, so spreading the work across multiple render targets will make it slower rather than faster, as it has to switch at least twice. Also note that you cannot just use two threads to render with one device. You need to create a deferred context with DX11, which only records your commands; once you're done, you still submit them from the main context. So it won't make any difference, unless you are really CPU bound while creating your draw commands.

#4943302 [Theory] Unraveling the Unlimited Detail plausibility

Posted by ProfL on 25 May 2012 - 12:21 PM

There are some interesting papers about this, indeed.
The best I've found so far: http://research.microsoft.com/en-us/um/people/hoppe/perfecthash.pdf

#4857749 Calculating outward pointing normals

Posted by ProfL on 05 September 2011 - 03:21 AM

I think I understand how it's done now. If all of the triangles are CW or CCW the outward pointing normals can be calculated correctly.

My terrain is rendered using triangle_strips, which changes the ordering of the triangles for each row. So I think in order to calculate my normals I need to swap the way I calculate the normals each time I change rows (which changes CW/CCW ordering).

I intended to optimize my index buffer for gpu cache coherency in the future. I suppose I should look into that before I hard code how my normals are calculated. Any tips on that are appreciated =)

Keeping the order consistent is especially important for backface culling to work. This saves you 50% of the rasterized pixels; it's low-hanging fruit everyone should use (except on PS2 :D)
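For the strip case discussed above: within a triangle strip the winding alternates from one triangle to the next, so a common fix is to swap two indices on every odd triangle before computing the normal. The sketch below shows that idea; the helper names and the tiny four-vertex quad are made up for illustration, and degenerate triangles (as used to stitch rows together) are simply skipped.

```python
def strip_triangles(strip_indices):
    """Expand a triangle strip into triangles with consistent winding."""
    triangles = []
    for i in range(len(strip_indices) - 2):
        a, b, c = strip_indices[i:i + 3]
        if a == b or b == c or a == c:
            continue  # degenerate triangle, e.g. used to stitch two rows
        if i % 2 == 1:
            a, b = b, a  # odd triangles have flipped winding in a strip
        triangles.append((a, b, c))
    return triangles


def face_normal(p0, p1, p2):
    """Unnormalized face normal: cross product of (p1 - p0) and (p2 - p0)."""
    e1 = [p1[k] - p0[k] for k in range(3)]
    e2 = [p2[k] - p0[k] for k in range(3)]
    return (e1[1] * e2[2] - e1[2] * e2[1],
            e1[2] * e2[0] - e1[0] * e2[2],
            e1[0] * e2[1] - e1[1] * e2[0])


# A flat quad in the xy plane, drawn as one strip of two triangles.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
strip = [0, 1, 2, 3]

for tri in strip_triangles(strip):
    print(tri, face_normal(*(vertices[i] for i in tri)))
```

With the swap in place both triangles of the quad get a normal pointing along +z; without it the second one would point the opposite way, which is also what breaks backface culling.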