HellRaiZer

Member
  • Content count

    663
Community Reputation

1001 Excellent

About HellRaiZer

  • Rank
    Advanced Member

  1. HellRaiZer

    Useless Snippet #2: AABB/Frustum test

    Everyone, thanks for the constructive comments. I've updated the article based on some of your suggestions.   @Matias Goldberg: Unfortunately, changing _mm_load_ps((float*)&absPlaneMask[0]) to be standard compliant as you suggested, as well as adding the __restrict keyword, requires a rerun of the benchmarks, because otherwise the numbers won't be accurate. I'll keep a note and hope to be able to do it sometime soon.   One small note about your last comment. The 4 AABBs at a time SSE version has a loop at the end which should handle the case where the number of AABBs isn't a multiple of 4. The code isn't shown for clarity (it should actually be the same as the 1 AABB at a time version).   Thanks again for the great input.
  2. HellRaiZer

    Useless Snippet #2: AABB/Frustum test

    @zdlr: Of course you are right. I forgot about those two. I'll edit the article to read "C++" instead of "C".

    @Matias: Thank you very much for the tips. I'll try to find some time to make the changes you suggest and edit the article. I'll also try to put together a small VS project to attach to the article at the same time.

    About the two dot products: do you mean method 5 in that article? In that case, the resulting code wouldn't be able to distinguish between the fully inside and intersecting states, which is a requirement for this article. I know it might sound bad to optimize something and then add arbitrary restrictions which might affect performance, but I think the ability to distinguish between those two cases helps in the case of a hierarchy. E.g. if the parent is completely inside, there's no need to check any of its children.

    Please correct me if I'm wrong.
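To illustrate the three-state requirement mentioned in the post above, here is a minimal scalar sketch. It is not the article's code: the names `Vec3`, `Plane`, `AABB`, `CullState` and `ClassifyAABB` are hypothetical, and the plane convention (a point is on the inner side when dot(n, p) + d >= 0) is an assumption. The test is conservative, so a box near a frustum corner may be reported as intersecting even though it is actually outside.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };           // assumed: inner side is dot(n, p) + d >= 0
struct AABB  { Vec3 center; Vec3 extent; };  // extent holds the half-sizes

enum class CullState { Outside, Intersecting, Inside };

inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

CullState ClassifyAABB(const AABB& box, const Plane* planes, std::size_t numPlanes)
{
    assert(planes != nullptr || numPlanes == 0);

    bool intersects = false;
    for (std::size_t i = 0; i < numPlanes; ++i) {
        const Plane& p = planes[i];

        // d: projection of the box center on the plane normal.
        // r: projected "radius" of the box along the plane normal.
        const float d = Dot(box.center, p.n);
        const float r = box.extent.x * std::fabs(p.n.x)
                      + box.extent.y * std::fabs(p.n.y)
                      + box.extent.z * std::fabs(p.n.z);

        if (d + r + p.d < 0.0f) {
            return CullState::Outside;   // even the farthest corner is behind this plane
        }
        if (d - r + p.d < 0.0f) {
            intersects = true;           // the nearest corner is behind this plane
        }
    }
    return intersects ? CullState::Intersecting : CullState::Inside;
}
```

In a hierarchy, a caller could skip testing the children of any node for which this returns `CullState::Inside` or `CullState::Outside`.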
  3. HellRaiZer

    Useless Snippet #2: AABB/Frustum test

    Thank you all for the comments.

    @Servant: Bacterius is right. The numbers are "cycles per AABB". Culling a batch of 1024 AABBs in a single loop ends up being faster than a batch of 32, probably because the function overhead (e.g. calculating the abs of the plane normals) is minimal compared to the main loop.

    Example: Assume that the initial loop which calculates the abs of the plane normals requires 200 cycles. Also, assume that each AABB requires 100 cycles. For a batch of 10 AABBs, the function would require 1200 cycles to complete, or in other words, 120 cycles per AABB. If the batch had 1000 AABBs, the function would require 100200 cycles, or 100.2 cycles per AABB. Hope that makes sense.

    @zdlr: Variable names were kept like that in order to match the examples in Fabian Giesen's article which I linked above. Also, the term 'reference implementation' doesn't mean that the code uses references. It means that that specific snippet is used as a baseline for performance comparisons (if this was what you meant).
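The amortization example in the post above can be written as a one-line formula: (setup + perBox * N) / N. The sketch below only restates that arithmetic. `CyclesPerAABB` is a hypothetical helper, and the 200 and 100 cycle figures are the post's illustrative assumptions, not measurements.

```cpp
#include <cassert>

// Average cost per box when a fixed setup cost is amortized over a batch.
double CyclesPerAABB(double setupCycles, double cyclesPerBox, unsigned numBoxes)
{
    assert(numBoxes > 0);
    return (setupCycles + cyclesPerBox * numBoxes) / numBoxes;
}
```

With the post's numbers, `CyclesPerAABB(200.0, 100.0, 10)` gives 120 and `CyclesPerAABB(200.0, 100.0, 1000)` gives roughly 100.2, so the per-box cost approaches the main-loop cost as the batch grows.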
  4. HellRaiZer

    Shader "plugin" system?

    It has been a long time since I touched any rendering-related code, but I'll try to describe what I remember from my implementation.

    Each surface shader (i.e. a shader that is applied to the surface of a 3D model) can use one or more plugins. Shader plugins were implemented using Cg interfaces.

    So, your example above would look something like this (pseudocode, since I haven't written a line of Cg for a couple of years now):

        IAmbientLighting g_AmbientLighting;
        sampler2D DiffuseTex;

        float4 mainPS(VS_OUTPUT i)
        {
            float4 diffuseColor;
            diffuseColor.rgb = tex2D(DiffuseTex, i.UV.xy).rgb;
            diffuseColor.rgb *= g_AmbientLighting.CalcAmbientLighting(i.PosWS.xyz);
            diffuseColor.a = 1.0;
            return diffuseColor;
        }

    The IAmbientLighting interface would look like this:

        interface IAmbientLighting
        {
            float3 CalcAmbientLighting(float3 posWS);
        }

    Your current shader would have used a constant ambient color implementation. Something like:

        class ConstAmbientLight : IAmbientLighting
        {
            float4 AmbientColor;

            float3 CalcAmbientLighting(float3 posWS)
            {
                return AmbientColor.rgb * AmbientColor.a;
            }
        }

    If you would like to change to an SSAO implementation, instead of using this class you would use:

        class SSAO : IAmbientLighting
        {
            sampler2D SSAOTex;
            float4 AmbientColor;
            float4x4 WSToSSMatrix;

            float3 CalcAmbientLighting(float3 posWS)
            {
                float2 screenSpacePos = TransformToSS(posWS, WSToSSMatrix);
                float ssao = tex2D(SSAOTex, screenSpacePos).r;
                return AmbientColor.rgb * AmbientColor.a * ssao;
            }
        }

    With those two interface implementations available, the renderer is responsible for selecting the correct one at run-time, based on some criteria (user prefs, GPU caps, etc.), and linking it to all the surface shaders which use an IAmbientLighting object.

    The idea can be extended to other things. E.g. different kinds of lights (omni, point, directional) can be written as implementations of one common ILight interface.

    This way you can create (e.g.) a Phong shader with or without SSAO, using one or more lights of any type.

    That's the basic idea. Hope it makes some sense. If not, just say so and I'll do my best to describe it better.
  5. Try creating one View.OnClickListener object and use View.getId() on the passed view object.

    Something like this:

        View.OnClickListener listener = new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                int id = v.getId();
                switch (id) {
                case 1000:
                    // button0 was clicked...
                    break;
                case 1001:
                    // button1 was clicked...
                    break;
                }
            }
        };

        button0.setOnClickListener(listener);
        button1.setOnClickListener(listener);
        ...
        button0.setId(1000);
        button1.setId(1001);
        ...

    Hope that helps.
  6. HellRaiZer

    Is my frustum culling slow ?

    If the AABBs correspond to static geometry, translating them to world space every frame is overkill. You should do it once at start-up.

    If it's about dynamic geometry, then it isn't that simple when rotation is involved. If your objects rotate and you want to use the same code for all your objects, you should calculate the AABB from the OBB defined by the original AABB and the object's transformation. Otherwise you can find/write another function which culls OBBs against the frustum.

    In case you go the OBB route, it might be faster to just check the bounding sphere (which isn't affected by rotation) against the frustum, at the expense of rendering a few more objects (bounding spheres tend to be larger than AABBs, depending on the object they enclose).
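One common way to compute a world-space AABB from a local AABB and an object transformation, as suggested in the post above, is to transform the center and project the extents through the absolute values of the rotation matrix. The sketch below assumes a row-major 3x3 rotation (which may include scale) plus a translation; `Vec3`, `AABB`, `Transform` and `TransformAABB` are hypothetical names, not taken from the thread.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 center; Vec3 extent; };   // extent holds the half-sizes

// Assumed layout: r is a row-major 3x3 rotation/scale matrix, t is the translation.
struct Transform { float r[3][3]; Vec3 t; };

AABB TransformAABB(const AABB& local, const Transform& m)
{
    const float c[3] = { local.center.x, local.center.y, local.center.z };
    const float e[3] = { local.extent.x, local.extent.y, local.extent.z };

    float wc[3] = { m.t.x, m.t.y, m.t.z };   // world-space center
    float we[3] = { 0.0f, 0.0f, 0.0f };      // world-space half-sizes

    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            wc[i] += m.r[i][j] * c[j];
            // Using |r| yields the tightest axis-aligned box around the rotated box.
            we[i] += std::fabs(m.r[i][j]) * e[j];
        }
    }
    return AABB{ Vec3{ wc[0], wc[1], wc[2] }, Vec3{ we[0], we[1], we[2] } };
}
```

The result encloses the rotated box, so it is usually larger than the original AABB, which means the cull test may keep a few objects that a tighter OBB test would reject.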
  7. HellRaiZer

    Is my frustum culling slow ?

    @lipsryme If you get an access violation as soon as you add a 4th AABB to the list, it means that your aabbList array isn't 16-byte aligned. Two choices here:
    1) Explicitly allocate the aabbList array to be 16-byte aligned (e.g. using _aligned_malloc()), or
    2) Replace the 6 _mm_load_ps() calls with _mm_loadu_ps(), which allows reading from unaligned addresses.

    Hope that helps.

    PS. To test if an array address is 16-byte aligned you can use one of the functions found here: http://stackoverflow.com/a/1898487/2136504 E.g. check &aabbList[(iIter << 2) + 0].center.x. If it returns true but the _mm_load_ps() still fails, then something else is wrong with your array.
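A minimal sketch of such an alignment check follows. `IsAligned` is a hypothetical helper written for illustration, not necessarily one of the functions from the linked answer, and it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if ptr is a multiple of alignment (alignment must be a power of two).
inline bool IsAligned(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

For example, `IsAligned(&aabbList[(iIter << 2) + 0].center.x, 16)` could be asserted before the aligned loads in a debug build.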
  8. HellRaiZer

    Is my frustum culling slow ?

    I believe the changes you made to the code are the problem. To be more exact:

    The original code reads the centers and extents of the 4 AABBs from an array with the following layout:

        c0.x, c0.y, c0.z, e0.x, e0.y, e0.z, c1.x, c1.y, c1.z, e1.x, e1.y, e1.z, c2.x, c2.y, c2.z, e2.x, e2.y, e2.z, c3.x, c3.y, c3.z, e3.x, e3.y, e3.z, ...

    When the following instructions are executed, the XMM registers hold the values mentioned in their names:

        // NOTE: Since the aabbList is 16-byte aligned, we can use aligned moves.
        // Load the 4 Center/Extents pairs for the 4 AABBs.
        __m128 xmm_cx0_cy0_cz0_ex0 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Center.x);
        __m128 xmm_ey0_ez0_cx1_cy1 = _mm_load_ps(&aabbList[(iIter << 2) + 0].m_Extent.y);
        __m128 xmm_cz1_ex1_ey1_ez1 = _mm_load_ps(&aabbList[(iIter << 2) + 1].m_Center.z);
        __m128 xmm_cx2_cy2_cz2_ex2 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Center.x);
        __m128 xmm_ey2_ez2_cx3_cy3 = _mm_load_ps(&aabbList[(iIter << 2) + 2].m_Extent.y);
        __m128 xmm_cz3_ex3_ey3_ez3 = _mm_load_ps(&aabbList[(iIter << 2) + 3].m_Center.z);

    If we assume that the initial aabbList array is 16-byte aligned, all loads are 16-byte aligned and the instructions are executed correctly. This is the reason we are loading the XMM regs from those specific array offsets.

    On the other hand, your code doesn't do the same thing. It just stores the AABBs on the stack, and the layout isn't the one expected by the code. The best case scenario is that your layout is:

        c0.x, c0.y, c0.z, c1.x, c1.y, c1.z, c2.x, c2.y, c2.z, c3.x, c3.y, c3.z, e0.x, e0.y, e0.z, e1.x, e1.y, e1.z, e2.x, e2.y, e2.z, e3.x, e3.y, e3.z

    but:
    1) I think you can't be 100% sure about that (e.g. that the compiler will place the centers before the extents).
    2) It's not guaranteed to be 16-byte aligned.
    3) Most importantly, it's not what the code expects.

    If you have to read the AABBs the way you did (one element at a time), I would suggest something like this:

        __declspec(align(16)) _Vector3f aabbData[8];
        aabbData[0].x = ... // center0.x
        aabbData[0].y = ... // center0.y
        aabbData[0].z = ... // center0.z
        aabbData[1].x = ... // extent0.x
        ...

    And then use this array to load the XMM regs as in the original code snippet.

    PS. If you try to understand what the code does with those SSE instructions, you might be able to "optimize" it and get rid of the loads and shuffles completely. This applies in case you continue to read the AABB data the way you do now.
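The reason those six specific load addresses are all 16-byte aligned can be checked with a little offset arithmetic. The sketch below assumes a tightly packed AABB of six 4-byte floats (24 bytes) stored in an array whose base is 16-byte aligned. The struct definitions and the `LoadOffset` helper are hypothetical stand-ins for the types used in the thread.

```cpp
#include <cassert>
#include <cstddef>

struct Vector3f { float x, y, z; };
struct AABB     { Vector3f m_Center; Vector3f m_Extent; };

// Assumption: no padding, so 4 consecutive AABBs span 96 bytes = 6 XMM registers.
static_assert(sizeof(AABB) == 6 * sizeof(float), "AABB is expected to be tightly packed");

// Byte offset, from the start of a group of 4 AABBs, of float number floatInBox
// (0..5 = c.x, c.y, c.z, e.x, e.y, e.z) inside AABB number aabbIndex.
constexpr std::size_t LoadOffset(std::size_t aabbIndex, std::size_t floatInBox)
{
    return aabbIndex * sizeof(AABB) + floatInBox * sizeof(float);
}
```

Under these assumptions the six loads start at byte offsets 0, 16, 32, 48, 64 and 80, which is why aligned loads are safe only when the array base itself is 16-byte aligned and the AABBs are stored in this interleaved order.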
  9. HellRaiZer

    Is my frustum culling slow ?

    Quote: Now i am confused as in 4box version says: // NOTE: This loop is identical to the CullAABBList_SSE_1() loop. Not shown in order to keep this snippet small. where that part of code is that you mentioned.

    What I meant is that the calculations of (d+r) and (d-r) in the 4-box-at-a-time loop are correct.

    When you replace the comment you mentioned with the loop from CullAABBList_SSE_1(), you have to fix the typo I mentioned in order for it to be correct.

    Hope that makes sense.
  10. HellRaiZer

    Is my frustum culling slow ?

    It's been a long time since my last reply here on gamedev.net.

    @lipsryme: Happy to know that my blog post actually helped someone. Unfortunately, there seems to be an error in the code, so the culling should be incorrect. Since you haven't noticed it yet, I'd assume that you are just rendering more objects than needed.

    The error is in:

        __m128 xmm_d_p_r = _mm_add_ss(_mm_add_ss(xmm_d, xmm_r), xmm_frustumPlane_d);
        __m128 xmm_d_m_r = _mm_add_ss(_mm_add_ss(xmm_d, xmm_r), xmm_frustumPlane_d);

    Can you spot it? xmm_d_m_r should subtract r from d, not add it! It should be:

        __m128 xmm_d_m_r = _mm_add_ss(_mm_sub_ss(xmm_d, xmm_r), xmm_frustumPlane_d);

    I don't have the project anymore, so I'd assume it's just a blog post typo and it didn't affect the timings.

    On the plus side, the last piece of code in the post (4 boxes at a time) does it correctly.

    Hope this doesn't ruin your benchmarks.
  11. HellRaiZer

    Virtualized Scenes and Rendering

    I think you have to include the reply_id=XXXXXX part of the post. E.g. all links right now seem to point to the journal itself (http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=363003?). Btw, you don't need the last '?'. Change those to (e.g.) http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=363003&reply_id=3473003. (I hope the last link works when I press Reply, otherwise ignore me :) )
  12. HellRaiZer

    More D3D11

    I was just browsing Graphics Programming Methods on Google Books and I saw this article: Higher-Order Surfaces Using Curved Point-Normal (PN) Triangles. Of course it's not about DX11, so you probably don't need this. But since it's online I thought I'd share it. I haven't worked with PN triangles, but I guess the theory in 2 of the 3 papers you posted should be enough (I don't expect the "marketing" one to be very detailed). HellRaiZer
  13. HellRaiZer

    Terrain texturing

    Very interesting post indeed. I always wanted to try texture packs for texture splatting, but the thought of manual texture filtering in the shader kept me from doing it.

    Quote: If I'm not mistaken, this is also the technique used in Far Cry / Crysis.

    AFAIK FarCry used the technique you described. But from the little time I've spent with Crysis' editor, I think they use another technique very similar to texture splatting. For distant terrain they use a prebaked texture (one per terrain patch, I think) which can have arbitrary dimensions (higher resolutions are used in the playable area of the level, lower res for the distant terrain where you can't go). For close-up terrain, they create one mesh per material (a mesh which holds only the terrain triangles which use the specified material) and add it on top of the base layer (the prebaked texture), instead of rendering the whole patch with alpha blending. I don't remember the details of the shader, but I think they also use your limitation of 1 layer per vertex.

    Quote: For example, at the highest quality, shadow maps use 4 TMUs.

    I assume that by highest quality you mean PSSM with 4 4096x4096 shadowmaps. You have probably already said that in a previous journal post, so forgive my memory. If you can get away with just 2048x2048 for the highest level, you can pack those into an atlas and use just 1 texunit (you would need one more for a 4x4 indirection texture, but the idea is that, independently of the number of PSSM slices, the texunits required are always 2). I imagine you've already checked that (or it has already been suggested by someone else), so sorry for repeating the obvious.

    Once again, thanks for this really interesting post. And forgive my English. HellRaiZer
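As a rough illustration of the atlas idea in the post above, the sketch below remaps a per-slice shadow-map UV into a single atlas. The layout is an assumption made for this example only: 4 equally sized slices arranged in a 2x2 grid (e.g. four 2048x2048 slices in one 4096x4096 texture). It does not model the 4x4 indirection texture the post mentions, and `Float2` and `SliceToAtlasUV` are hypothetical names.

```cpp
#include <cassert>

struct Float2 { float x, y; };

// Maps a UV in [0,1]^2 for a given PSSM slice to a UV inside a 2x2 atlas.
// Assumed tile placement: slice 0 = (0,0), 1 = (1,0), 2 = (0,1), 3 = (1,1).
Float2 SliceToAtlasUV(Float2 uv, unsigned slice)
{
    assert(slice < 4);
    const float tile = 0.5f;   // each slice covers half of the atlas in each axis
    const float offsetX = static_cast<float>(slice & 1u) * tile;
    const float offsetY = static_cast<float>((slice >> 1) & 1u) * tile;
    return Float2{ offsetX + uv.x * tile, offsetY + uv.y * tile };
}
```

A real implementation would also have to keep filtering from bleeding across tile borders, which this sketch ignores.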