masterbubu

Members
  • Content count

    143
Community Reputation

181 Neutral

About masterbubu

  • Rank
    Member
  1. RGBA to Float - Precision

    As always very informative. Thank you all for the help!
  2. RGBA to Float - Precision

    Hi, I am aware of the typo, but the code I'm using does not have it.
  3. Hi,

    I have a conflict with the well-known approach for encoding/decoding float <-> RGBA. The web is full of HLSL/GLSL samples of doing it. I have implemented the following HLSL (taken from Unity) in C++ and tested it out:

    // Encoding/decoding [0..1) floats into 8 bit/channel RGBA.
    // Note that 1.0 will not be encoded properly.
    inline float4 EncodeFloatRGBA( float v )
    {
        float4 kEncodeMul = float4(1.0, 255.0, 65025.0, 16581375.0);
        float kEncodeBit = 1.0/255.0;
        float4 enc = kEncodeMul * v;
        enc = frac( enc );
        enc -= enc.yzww * kEncodeBit;
        return enc;
    }

    inline float DecodeFloatRGBA( float4 enc )
    {
        float4 kDecodeDot = float4(1.0, 1/255.0, 1/65025.0, 1/16581375.0);
        return dot( enc, kDecodeDot );
    }

    I'm normalizing all numbers into 8-bit range. RGBA = 61,0,0,191 --> float values (divided by 255): [0.239215687, 0, 0, 0.749019623]. Encoding worked properly.

    Then I started to raise the R component to 66 (float value 0.247058824). When encoding 66,0,0,191 the result is wrong: the .a component receives a wrong value (0.0). Obviously there is precision loss, as the problem did not happen when the code was tested with doubles.

    My question: as this approach is so common, mostly used in deferred rendering to pack normals and depth into an RGBA texture (32-bit), how is this problem avoided? Am I missing something?
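    To see where the bits go, here is a CPU-side sketch of the same routine, templated over the scalar type so float and double behavior can be compared side by side. The names (Enc, EncodeRGBA, DecodeRGBA, frac) are mine, not Unity's; the math follows the HLSL above, minus the 8-bit quantization a real render target would add.

```cpp
#include <cassert>
#include <cmath>

// CPU port of the shader encode/decode, templated over precision.
template <typename T> struct Enc { T r, g, b, a; };

template <typename T> static T frac( T v ) { return v - std::floor( v ); }

template <typename T> Enc<T> EncodeRGBA( T v )
{
    const T mul[4] = { T(1), T(255), T(65025), T(16581375) };
    const T bit = T(1) / T(255);
    T e[4];
    for ( int i = 0; i < 4; ++i )
        e[i] = frac( mul[i] * v );        // enc = frac(kEncodeMul * v)
    // enc -= enc.yzww * kEncodeBit
    return { e[0] - e[1] * bit, e[1] - e[2] * bit,
             e[2] - e[3] * bit, e[3] - e[3] * bit };
}

template <typename T> T DecodeRGBA( const Enc<T>& e )
{
    // dot( enc, float4(1, 1/255, 1/65025, 1/16581375) )
    return e.r + e.g / T(255) + e.b / T(65025) + e.a / T(16581375);
}
```

    The decode telescopes, so the round trip stays tight in double; in float, frac(16581375 * v) has almost no significant bits left (the integer part alone needs ~22 of the 24 mantissa bits), which is where the .a channel falls apart once the channels are quantized to 8 bits.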
  4. Shader Performances

    I will use 2 UV sets. Thank you all for the help.
  5. Shader Performances

    Hi,

    Thanks for your answer. The current target engine is Unity, so I'm bound to 2 sets of UVs. Using a math function that transforms the first UVs into the second will not work for all cases (curved ones). I cannot modify the models automatically without having the artists take a look, and doing it for a massive amount of data is out of our budget.
  6. Shader Performances

    Hi,

    Let me try to explain it better. I have to make a decision which cannot be undone later. Also, I cannot really test the impact on performance at the current time, so I'm basing the decision on articles I have read (ATI/NVIDIA) which say to try to lower the number of texture samples and push more ALU ops. I also know this effect will be very common: > 90% of the drawing. The artists will create a massive amount of data which cannot be fixed later on (budget problem :( )

    The problem: the run-time generated UVs are used for an overlay effect (an extra texture that is blended by alpha channels).

    Options:
    A: Let the artists create the secondary UV channel in a 3D app, and use it to sample a detail texture.
    B: Create the UVs at run-time, and pay the price of sampling the same texture 3 times (triplanar).

    Please note that B is not reversible: if I later find that the effect harms performance, I cannot "bake" the data back to UVs. I wanted to spare the secondary UV as it will give me more flexibility later on; for example, I can use it for light-maps, AO, ...
  7. Hi all,

    I want to consult on a thing I have in mind. I have a shader which does a triplanar overlay effect (sample the same texture 3 times, once per plane, and interpolate the colors). This shader was created in order to eliminate the use of secondary UVs (as I wanted to spare them in case I'll need them in the future).

    The art pipeline will start soon, and I need to decide whether to ask the artists to create secondary UVs, or to use the run-time generated ones. I cannot measure the performance now. I do know the overlays will be very common (90% of the drawn objects) and I think the triplanar effect is a costly one (correct me if I'm wrong).

    Any suggestion will be great.

    Tnx,
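    For reference, the triplanar blend being weighed here boils down to three lookups, one per world-axis plane, weighted by the absolute components of the surface normal. A small CPU sketch with illustrative names (sample() is a stand-in for a real texture fetch, not the actual shader):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Placeholder "texture": a deterministic function of the UVs, so the
// blending logic can be exercised without GPU resources.
static float sample( float u, float v ) { return 0.5f * ( u + v ); }

float TriplanarSample( const Vec3& worldPos, const Vec3& normal )
{
    // Blend weights from the normal, normalized to sum to 1.
    float wx = std::fabs( normal.x );
    float wy = std::fabs( normal.y );
    float wz = std::fabs( normal.z );
    const float sum = wx + wy + wz;
    wx /= sum;  wy /= sum;  wz /= sum;

    // The three samples: this is the 3x sampling cost being weighed
    // against authoring a second UV set.
    const float cx = sample( worldPos.y, worldPos.z ); // YZ plane
    const float cy = sample( worldPos.x, worldPos.z ); // XZ plane
    const float cz = sample( worldPos.x, worldPos.y ); // XY plane

    return cx * wx + cy * wy + cz * wz;
}
```

    A face pointing straight along one axis effectively only pays for one projection; the full 3x cost is paid on slanted surfaces where all three weights are non-zero.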
  8. Shader performances

    Thank you very much. I will definitely use your tips once I have a running scene.
  9. Shader performances

    Very good answers. I actually agree with all of them. I'm treating the GPU as a big train which must not be stopped for a small number of passengers. Option 1 helps to reduce the number of shader variants, which helps with batching (and can always be divided into sub-shaders later on). Level of detail will help reduce the shading load.

    However, all the shaders used in games (that I've seen) use the preprocessor option to reduce the work, so I'm still confused about that matter. I remember an OpenGL GDC lecture about zero driver overhead saying that swapping shaders was very expensive, so I'm still trying to figure out why the majority picks option 2.
  10. Hi,

    I'm interested to know the preferred way to go regarding shader performance. (Yes, it is a matter of profiling, but is there a preferred way?)

    Option 1: have one shader that does the same work all the time, and in some cases may do "empty" work. For example, the specular level is read from a texture; when we don't have a unique specular-level pattern, the input texture is a white 1x1-pixel texture. I know that sampling a texture can impact performance, and according to some resources online, on modern GPUs it is preferred to use more ALU ops than texture samples; but does the texture size matter? The plus side of this approach is that I'm reducing the number of program swaps, which also contributes to performance.

    Option 2: use preprocessor defines and split the work into many shaders. The GPU will work less, but we will have a lot of program changes.

    Tnx,
  11. OpenGL Frustum Culling - Need help

    Hi, I found the problem; it was related to something else. Tnx anyway.
  12. Hi,

    I'm trying to implement frustum culling on OpenGL ES. I have read online tutorials

    http://www.crownandcutlass.com/features/technicaldetails/frustum.html
    http://zach.in.tu-clausthal.de/teaching/cg_literatur/lighthouse3d_view_frustum_culling/index.html

    and it looks straightforward. But as you can see in the video capture, it does not work well.

    http://www.screencast.com/t/e6L2JYJapzb

    Here is my build-planes function:

    void CFrustum::BuildPlanes( const mat4& ModelView, const mat4& Proj )
    {
        const mat4 &modl = ModelView, &proj = Proj;
        mat4 clip;

        // clip = ModelView * Proj, row-major: clip[row*4 + col]
        for ( int r = 0; r < 4; ++r )
            for ( int c = 0; c < 4; ++c )
                clip[r*4 + c] = modl[r*4 + 0] * proj[0*4 + c]
                              + modl[r*4 + 1] * proj[1*4 + c]
                              + modl[r*4 + 2] * proj[2*4 + c]
                              + modl[r*4 + 3] * proj[3*4 + c];

        // Each plane is a signed combination of clip-matrix columns:
        // RIGHT = c3 - c0, LEFT = c3 + c0, BOTTOM = c3 + c1,
        // TOP   = c3 - c1, FAR  = c3 - c2, NEAR   = c3 + c2.
        const int   col[NUM_PLANES] = {  0, 0, 1,  1,  2, 2 };
        const float sgn[NUM_PLANES] = { -1, 1, 1, -1, -1, 1 };
        for ( int p = 0; p < NUM_PLANES; ++p )
        {
            for ( int i = 0; i < 4; ++i )
                vPlanes[p][i] = clip[i*4 + 3] + sgn[p] * clip[i*4 + col[p]];

            /* Normalize the result */
            float t = sqrt( vPlanes[p][0] * vPlanes[p][0]
                          + vPlanes[p][1] * vPlanes[p][1]
                          + vPlanes[p][2] * vPlanes[p][2] );
            for ( int i = 0; i < 4; ++i )
                vPlanes[p][i] /= t;
        }
    }

    The camera is a simple FPS cam that produces a lookat matrix from (Pos, Up, Forward). I use this function to check sphere collision with the frustum:

    int CFrustum::SphereInFrustum( float x, float y, float z, float radius )
    {
        int c = 0;
        for ( int p = 0; p < NUM_PLANES; p++ )
        {
            float d = vPlanes[p][0] * x + vPlanes[p][1] * y
                    + vPlanes[p][2] * z + vPlanes[p][3];
            if ( d <= -radius )
                return 0;
            if ( d > radius )
                c++;
        }
        return (c == 6) ? 2 : 1;
    }

    What am I doing wrong? It all seems to be the same as in the tutorials.
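    In case it helps to test the math in isolation: a minimal CPU-only sketch of the same plane extraction and sphere test, using the identity matrix as the combined clip matrix so the frustum degenerates to the NDC cube [-1,1]^3 and every case can be checked by hand. All names here (Plane, buildPlanes, sphereInFrustum) are illustrative, not the engine's.

```cpp
#include <cassert>
#include <cmath>

struct Plane { float a, b, c, d; };

// Extract the six planes from a combined modelview*projection matrix
// stored row-major as clip[row*4 + col], then normalize each plane.
void buildPlanes( const float clip[16], Plane planes[6] )
{
    const int   col[6] = {  0, 0, 1,  1,  2, 2 }; // R, L, B, T, F, N
    const float sgn[6] = { -1, 1, 1, -1, -1, 1 };
    for ( int p = 0; p < 6; ++p )
    {
        Plane& pl = planes[p];
        pl.a = clip[ 3] + sgn[p] * clip[ 0 + col[p]];
        pl.b = clip[ 7] + sgn[p] * clip[ 4 + col[p]];
        pl.c = clip[11] + sgn[p] * clip[ 8 + col[p]];
        pl.d = clip[15] + sgn[p] * clip[12 + col[p]];
        const float len = std::sqrt( pl.a*pl.a + pl.b*pl.b + pl.c*pl.c );
        pl.a /= len;  pl.b /= len;  pl.c /= len;  pl.d /= len;
    }
}

// 0 = outside, 1 = intersecting, 2 = fully inside. Note each plane
// component multiplies its matching coordinate (a*x + b*y + c*z).
int sphereInFrustum( const Plane planes[6],
                     float x, float y, float z, float radius )
{
    int inside = 0;
    for ( int p = 0; p < 6; ++p )
    {
        const float d = planes[p].a * x + planes[p].b * y
                      + planes[p].c * z + planes[p].d;
        if ( d <= -radius ) return 0;   // fully behind one plane
        if ( d >   radius ) ++inside;   // fully in front of this plane
    }
    return ( inside == 6 ) ? 2 : 1;
}
```

    With the identity matrix, a sphere at the origin with radius 0.5 is fully inside, one at (5,0,0) is culled, and one centered on the x=1 face straddles it.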
  13. Gpu skinning and VBO

    BTW, I'm using OpenGL ES 2, so maybe there are some limitations?
  14. Gpu skinning and VBO

    Hi, I cut out the parts of the code relevant to the issue. I still have not managed to spot the problematic area. Please note that the same code and shaders work for vertex arrays, but not for VBOs.

    struct sBoneData
    {
        vec4 vWeights, vIndices;
    };

    Vertex arrays:

    glVertexAttribPointer( iPositionAttribLoc, 3, GL_FLOAT, GL_FALSE,
                           sizeof( vec3 ), &pGeo->vVertices[0].x );
    glEnableVertexAttribArray( iPositionAttribLoc );

    glVertexAttribPointer( iBoneWeightsAttribLoc, 4, GL_FLOAT, GL_FALSE,
                           sizeof( sBoneData ), &pGeo->vBonesData[0].vWeights.x );
    glEnableVertexAttribArray( iBoneWeightsAttribLoc );

    glVertexAttribPointer( iBoneIndexesAttribLoc, 4, GL_FLOAT, GL_FALSE,
                           sizeof( sBoneData ), &pGeo->vBonesData[0].vIndices.x );
    glEnableVertexAttribArray( iBoneIndexesAttribLoc );

    unsigned int *ptr = &pGeo->vFaces[ m_pObj->iFacesStartPos ].uIndex[ 0 ];
    unsigned int sz = (unsigned int)( m_pObj->iFacesEndPos - m_pObj->iFacesStartPos ) * 3;
    glDrawElements( GL_TRIANGLES, m_uNumElements, GL_UNSIGNED_INT, 0 );

    VBO generate:
    ...
    {
        glGenBuffers( 3, m_Buffers );

        glBindBuffer( GL_ARRAY_BUFFER, m_Buffers[0] );
        size_t size = sizeof( vec3 ) * pGeo->vVertices.size();
        glBufferData( GL_ARRAY_BUFFER, size, 0, GL_STATIC_DRAW );
        size_t startOff = 0, currSize = sizeof( vec3 ) * pGeo->vVertices.size();
        m_VertexOffset = startOff;
        glBufferSubData( GL_ARRAY_BUFFER, startOff, currSize, &pGeo->vVertices[0].x );

        glBindBuffer( GL_ARRAY_BUFFER, m_Buffers[1] ); // weights and indices
        size_t size2 = sizeof( vec4 ) * pGeo->vBonesData.size() * 2;
        glBufferData( GL_ARRAY_BUFFER, size2, 0, GL_STATIC_DRAW );
        startOff = 0;
        currSize = sizeof( vec4 ) * pGeo->vBonesData.size();
        m_BonesWeightsOffset = startOff;
        glBufferSubData( GL_ARRAY_BUFFER, startOff, currSize, &pGeo->vBonesData[0].vWeights.x );
        startOff += currSize;
        currSize = sizeof( vec4 ) * pGeo->vBonesData.size();
        m_BonesIndicesOffset = startOff;
        glBufferSubData( GL_ARRAY_BUFFER, startOff, currSize, &pGeo->vBonesData[0].vIndices.x );

        glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, m_Buffers[2] );
        unsigned int *ptr = &pGeo->vFaces[ m_pObj->iFacesStartPos ].uIndex[ 0 ];
        unsigned int sz = (unsigned int)( m_pObj->iFacesEndPos - m_pObj->iFacesStartPos ) * 3;
        glBufferData( GL_ELEMENT_ARRAY_BUFFER, sizeof( GL_UNSIGNED_INT ) * sz, ptr, GL_STATIC_DRAW );

        glBindBuffer( GL_ARRAY_BUFFER, 0 );
        glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, 0 );
    }

    Bind:
    ...
    {
        glBindBuffer( GL_ARRAY_BUFFER, m_Buffers[0] );
        if ( handlers[0] != -1 && m_VertexOffset != -1 )
        {
            glEnableVertexAttribArray( handlers[0] );
            glVertexAttribPointer( handlers[0], 3, GL_FLOAT, GL_FALSE,
                                   sizeof( vec3 ), reinterpret_cast<void*>( m_VertexOffset ) );
        }

        glBindBuffer( GL_ARRAY_BUFFER, m_Buffers[1] );
        if ( handlers[5] != -1 && m_BonesWeightsOffset != -1 )
        {
            glVertexAttribPointer( handlers[1], 4, GL_FLOAT, GL_FALSE,
                                   sizeof( sBoneData ), reinterpret_cast<void*>( m_BonesWeightsOffset ) );
            glEnableVertexAttribArray( handlers[1] );
        }
        if ( handlers[6] != -1 && m_BonesIndicesOffset != -1 )
        {
            glVertexAttribPointer( handlers[2], 4, GL_FLOAT, GL_FALSE,
                                   sizeof( sBoneData ), reinterpret_cast<void*>( m_BonesIndicesOffset ) );
            glEnableVertexAttribArray( handlers[2] );
        }
        glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, m_Buffers[2] );
    }

    Draw:
    {
        glDrawElements( GL_TRIANGLES, m_uNumElements, GL_UNSIGNED_INT, 0 );
    }

    Unbind:
    {
        for ( int i = 0; i < size; ++i )
            if ( handlers[i] != -1 )
                glDisableVertexAttribArray( handlers[i] );
        glBindBuffer( GL_ARRAY_BUFFER, 0 );
        glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, 0 );
    }
  15. Hi,

    I have used for some time a GPU skinning system that performs well with vertex arrays. I decided to try using VBOs with the same setup I have now, just replacing the vertex arrays with VBOs (vertices, weights, indices). For some reason I get bad deformations.

    However, if I keep the vertex data in a VBO and the weights/indices in vertex arrays, the render looks fine. I'm asking for some direction to explore: assuming the weights and indices are stored properly in the VBOs, is the data in a VA stored differently than in a VBO? I'm just trying to figure out what can cause the problem.

    tnx
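    One general direction worth checking (an observation about attribute layouts, not a diagnosis of this exact code): the VA and VBO paths read identically, but the stride/offset pair passed to glVertexAttribPointer must match how the data was actually laid out in the buffer. A CPU-only sketch with hypothetical names, showing how an interleaved stride applied to a de-interleaved buffer scrambles the attributes:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct vec4      { float x, y, z, w; };
struct sBoneData { vec4 vWeights, vIndices; };

// Emulates what glVertexAttribPointer + a draw call read for vertex i:
// the vec4 starting at byte (offset + i * stride).
static vec4 fetch( const unsigned char* buf,
                   std::size_t offset, std::size_t stride, std::size_t i )
{
    vec4 v;
    std::memcpy( &v, buf + offset + i * stride, sizeof( vec4 ) );
    return v;
}

// Returns true when every layout/stride combination fetches what it should.
bool strideDemo()
{
    std::vector<sBoneData> bones( 3 );
    for ( int i = 0; i < 3; ++i )
    {
        bones[i].vWeights = { float( i ),      0, 0, 0 };
        bones[i].vIndices = { float( 10 + i ), 0, 0, 0 };
    }
    const unsigned char* inter =
        reinterpret_cast<const unsigned char*>( bones.data() );

    // Interleaved (the vertex-array path): one block, stride
    // sizeof(sBoneData), indices offset by offsetof(sBoneData, vIndices).
    bool ok = fetch( inter, offsetof( sBoneData, vWeights ),
                     sizeof( sBoneData ), 2 ).x == 2.0f
           && fetch( inter, offsetof( sBoneData, vIndices ),
                     sizeof( sBoneData ), 2 ).x == 12.0f;

    // De-interleaved (all weights packed first, then all indices): the
    // stride must drop to sizeof(vec4).
    std::vector<unsigned char> packed( sizeof( vec4 ) * 6 );
    for ( int i = 0; i < 3; ++i )
    {
        std::memcpy( &packed[sizeof( vec4 ) * i],
                     &bones[i].vWeights, sizeof( vec4 ) );
        std::memcpy( &packed[sizeof( vec4 ) * ( 3 + i )],
                     &bones[i].vIndices, sizeof( vec4 ) );
    }
    ok = ok && fetch( packed.data(), 0, sizeof( vec4 ), 2 ).x == 2.0f
            && fetch( packed.data(), sizeof( vec4 ) * 3,
                      sizeof( vec4 ), 2 ).x == 12.0f;

    // Keeping the interleaved stride on the packed layout fetches a bone
    // index where a weight was expected -- exactly the kind of mismatch
    // that shows up as garbage deformations.
    ok = ok && fetch( packed.data(), 0, sizeof( sBoneData ), 2 ).x == 11.0f;
    return ok;
}
```

    In other words, whichever layout is uploaded to the VBO, the stride and offsets in the bind call have to describe that same layout, or the shader receives interleaved garbage.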