About Plerion
  1. @Hodgman I tried to find two positions with about the same frame rate and frame time: the so-called "slow" facing and the "fast" facing. The frame times are very interesting, I think. "Pre frame time" is a bit of a misleading title; it is actually the time from glClear until SwapBuffers. The "post frame time" is the time from the beginning of SwapBuffers to the end of SwapBuffers. SwapBuffers also calls that NtGdiDdDDIEscape function, which gets most of the samples in the profiler. This is also where the difference happens when I'm using roughly the same part of the landscape for both facings: the slower one spends 2.6 ms in SwapBuffers, the faster one 1.6 ms. The "pre frame time" (the drawing itself) is proportional to the number of indices/draw calls. I assume this is because the draw commands are actually executed when SwapBuffers flushes the queue? But what is this NtGdiDdDDIEscape function that is using up most of the time in SwapBuffers according to the profiler?
  2. Yes, I'm aware of the FPS vs. frame time difference; I also calculated that 1 ms difference, which seemed strange to me. The "slow" version is facing in the direction where the specular reflection on the blocks is at its maximum; the "fast" version faces 180° the other way, with absolutely no specular lighting. The shadows are AO, and everything is rendered using the same code on CPU and GPU. Since AO and the exact lighting aren't fully implemented yet, it looks a bit odd, as if the specular lighting trumps the AO, which makes it look a lot different. The important part is that there is currently only one path for rendering the blocks, and it uses all the same states.
  I suspected that in one direction my frustum/box intersection code might get more cases where it can early-out, but profiling both situations over a longer period yielded no major difference (fast version: 2.6% of the samples in intersection, slow version: 3.1%; the difference might just be because one version was sampled over a longer period).
  Here is what the profiler says for both versions, considering the functions doing the most work:
  fast version:
    something in nvoglv32.dll (the actual rendering, I guess) -> 28.67% exclusive samples
    NtGdiDdDDIEscape (no idea what that is, never seen it in any other project) -> 26.09%
    NtGdiDdDDIGetDeviceState (same) -> 9.59%
    Math::Frustum::intersects -> 4.34%
    RtlQueryPerformanceCounter -> 3.16%
  slow version: same functions, same order, only the percentages differ: 28.59%, 26.22%, 9.74%, 4.31%, 3.40%.
  So, pretty much the same for CPU sampling.
  3. Oh, I thought it was, but actually it wasn't; I forgot to turn it back on after the last experiments, which makes it even stranger to me, since it means every triangle went through the entire stage. I enabled it now, but the difference remains, just at a higher level.
  4. Hi all
  As my title already suggests, I am having a bit of a weird problem. First of all, let me show you a picture of what I'm talking about. This is how the scene normally looks right now:
  Currently there is virtually no optimization, just frustum culling and not rendering invisible blocks. As you can see, it renders at over 250 FPS. However, if I turn 90° to the left, watch what happens:
  There are fewer triangles rendered (indices/3) and also fewer draw calls in the second image, but the frame time nearly doubled. I am baffled why this happens. The geometry is exactly the same, with the exact same rendering setup and the same shaders; there is no branching in the shaders that might only happen in the second picture. It also doesn't depend on the light direction: when I invert the light direction so that the first image gets hit by specular light and the second is that dull gray, the effect is the same. The maps are also generated randomly at each start, so this can be ruled out too.
  I really wonder what could cause this issue; does anyone have an idea what I should start checking? I have already done a lot of CPU profiling, spending a lot of time in each of the two situations and comparing the render loop for changes, but so far I haven't had any luck. Could it be something on the GPU, even if everything is pretty much identical?
  Thanks in advance, Plerion
  5. Hello guys
  I'm asking this question for a friend of mine. He has an old program that still uses the fixed-function pipeline (FFP) for rendering in OpenGL. Right now it all still works, but there is a new feature he'd like to implement. Every vertex has a color value, but it is modulated in a different way. In GLSL, in the fragment shader, it would be:
    gl_FragColor = input_color * 2.0 * texture_color;
  So essentially a color value of 0x7F7F7F7F would just return the texture color, whereas 0xFFFFFFFF would double all channels of the texture. The color values are bound to a buffer and sent to the FFP using glColorPointer. In DirectX I remember there was a texture stage state that allowed specifying MODULATE_2X as the color operation. Is there something similar in OpenGL?
  Greetings Plerion
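In fixed-function OpenGL this is usually done with the texture environment: set GL_TEXTURE_ENV_MODE to GL_COMBINE, GL_COMBINE_RGB to GL_MODULATE, and GL_RGB_SCALE to 2.0, which corresponds to D3D's MODULATE2X. As a CPU-side reference for what the math should produce per channel (a sketch; the clamp mirrors the fixed-function clamp to [0, 1]):

```cpp
#include <algorithm>

// Reference for modulate-2x: result = clamp(vertex * 2 * texture, 0, 1).
// A vertex channel of 0.5 (roughly 0x7F/0xFF) leaves the texture unchanged;
// a vertex channel of 1.0 doubles it, clamped at white.
float modulate2x(float vertexChannel, float textureChannel)
{
    return std::min(1.0f, std::max(0.0f, vertexChannel * 2.0f * textureChannel));
}
```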
  6. Hello again
  Because I didn't know what else to try, I just did a transpose on the instance matrix and, what do you know, it works. In retrospect it makes sense: float4x4 is column-major, while the float4x4 constructor takes rows. Thus I had to pass the matrix entries untransposed compared to view/proj and the rest.
  Greetings Plerion
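The row/column mix-up can be demonstrated on the CPU, independently of HLSL (a minimal sketch): building a 4x4 matrix from four vectors interpreted as rows versus as columns yields two matrices that are transposes of each other, which is why passing the instance data untransposed relative to the other matrices compensates.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>; // m[row][col]

// Interpret the four vectors as the rows of the matrix.
Mat4 fromRows(Vec4 a, Vec4 b, Vec4 c, Vec4 d)
{
    return { a, b, c, d };
}

// Interpret the same four vectors as the columns of the matrix.
Mat4 fromCols(Vec4 a, Vec4 b, Vec4 c, Vec4 d)
{
    Mat4 m{};
    Vec4 v[4] = { a, b, c, d };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            m[row][col] = v[col][row];
    return m;
}

Mat4 transpose(const Mat4& m)
{
    Mat4 t{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}
```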
  7. Hello all
  I'm using instancing to draw the opaque parts of heavily repeated objects. I am running into some problems reading the instance data, however. My input structure for the vertex shader looks like this:
    struct VertexInput
    {
        float3 position : POSITION0;
        float4 boneWeights : BLENDWEIGHT0;
        int4 bones : BLENDINDEX0;
        float3 normal : NORMAL0;
        float2 texCoord : TEXCOORD0;
        float2 texCoord2 : TEXCOORD1;
        float4 mat0 : TEXCOORD2;
        float4 mat1 : TEXCOORD3;
        float4 mat2 : TEXCOORD4;
        float4 mat3 : TEXCOORD5;
    };
  In order to get the position (before view and projection) I do the following:
    VertexOutput main(VertexInput input)
    {
        float4x4 matInstance = float4x4(input.mat0, input.mat1, input.mat2, input.mat3);
        // bone & animation stuff
        position = mul(position, matInstance);
        // ...
    }
  The animation stuff and the per-vertex input data are correct; if I modify the last line to
    position = position.xyz + eyePosition + float3(100, 0, 0);
  the elements appear correctly in front of my camera. I have checked with the graphics debugger; in my opinion the input data looks correct (I'm not showing the per-vertex part, since that works).
  Instance buffer (I checked, it is bound):
  Input layout:
  I'm using the DrawIndexedInstanced function. The result is completely wrong, however:
  Where should I begin to look? What could be the reason for this strange behavior?
  Thanks in advance, Plerion
  8. Re: Passing cube normals to shader
  Hello L. Spiro
  Thanks for your answer; I already thought it would be like that. Regarding your second quote: I of course meant "without using 36 (24) vertices". (EDIT: No, actually I didn't; I wasn't remembering my post correctly.)
  Anyway, I was able to store 3D position, normal, texcoord and color in 8 bytes per vertex, so I'll just stick with the increased vertex count (the devices it is going to run on have limited GPU memory available).
  Greetings Plerion
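A vertex this small is plausible for axis-aligned cubes because every attribute has a tiny value range. As a purely hypothetical sketch (the field widths and layout here are assumptions for illustration, not the poster's actual format), positions on a block grid fit in a few bits each, the normal is one of six axis directions, the texcoord is a quad corner, and the color can be a palette index:

```cpp
#include <cstdint>

// Hypothetical 8-byte cube vertex. Field widths are illustrative assumptions:
// x, y: 8 bits each (block grid), z: 10 bits, normal: 3 bits (one of 6 axis
// directions), texcoord corner: 2 bits, color palette index: 8 bits.
// Total: 39 bits, comfortably inside 64.
struct PackedVertex { uint64_t bits; };

PackedVertex pack(uint32_t x, uint32_t y, uint32_t z,
                  uint32_t normal, uint32_t corner, uint32_t color)
{
    uint64_t b = 0;
    b |= (uint64_t)(x & 0xFF);
    b |= (uint64_t)(y & 0xFF) << 8;
    b |= (uint64_t)(z & 0x3FF) << 16;
    b |= (uint64_t)(normal & 0x7) << 26;
    b |= (uint64_t)(corner & 0x3) << 29;
    b |= (uint64_t)(color & 0xFF) << 31;
    return { b };
}

// Matching field extraction (the shader would do the same with bit ops).
uint32_t unpackZ(PackedVertex v)      { return (uint32_t)((v.bits >> 16) & 0x3FF); }
uint32_t unpackNormal(PackedVertex v) { return (uint32_t)((v.bits >> 26) & 0x7); }
uint32_t unpackColor(PackedVertex v)  { return (uint32_t)((v.bits >> 31) & 0xFF); }
```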
  9. Hello all
  I'm rendering cubes. Basically, per cube I'm using 8 vertices and 36 indices, as one might expect. The problem I'm currently facing is passing the normals to the shader accordingly. Putting them in the vertex buffer seems impractical, since each vertex has 3 independent normals. My first guess was to send a vec3 array as a uniform to the shader and index it with gl_VertexID, but since that is not an option in WebGL, I'm kind of out of ideas.
  Is the best way to use 36 vertices, or is there a simpler way to accomplish this? Essentially I could use the average normal at each vertex, and then the average of the 4 vertices of each face would be correct again. But obviously I can only access the one normal of the vertex; in the fragment shader it is the interpolated value, not the average of the 4 vertices of the quad.
  Thanks for any tips, Cromon
  10. Hello all
  It's me again! It was a combination of two problems:
  1. matrix = matrix * animator->getMatrix(time, mBone.parentBone);
  This has to be parent * matrix and not matrix * parent, since I first want to transform by my local matrix and then apply the parent's transformation!
  2. The rotation values were wrong. I had to conjugate the rotation quaternion.
  Now it all works fine.
  Greetings Plerion
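The multiplication-order pitfall can be shown with two simple transforms (a minimal sketch using the column-vector convention, where `M * v` applies `M`; the actual convention of Math::Matrix is an assumption here). Because a translation and a scale do not commute, `parent * local` and `local * parent` move a point to different places, and only the former applies the local transform first:

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>; // m[row][col]

Mat4 identity()
{
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

Mat4 translation(double x, double y, double z)
{
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 scale(double s)
{
    Mat4 m = identity();
    m[0][0] = m[1][1] = m[2][2] = s;
    return m;
}

// Standard 4x4 matrix product: (a * b) applies b first, then a,
// when transforming column vectors.
Mat4 mul(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Transform a point (w = 1 implied).
Vec3 transformPoint(const Mat4& m, Vec3 p)
{
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        r[i] = m[i][0] * p[0] + m[i][1] * p[1] + m[i][2] * p[2] + m[i][3];
    return r;
}
```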
  11. Hello all
  I am using skinned meshes with hierarchical bones in my application. Strangely, I get rather mixed results for different models. The problem right now is that I am not sure whether I am reading the values wrong or doing the math wrong. Let me first show you two videos of different models.
  N°1: https://www.dropbox.com/s/n6r7wzyfxdw20rl/2014-10-11_17-49-35.mp4?dl=0
  As you can see, it doesn't look that bad, yet there are strange bumps in the animation and the character seems to be moving up and down as well.
  N°2: https://www.dropbox.com/s/qgn785i5x7y1jhn/2014-10-11_17-50-59.mp4?dl=0
  For this nice fella, however, the animations seem to be completely wrong...
  The main code I am using to calculate my matrices looks like this:
    void M2AnimationBone::updateMatrix(uint32 time, uint32 animation, Math::Matrix& matrix, M2Animator* animator)
    {
        auto position = mTranslation.getValueForTime(animation, time, animator->getAnimationLength());
        auto scaling = mScaling.getValueForTime(animation, time, animator->getAnimationLength());
        auto rotQuat = mRotation.getValueForTime(animation, time, animator->getAnimationLength());

        matrix = mPivot * Math::Matrix::translation(position) * Math::Matrix::rotationQuaternion(rotQuat) * Math::Matrix::scale(scaling) * mInvPivot;

        if (mBone.parentBone >= 0)
        {
            matrix = matrix * animator->getMatrix(time, mBone.parentBone);
        }
    }
  With getMatrix like this:
    const Math::Matrix& M2Animator::getMatrix(uint32 time, int16 matrix)
    {
        assert(matrix >= 0 && (uint32) matrix < mBones.size());

        if (mCalculated[matrix])
        {
            return mMatrices[matrix];
        }

        auto& mat = mMatrices[matrix];
        mBones[matrix]->updateMatrix(time, mAnimationId, mat, this);
        mCalculated[matrix] = true;

        return mat;
    }
  I've been looking through several tutorials and explanations online and found, in my opinion, several different versions of this. Mostly it seems that the whole pivot handling is a bit different everywhere. Am I doing it the right way?
  Thanks for any help, Plerion
  12. Hello MJP
  Indeed, I forgot about that, but in this case it was not the source of the problem; even with the correct row pitch it still showed up wrong. It turns out that squish converted the first layer correctly to S3TC but then did wrong conversions for the subsequent layers. I switched to libtxc_dxtn and got it all working now!
  Greetings Plerion
  13. Hello all
  For my project I have developed my own texture format, and I'm currently writing a program that converts PNG images into that format, including their precalculated mipmap layers. I thought I'd use D3D11 to calculate the mipmaps, since so far I have been using the mipmaps created by the engine itself for the textures and just reading the actual data from the texture. To do so, I first create a texture with the appropriate flags and bindings to generate mipmaps and then copy it to a texture that can be read from the CPU. I then use squish to convert these layers into (right now, statically) DXT1.
  In code this means:
    std::vector<uint8> img = createImage(file, w, h);
    /* snippet removed: getting layer count -> it works */
    D3D11_TEXTURE2D_DESC texDesc = { 0 };
    texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET;
    texDesc.CPUAccessFlags = 0;
    texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    texDesc.MiscFlags = D3D11_RESOURCE_MISC_GENERATE_MIPS;
    /* removed the obvious parts like array size, usage, and so on; it all works */
    ID3D11Texture2D* mipTexture = nullptr;
    massert(SUCCEEDED(gImageDevice->CreateTexture2D(&texDesc, nullptr, &mipTexture)));
    gImageCtx->UpdateSubresource(mipTexture, 0, nullptr, img.data(), w * 4, 0);
    ID3D11ShaderResourceView* srv = nullptr;
    /* snippet removed: obvious SRV creation, same mip levels, same format */
    massert(SUCCEEDED(gImageDevice->CreateShaderResourceView(mipTexture, &srvd, &srv)));
    gImageCtx->GenerateMips(srv);
    texDesc.BindFlags = 0;
    texDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    texDesc.MiscFlags = 0;
    texDesc.Usage = D3D11_USAGE_STAGING;
    ID3D11Texture2D* cpuTexture = nullptr;
    massert(SUCCEEDED(gImageDevice->CreateTexture2D(&texDesc, nullptr, &cpuTexture)));
    //gImageCtx->CopyResource(cpuTexture, mipTexture);
    for (uint32 i = 0; i < numLayers; ++i)
    {
        gImageCtx->CopySubresourceRegion(cpuTexture, i, 0, 0, 0, mipTexture, i, nullptr);
    }
    /* snippet removed: opening the file (binary) and writing the header */
    for (uint32 i = 0; i < numLayers; ++i)
    {
        D3D11_MAPPED_SUBRESOURCE resource;
        massert(SUCCEEDED(gImageCtx->Map(cpuTexture, i, D3D11_MAP_READ, 0, &resource)));
        uint32 cw = std::max<uint32>(w >> i, 1);
        uint32 ch = std::max<uint32>(h >> i, 1);
        std::vector<uint8> layerData(cw * ch * 4);
        // Copy row by row: the mapped data has resource.RowPitch bytes per row,
        // which is not necessarily cw * 4.
        for (uint32 row = 0; row < ch; ++row)
        {
            memcpy(layerData.data() + row * cw * 4,
                   (const uint8*) resource.pData + row * resource.RowPitch,
                   cw * 4);
        }
        gImageCtx->Unmap(cpuTexture, i);
        auto compSize = squish::GetStorageRequirements(cw, ch, squish::kDxt1);
        std::vector<uint8> outData(compSize);
        // Compress the mip level that was just read back, not the full-size image.
        squish::CompressImage(layerData.data(), cw, ch, outData.data(), squish::kDxt1);
        os.write((const char*) outData.data(), outData.size());
    }
  While this works fine for the first layer, I have some problems with subsequent mip levels. For the first layer, see: (RGBA vs. BGRA, aka D3D11 vs. Chromium.) The second layer already looks bad, and layer 3 and so on as well. As you can see, I'm not happy with how things look after layer 1. This is also visible when I use the texture; it looks very bad.
  Am I doing something wrong, or is that just... uhm... the way D3D creates mip levels? Are there good alternatives to D3D for creating the mipmaps?
  Any help or hints are much appreciated. I wish you a nice evening (or whatever time of day applies to you ;))
  Plerion
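For reference, the size of a full mip chain and the per-level dimensions can be computed on the CPU, independent of any API (a small sketch; `mipDim` matches the `max(w >> i, 1)` expression used in the loop above):

```cpp
#include <algorithm>
#include <cstdint>

// Number of mip levels in a full chain for a w x h texture:
// keep halving until both dimensions reach 1.
uint32_t mipLevelCount(uint32_t w, uint32_t h)
{
    uint32_t levels = 1;
    while (w > 1 || h > 1)
    {
        w = std::max<uint32_t>(w / 2, 1u);
        h = std::max<uint32_t>(h / 2, 1u);
        ++levels;
    }
    return levels;
}

// Dimension of mip level `level` for a base dimension `base`.
uint32_t mipDim(uint32_t base, uint32_t level)
{
    return std::max<uint32_t>(base >> level, 1u);
}
```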
  14. A little update: I've switched back to the one-resize-per-frame approach and did a ReportLiveObjects at WM_ENTERSIZEMOVE and WM_EXITSIZEMOVE. This is what I get after resizing the window for over a minute (several thousand frames): http://pastebin.com/Ekv4Wg54
  The only odd thing is the 4500 references on the pixel shader, but the number of objects seems reasonable.
  15. Hello L. Spiro
  The leak was hidden; if I move back to instantly resizing, it crashes again. I don't know what more I can release. I have the following:
  -> the render target view is released
  -> no reference to the Texture2D of the backbuffer is held
  -> all shader resource views are released
  -> the depth buffer view is released
  -> the depth buffer texture is released
  -> ClearState is called
  Not released are:
  -> 1 vertex and 1 pixel shader
  -> the Texture2D object in the textures (the view is recreated after resize)
  -> depth stencil states
  -> blend states
  -> rasterizer states
  -> sampler states
  -> 1 vertex buffer
  -> 1 index buffer
  -> 1 input layout
  Greetings Plerion