Community Reputation

598 Good

About Batzer

  1. You can also look at http://vulkan.gpuinfo.org/ for a nice overview of a ton of different devices and their properties.
  2. Batzer

    CPU Readback Synchronization

    I seem to have gotten it to work now. I'm using N events that belong to N staging buffers. When staging buffer A is filled on the GPU, event A is signaled on the GPU as well. On the CPU I check whether event B has already been signaled. If not, I either wait or just continue without reading back the data, depending on whether the buffer I'm trying to read back this frame will be overwritten by the GPU in the next frame. Thus I have at most N frames of delay when reading the data from the GPU, but can run everything at full speed without waiting too often. I'm pretty sure by now that the same behavior can be implemented using fences, but it would be less fine-grained, since events can be signaled at any point in a command buffer, and thus directly after the data is ready to be read. Here is some pseudo-code:

        renderSomeStuff();
        commandBuffer.copyImageToBuffer(framebuffer, stagingBuffers[writeIndex]);
        commandBuffer.signalEvent(events[writeIndex]);
        writeIndex = (writeIndex + 1) % NumBuffers;

        if (writeIndex == readIndex) {
            waitForEvent(events[readIndex]); // CPU side wait
        }

        if (events[readIndex].status == SIGNALED) {
            readback(stagingBuffers[readIndex]);
            events[readIndex].reset(); // CPU side reset
            readIndex = (readIndex + 1) % NumBuffers;
        }

    I'd still be grateful if somebody has a better solution!
  3. Batzer


    As @Steve_Segreto said, the way sampling works is mostly the same whether mip-mapping is used or not. "Mostly", because with mip-maps we get an additional axis for interpolation. When computing the optimal mip-level for a given fragment, the value can be any floating-point number. Thus, we want to interpolate between the two mip-levels closest to that optimal level. E.g. if your optimal level is 4.6, then levels 4 and 5 are used. And just to make it clear why exactly we use mip-mapping: imagine a surface patch so far away from your camera that it covers exactly one pixel. Now, this surface might have a 512x512 pixel texture applied to it. Thus, all of that texture information must fit into that one pixel, which is obviously not possible with nearest-neighbor or bilinear filtering. This is why we get ugly shimmering when moving the camera: with e.g. bilinear filtering, different 2x2 neighborhoods are chosen from frame to frame, resulting in different colors. You would have to average all 512x512 pixels for the correct result. And that is exactly what we do with mip-mapping: we prefilter the texture at a set of smaller sizes so that we can just use e.g. bilinear filtering, since we can now choose the correct size of the texture. In this case the GPU would choose the smallest mip-level, which is 1x1 pixel and is basically the whole image averaged into one pixel.
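The level selection described above can be sketched in a few lines of C++. This is only a rough illustration and not how any particular GPU implements it: the names `MipBlend`, `mipLevelCount` and `selectMipLevels` are made up for this example, and the fractional level of detail is assumed to be computed already.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not part of any graphics API.
struct MipBlend {
    int   lower;   // finer of the two levels to sample
    int   upper;   // coarser of the two levels to sample
    float weight;  // blend factor toward `upper`, in [0, 1)
};

// Number of levels in a full mip chain of a square texture,
// e.g. 512 -> 512, 256, ..., 2, 1.
int mipLevelCount(int size) {
    assert(size > 0);
    int levels = 1;
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}

// Picks the two mip levels that surround a fractional level of detail.
MipBlend selectMipLevels(float lod, int levelCount) {
    assert(levelCount > 0);
    const float maxLevel = static_cast<float>(levelCount - 1);
    const float clamped  = std::max(0.0f, std::min(lod, maxLevel));
    const int   lower    = static_cast<int>(std::floor(clamped));
    const int   upper    = std::min(lower + 1, levelCount - 1);
    return {lower, upper, clamped - static_cast<float>(lower)};
}
```

For a level of detail of 4.6 on a 512x512 texture this yields levels 4 and 5 with a blend weight of about 0.6. A very distant surface clamps to the last level, which is the 1x1 average of the whole image.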
  4. Hello everyone, I'm currently trying to get CPU readback to work in Vulkan and I'm having issues getting the synchronization to work properly. I want to use the results of a subpass on the CPU, so what I do right now is copy the render target image to a staging buffer, map that buffer and simply memcpy the contents. My idea was to use a VkEvent to signal when the copy to the staging buffer has finished on the GPU, so the CPU knows when to start copying valid data. However, I'm not sure if this is the correct way to do it or if a VkFence is the correct choice here, and whether I need memory barriers or not. In addition, I of course want multiple staging buffers in flight at the same time so the CPU doesn't keep the GPU waiting, and I'm not sure how to know which buffers are ready to be written to again. I'm very new to low-level GPU programming and I keep getting my GPU to crash the system, so I would be very grateful if somebody could clear things up for me :) Thanks!
  5. You have to remove the divide by PI since it cancels out. In Monte Carlo path tracing you have to weight each contribution by the pdf of the sample taken. Since you only consider perfectly diffuse surfaces, your BRDF is c/pi, where c is your surface color. And due to your cosine-weighted direction sampling, your pdf is cos(theta)/pi. Dividing by that pdf nicely cancels both the cos of the rendering equation and the PI of your BRDF, leaving you with only the surface color times the incoming radiance. Look here: http://www.rorydriscoll.com/2009/01/07/better-sampling/
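To see the cancellation in numbers, here is a small C++ sketch for a single color channel. The function names are invented for this example; it only evaluates the per-sample weight brdf * cos(theta) / pdf and is not a path tracer.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Lambertian BRDF for one color channel: albedo / pi.
double lambertBrdf(double albedo) {
    return albedo / kPi;
}

// pdf of cosine-weighted hemisphere sampling: cos(theta) / pi.
double cosineWeightedPdf(double cosTheta) {
    return cosTheta / kPi;
}

// Weight applied to the incoming radiance of one sample:
// brdf * cos(theta) / pdf.
double sampleWeight(double albedo, double cosTheta) {
    assert(cosTheta > 0.0);  // cosine-weighted sampling never picks grazing or back-facing directions
    return lambertBrdf(albedo) * cosTheta / cosineWeightedPdf(cosTheta);
}
```

Whatever cos(theta) the sample has, the weight comes out as the albedo (up to floating-point rounding), so the shading code can simply multiply the incoming radiance by the surface color.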
  6. Wow thank you so much for your very detailed answer! You really cleared things up, I've never noticed that D3D12's barrier model actually has different semantics than Vulkan's model. I just thought that D3D12 was less expressive in that regard. Couldn't have hoped for a better explanation!
  7. Hey guys, I'd like to ask the more experienced people here what you think (or maybe you know for sure) the problem is with the performance of the implementations of D3D12 in most AAA-titles. The D3D12 render path is almost consistently worse than the D3D11 render path. I'd assume it's because they haven't had time to rewrite their engines for proper use of the newer APIs? And I know it's not good practice to ask more than one big question, but how do Vulkan and D3D12 compare in practical situations? They seem to be very similar, but do have some key differences. Notably, render passes and finer pipeline barrier control in Vulkan. So I'd assume Vulkan offers better control and thus performance?
  8. Batzer

    .obj file text parsing

    That certainly works. However, it's not the most efficient way to solve this, since you essentially parse the string twice. You could use a combination of peek(), ignore() and operator>> to do this better. You also have to consider the different combinations of indices. There are 4 possible ways a face vertex can be given:

        f v v v ...
        f v/vt v/vt v/vt ...
        f v//vn v//vn v//vn ...
        f v/vt/vn v/vt/vn v/vt/vn ...

    You can assume that a vertex position is always given and then check with peek() whether the next character is a slash. This way you can distinguish between the different cases. If it's a slash, just use ignore() to skip it. Also, if you want to support even more possibilities, you have to consider negative indices. For more in-depth information about the format you can read the relevant parts on http://paulbourke.net/dataformats/obj/ and http://paulbourke.net/dataformats/mtl/
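A minimal sketch of the peek()/ignore()/operator>> approach could look like this. `FaceVertex`, `parseFaceVertex` and `parseFace` are names made up for the example, and using 0 for "index not given" is an assumption that works because OBJ indices are never 0 (they are 1-based, or negative for relative indexing).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type; 0 means "index not given".
struct FaceVertex {
    int v  = 0;  // position index (always present)
    int vt = 0;  // texture coordinate index (optional)
    int vn = 0;  // normal index (optional)
};

// Parses one of: v, v/vt, v//vn, v/vt/vn. Returns false if no vertex could be read.
bool parseFaceVertex(std::istream& in, FaceVertex& out) {
    out = FaceVertex{};
    if (!(in >> out.v)) return false;       // position index is mandatory
    if (in.peek() != '/') return true;      // form "v"
    in.ignore();                            // skip the first '/'
    if (in.peek() != '/') {
        if (!(in >> out.vt)) return false;  // "v/" without an index is malformed
        if (in.peek() != '/') return true;  // form "v/vt"
    }
    in.ignore();                            // skip the second '/'
    return static_cast<bool>(in >> out.vn); // forms "v//vn" and "v/vt/vn"
}

// Parses a whole "f ..." line; returns an empty vector for other lines.
std::vector<FaceVertex> parseFace(const std::string& line) {
    std::istringstream in(line);
    std::string keyword;
    std::vector<FaceVertex> vertices;
    if (!(in >> keyword) || keyword != "f") return vertices;
    FaceVertex fv;
    while (parseFaceVertex(in, fv)) vertices.push_back(fv);
    return vertices;
}
```

Each line is walked only once. Negative indices are read by operator>> like any other integer, so resolving them relative to the current vertex count is left to the caller.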
  9. I haven't used D3D12 yet, but I guess it will be the same as in D3D11, meaning you call IUnknown::QueryInterface on your device, command list and command queue.
  10. Actually it could be a huge difference. Calling glBufferSubData for each tree every frame can stall the pipeline while the driver synchronizes the CPU and the GPU for the upload. This can and will kill your performance. The only way to actually make UBOs perform better than glUniformX is to make one huge UBO for all your trees, upload all the matrices at once before rendering, and then for each tree use glBindBufferRange to bind the correct transforms. This will be nice and fast. At least that is my experience with UBOs. To avoid synchronization between GPU and CPU you should use buffer orphaning or manual synchronization. More info here: https://www.opengl.org/wiki/Buffer_Object_Streaming. And here: http://www.gamedev.net/topic/655969-speed-gluniform-vs-uniform-buffer-objects/
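One detail to watch with the single big UBO: the offset passed to glBindBufferRange has to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, so each tree's slot usually needs padding. The sketch below only shows the offset arithmetic; `alignUp` and `uniformSlotOffset` are made-up helper names, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `alignment`.
std::size_t alignUp(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Byte offset of slot `index` in a buffer whose slots are `stride` bytes apart.
std::size_t uniformSlotOffset(std::size_t index, std::size_t stride) {
    return index * stride;
}

// Intended use (assumes a current GL context; `ubo`, `bindingPoint` and
// `perTreeSize` are placeholders for this example):
//
//   GLint alignment = 0;
//   glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &alignment);
//   const std::size_t stride = alignUp(perTreeSize, alignment);
//   ... upload all trees once, tree i at uniformSlotOffset(i, stride) ...
//   glBindBufferRange(GL_UNIFORM_BUFFER, bindingPoint, ubo,
//                     uniformSlotOffset(i, stride), perTreeSize);
```

With a 64-byte matrix per tree and an alignment of 256 (a typical value, but always query it), every tree occupies a 256-byte slot and tree 3 starts at byte 768.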
  11. Here are some more OpenGL tutorials: http://www.learnopengl.com/ http://ogldev.atspace.co.uk/
  12. So I actually found the problem. It appears that when a uniform from a uniform block is used as an argument to the built-in function pow, the linker isn't happy and just never links the shader. I found that when the uniform is assigned to a local variable first, the linker is happy. Change

        float specular = pow(max(0.0, NdotH), MaterialSpecular.w);

    to

        float shininess = MaterialSpecular.w;
        float specular = pow(max(0.0, NdotH), shininess);

    This MUST be a driver bug. Another workaround is to use instance names for the uniform blocks like in my previous post. But this is just annoying. Qualcomm fix your shit...
  13. Hey guys, I have a very weird problem. When I try to use more than one uniform buffer in a shader, glLinkProgram will just go into oblivion and never return. If I define the blocks without an instance name, the lock-up occurs:

        layout (std140) uniform LightBlock {
            vec3 LightColor;
            vec3 DirToLight;
            vec3 AmbientColor;
        };

        layout (std140) uniform MaterialBlock {
            vec4 Diffuse;
            vec4 Specular;
        };

    But when I give the blocks instance names, the program runs just fine:

        layout (std140) uniform LightBlock {
            vec3 LightColor;
            vec3 DirToLight;
            vec3 AmbientColor;
        } Light;

        layout (std140) uniform MaterialBlock {
            vec4 Diffuse;
            vec4 Specular;
        } Material;

    This is not the behavior described in the specs. Tbh glLinkProgram should never lock up ... I tested the code on a Nexus 4 (Adreno 320) and a Nexus 5 (Adreno 330). Am I doing something wrong or is this a driver bug from Qualcomm?
  14. You could store a collision map in memory and just check if the picked pixel is transparent. Then you can just check from front to back, and the first image with an opaque pixel is the one you want. The collision map can just be an array of bools or uint8_t (0 for transparent, 1 for opaque). Don't know if this is the fastest way, but it's definitely faster than reading from the GPU.
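A rough sketch of that front-to-back check is below. The `Sprite` layout (screen position, size and a row-major mask) and the function names are assumptions made for this example, not taken from any engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sprite record: screen rectangle plus a CPU-side collision map.
struct Sprite {
    int x, y;                        // top-left corner in screen pixels
    int width, height;               // size in pixels
    std::vector<std::uint8_t> mask;  // width * height entries, row-major; 0 = transparent, 1 = opaque
};

// True if screen pixel (px, py) lies inside the sprite and hits an opaque texel.
bool isOpaqueAt(const Sprite& s, int px, int py) {
    const int lx = px - s.x;
    const int ly = py - s.y;
    if (lx < 0 || ly < 0 || lx >= s.width || ly >= s.height) return false;
    const std::size_t index = static_cast<std::size_t>(ly) * static_cast<std::size_t>(s.width)
                            + static_cast<std::size_t>(lx);
    assert(index < s.mask.size());
    return s.mask[index] != 0;
}

// `frontToBack` must be sorted with the topmost sprite first.
// Returns the index of the first sprite hit, or -1 if nothing opaque was picked.
int pickSprite(const std::vector<Sprite>& frontToBack, int px, int py) {
    for (std::size_t i = 0; i < frontToBack.size(); ++i) {
        if (isOpaqueAt(frontToBack[i], px, py)) return static_cast<int>(i);
    }
    return -1;
}
```

Clicking a transparent pixel of the topmost image falls through to the image behind it, which is exactly the behavior you want for picking.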