About Batzer

  1. Batzer

    How to do a tiled map

    As you said, it all depends on the map size. If you are planning on going huge with the map, then only having the necessary parts in memory is preferable. However, I actually don't think that memory is ever a real problem here: let's say your map is 1000x1000 tiles large. Then you need the same number of quads to render all tiles. Now, assuming you only need your 2D world and texture coordinates, each quad is 96 bytes (6 vertices). Therefore, for your complete map you only need 91 MB of VRAM for the vertices. For the actual textures, one 8k texture gives you about 6561 unique 100x100 tiles, which costs you 256 MB of VRAM (RGBA8 format). All in all, memory is not an issue imo.

    For actually rendering the world you have several options, depending on what kind of hardware you want to support. The nicest option would be to perform culling on the GPU and render indirectly. This way you only render the triangles you see, without ever having to upload anything to the GPU. If you need to support older hardware, you could still leave all vertices on the GPU and just upload the indices per frame (16 bytes per quad, 4 vertices), which should only be around 207 quads for a 1920x1080 screen. You would then perform culling on the CPU and determine which quads to render. The option of just always rendering the whole map can work well, but this of course heavily depends on the map size, so one of the former approaches is to be preferred imo.
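    The CPU-side culling mentioned above boils down to mapping the camera rectangle to a range of tile indices. A minimal sketch (function and struct names are mine, not from the thread; assumes an axis-aligned camera and a regular tile grid):

```cpp
#include <algorithm>

// Inclusive range of visible tile indices for a camera rectangle.
struct TileRange { int x0, y0, x1, y1; };

// camX/camY: world position of the camera's top-left corner,
// viewW/viewH: view size in world units, tileSize: tile edge length,
// mapW/mapH: map dimensions in tiles.
TileRange visibleTiles(float camX, float camY, float viewW, float viewH,
                       float tileSize, int mapW, int mapH) {
    TileRange r;
    r.x0 = std::max(0, static_cast<int>(camX / tileSize));
    r.y0 = std::max(0, static_cast<int>(camY / tileSize));
    r.x1 = std::min(mapW - 1, static_cast<int>((camX + viewW) / tileSize));
    r.y1 = std::min(mapH - 1, static_cast<int>((camY + viewH) / tileSize));
    return r;
}
```

    You would then emit indices only for the quads inside that range each frame.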
  2. Well, the WARP adapter in D3D11 is pure software as far as I know. Of course, that falls into the area of GPU emulations, I guess. I don't think that anybody has put serious effort into software rasterizers. Or are we talking about any kind of renderer here? Most (all?) software raytracers are geared towards the best possible image quality and pay multi-hour rendering times per frame as the price. I'm thinking of Arnold, RenderMan, and so on. Other than that, the preview renderers are pretty fast most of the time. However, rasterizers haven't been done in software for a long time for a good reason. Even the cheapest Intel chip has an IGPU.
  3. You can also look at http://vulkan.gpuinfo.org/ for a nice overview of a ton of different devices and their properties.
  4. Batzer

    CPU Readback Synchronization

    I seem to have gotten it to work now. I'm using N events that belong to N staging buffers. When staging buffer A is filled on the GPU, event A is signaled on the GPU as well. On the CPU I check if event B has been signaled already. If not, then I either wait or just continue without reading back the data, depending on whether the buffer I'm trying to read back this frame will be overwritten next frame by the GPU. Thus I have at most N frames of delay when reading the data from the GPU, but can run everything at full speed without waiting too often. I'm pretty sure by now that the same behavior can be implemented using fences, but it would be less fine-grained, since events can be signaled at any time in a command buffer, and thus directly after the data is ready to be read. Here is some pseudo-code:

        renderSomeStuff();
        commandBuffer.copyImageToBuffer(framebuffer, stagingBuffers[writeIndex]);
        commandBuffer.signalEvent(events[writeIndex]);
        writeIndex = (writeIndex + 1) % NumBuffers;
        if (writeIndex == readIndex) {
            waitForEvent(events[readIndex]); // CPU-side wait
        }
        if (events[readIndex].status == SIGNALED) {
            readback(stagingBuffers[readIndex]);
            events[readIndex].reset(); // CPU-side reset
            readIndex = (readIndex + 1) % NumBuffers;
        }

    I'd still be grateful if somebody has a better solution!
  5. Batzer


    As @Steve_Segreto said, the way sampling works is mostly the same whether mip-mapping is used or not. "Mostly", because with mip-maps we get an additional axis for interpolation. When computing the optimal mip level for a given fragment, the value can be some floating-point number. Thus, we want to interpolate between the two mip levels closest to that optimal level. E.g. if your optimal level is 4.6, then use levels 4 and 5.

    And just to make clear why exactly we use mip-mapping: imagine a surface patch very far away from your camera, so that it covers exactly one pixel. Now, this surface might have a 512x512 pixel texture applied to it. Thus, all of that texture information must fit into that one pixel, which is obviously not possible with nearest-neighbor or bilinear filtering. This is why we get ugly shimmering when moving the camera: with e.g. bilinear filtering, different 2x2 neighborhoods are chosen frame by frame, resulting in different colors. You would have to average all 512x512 pixels for the correct result. And that is exactly what we do with mip-mapping: we prefilter several sizes of that texture so that we can just use e.g. bilinear filtering, since we can now choose the correct size of the texture. In this case the GPU would choose the smallest mip level, which is 1x1 pixel and is basically the whole image averaged into one pixel.
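    The level selection and blending described above can be sketched on the CPU (toy illustration; the names are mine, and the GPU derives the footprint from screen-space derivatives):

```cpp
#include <cmath>
#include <algorithm>

// Optimal mip level from the texture footprint of a fragment:
// texelsPerPixel is how many base-level texels the fragment covers
// along its longest axis.
float optimalMipLevel(float texelsPerPixel) {
    return std::log2(std::max(1.0f, texelsPerPixel));
}

// Trilinear filtering blends the two levels around a fractional
// optimal level, e.g. 4.6 -> levels 4 and 5 with weights 0.4 and 0.6.
struct MipBlend { int lower; int upper; float upperWeight; };

MipBlend trilinearBlend(float level) {
    MipBlend b;
    b.lower = static_cast<int>(std::floor(level));
    b.upper = b.lower + 1;
    b.upperWeight = level - static_cast<float>(b.lower);
    return b;
}
```

    For the far-away patch in the example (512 texels squeezed into one pixel), this picks level log2(512) = 9, which for a 512x512 texture is exactly the 1x1 mip.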
  6. Hello everyone, I'm currently trying to get CPU readback to work in Vulkan and I'm having issues getting the synchronization to work properly. I want to use the results of a subpass on the CPU. So what I do right now is copy the render target image to a staging buffer, map that buffer and simply memcpy the contents. My idea was to use a VkEvent to signal when the copy to the staging buffer has finished on the GPU, so the CPU knows when to start copying valid data. However, I'm not sure if this is the correct way to do it, or if a VkFence is the correct choice here, and whether I need memory barriers or not. In addition, I of course want multiple staging buffers in flight at the same time so the CPU doesn't keep the GPU waiting, and I'm not sure how to know which buffers are ready to be written to again. I'm very new to low-level GPU programming and I keep getting my GPU to crash the system, so I would be very grateful if somebody could clear things up for me :) Thanks!
  7. You have to remove the divide by PI, since it cancels out. In Monte Carlo path tracing you have to weight each contribution by the pdf of the sample taken. And since you only consider perfectly diffuse surfaces, your BRDF is c/pi, where c is your surface color. And due to your cosine-weighted direction sampling, your pdf is cos(theta)/pi, which nicely cancels out the cos of the rendering equation and the pi of your BRDF, leaving you only with the incoming radiance. Look here: http://www.rorydriscoll.com/2009/01/07/better-sampling/
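    The cancellation can be verified numerically (toy sketch; the function name is mine, not from the thread):

```cpp
#include <cmath>

const float PI = 3.14159265358979f;

// Monte Carlo weight of one bounce on a perfectly diffuse surface:
//   weight = brdf * cos(theta) / pdf
// with brdf = c / pi and cosine-weighted pdf = cos(theta) / pi.
// Everything but the surface color c cancels out.
float diffuseWeight(float c, float cosTheta) {
    float brdf = c / PI;            // Lambertian BRDF
    float pdf  = cosTheta / PI;     // cosine-weighted hemisphere sampling
    return brdf * cosTheta / pdf;   // == c
}
```

    So per bounce you just multiply the accumulated throughput by the surface color.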
  8. Wow, thank you so much for your very detailed answer! You really cleared things up; I'd never noticed that D3D12's barrier model actually has different semantics than Vulkan's model. I just thought that D3D12 was less expressive in that regard. Couldn't have hoped for a better explanation!
  9. Hey guys, I'd like to ask the more experienced people here what you think (or maybe you know for sure) the problem is with the performance of D3D12 implementations in most AAA titles. The D3D12 render path is almost consistently worse than the D3D11 render path. I'd assume it's because they haven't had time to rewrite their engines for proper use of the newer APIs? And I know it's not good practice to ask more than one big question, but how do Vulkan and D3D12 compare in practical situations? They seem to be very similar, but do have some key differences, notably render passes and finer pipeline barrier control in Vulkan. So I'd assume Vulkan offers better control and thus performance?
  10. Batzer

    .obj file text parsing

    That certainly works. However, it's not the most efficient way to solve this, since you essentially parse the string twice. You could use a combination of peek(), ignore() and operator>> to do this better. You also have to consider the different combinations of indices. There are 4 possible ways a face vertex can be given:

        f v v v ...
        f v/vt v/vt v/vt ...
        f v//vn v//vn v//vn ...
        f v/vt/vn v/vt/vn v/vt/vn ...

    You can assume that a vertex position is always given and then check with peek() if the next character is a slash. This way you can distinguish between the different cases. If it's a slash, just use ignore() to skip it. Also, if you want to support even more possibilities, then you have to consider negative indices. For more in-depth information about the format you can read the relevant parts on http://paulbourke.net/dataformats/obj/ and http://paulbourke.net/dataformats/mtl/
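    A minimal sketch of the peek()/ignore() approach for one face-vertex token (names are mine; negative/relative indices are not handled here):

```cpp
#include <sstream>

// 0 means "index not present" (.obj indices are 1-based).
struct FaceVertex { int v = 0, vt = 0, vn = 0; };

// Parses one token of the forms "v", "v/vt", "v//vn", "v/vt/vn".
FaceVertex parseFaceVertex(std::istream& in) {
    FaceVertex fv;
    in >> fv.v;                    // position index is always present
    if (in.peek() == '/') {
        in.ignore();               // skip first '/'
        if (in.peek() != '/')      // "v/vt" or "v/vt/vn"
            in >> fv.vt;
        if (in.peek() == '/') {    // "v//vn" or "v/vt/vn"
            in.ignore();
            in >> fv.vn;
        }
    }
    return fv;
}
```

    Calling this in a loop on the rest of an "f" line reads each vertex in a single pass.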
  11. I haven't used D3D12 yet, but I guess it will be the same as in D3D11, meaning you call IUnknown::QueryInterface on your device, command list and command queue.
  12. Actually, it could be a huge difference. Calling glBufferSubData for each tree every frame will make the GPU wait for the CPU to upload the data. This can and will kill your performance. The only way to actually make UBOs perform better than glUniformX is to make one huge UBO for all your trees, upload all the matrices at once before rendering, and then for each tree use glBindBufferRange to bind the correct transforms. This will be nice and fast. At least that is my experience with UBOs. To avoid synchronization between GPU and CPU you should use buffer orphaning or manual synchronization. More info here: https://www.opengl.org/wiki/Buffer_Object_Streaming. And here: http://www.gamedev.net/topic/655969-speed-gluniform-vs-uniform-buffer-objects/
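    One detail when packing all trees into a single UBO: glBindBufferRange requires the offset to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (queried at runtime; 256 is a common value). A sketch of the offset math (names are mine, not from the thread):

```cpp
// Per-object stride inside the shared UBO, rounded up to the
// GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT limit queried from the driver.
int alignedStride(int bytesPerObject, int uboOffsetAlignment) {
    return (bytesPerObject + uboOffsetAlignment - 1)
           / uboOffsetAlignment * uboOffsetAlignment;
}

// Byte offset of object i, usable as the 'offset' argument
// of glBindBufferRange.
int objectOffset(int index, int stride) {
    return index * stride;
}
```

    So e.g. one 64-byte mat4 per tree still occupies a 256-byte slot on a driver with 256-byte alignment; the extra padding is the price of per-object glBindBufferRange.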
  13. Here are some more OpenGL tutorials: http://www.learnopengl.com/ http://ogldev.atspace.co.uk/
  14. So I actually found the problem. It appears that when a uniform from a uniform block is used as an argument to the built-in function pow, the linker isn't happy and just never links the shader. I found that when the uniform is assigned to a local variable first, the linker is happy. Change

        float specular = pow(max(0.0, NdotH), MaterialSpecular.w);

    to

        float shininess = MaterialSpecular.w;
        float specular = pow(max(0.0, NdotH), shininess);

    This MUST be a driver bug. Another workaround is to use instance names for the uniform blocks like in my previous post. But this is just annoying. Qualcomm, fix your shit...