Hodgman last won the day on August 20

Hodgman had the most liked content!

Community Reputation

51234 Excellent

1 Follower

About Hodgman

  • Rank
    Moderator - APIs & Tools

  1. World of Shinobi

    If you were reproducing their assets in your game, that would be copyright infringement, but I think swiftcoder is talking about Sega's trademark on the word "Shinobi". They own the right to distribute video games with the word "Shinobi" in the title. If you use that name, they will eventually send you legal threats, as you will be infringing on their trademark. https://trademarks.justia.com/763/95/shinobi-76395816.html Yes, being able to buy the rights to a common word is stupid, but that's the legal reality. If you wanted to contest it by saying "hey, that's a common word", you would've had to file an objection back in 2002 when they first registered it.
  2. If Z-fighting is an issue, then yeah I'd definitely recommend doing this:

    • Make sure you're using a 32_FLOAT depth format.
    • Swap the near/far params of your projection matrix creation code (e.g. instead of Projection(fov, aspect, near, far), use Projection(fov, aspect, far, near)).
    • Swap your depth comparison function (e.g. replace LESS_EQUAL with GREATER_EQUAL).
    • If you have any shaders that read the depth buffer (e.g. deferred lighting reconstructing positions from depth), then fix the bugs that this has introduced to that code.
    • (Edit) Clear your depth buffer to 0.0f instead of 1.0f.

    The link I posted earlier explains why this is magic. But quickly -- z-buffers store z/w, which is a hyperbolic curve that focuses most precision on values that are close to the near plane (something like 50% of your depth buffer values cover the range of (near, 2*near]!!), and floating point formats do a similar thing -- they're a logarithmic format that focuses most precision on values that are close to zero. If you simply use a floating point format to store z/w, you make the problem twice as bad -- you've got two different encodings that both focus on making sure that values next to the near plane are perfect, and do a bad job on values next to the far plane... So if you invert one of the encodings (by mapping the far plane to zero), then you've now got two encodings that are fighting against each other -- the z/w hyperbolic curve is fighting to focus precision towards the near plane, and the floating point logarithmic curve is fighting to focus precision towards 0.0f (which we've mapped to the far plane). The result is that you end up with an almost linear distribution of values between near and far, and great precision at every distance.
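To make the precision argument concrete, here is a rough sketch in Python (not from the original post). It assumes a D3D-style 0..1 depth range, picks near = 0.1 and far = 10000 purely for illustration, and only rounds the final z/w value to 32-bit float, so it ignores any error from doing the matrix maths itself in float32. The helper names `to_f32`, `depth_standard` and `depth_reversed` are made up for this example.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def depth_standard(z: float, near: float, far: float) -> float:
    """z/w for a 0..1 depth range: near plane -> 0, far plane -> 1."""
    return far / (far - near) * (1.0 - near / z)

def depth_reversed(z: float, near: float, far: float) -> float:
    """Same formula with near/far swapped: near plane -> 1, far plane -> 0."""
    return depth_standard(z, far, near)

# Illustrative values (assumptions, not from the post).
near, far = 0.1, 10000.0
z_a, z_b = 9000.0, 9000.5  # two distant surfaces, half a unit apart

std_a = to_f32(depth_standard(z_a, near, far))
std_b = to_f32(depth_standard(z_b, near, far))
rev_a = to_f32(depth_reversed(z_a, near, far))
rev_b = to_f32(depth_reversed(z_b, near, far))

# Standard mapping: both surfaces collapse to the same stored depth value.
print("standard collides:", std_a == std_b)
# Reversed mapping: the stored values stay distinct, and the nearer
# surface has the larger value (hence GREATER_EQUAL).
print("reversed collides:", rev_a == rev_b)
```

With the standard mapping both distances land just under 1.0, where float32 values are spaced about 6e-8 apart, so they are indistinguishable. With the reversed mapping they land near 1e-6, where float32 spacing is far finer.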
  3. The cost of a texture sample depends on whether you hit the cache or not, which depends on whether your sampling is coherent or not (e.g. do neighbouring pixels sample neighbouring texels?). If your SSAO changes its sampling radius based on the distance to the surface, then this is a predictable result. At long range, your pixels might be sampling a small 3x3 area of texels, which is quite predictable, but at near range perhaps you start sampling a 1000x1000 area of texels (111k times larger), which is very incoherent, and the cache suddenly can't help you any more. These kinds of variable-radius effects need a way to reduce the size of the data set that they're sampling from, either with something like the mipmaps mentioned above (a hierarchical structure) or by simply clamping your filter radius with "min".
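As a small illustration of the "clamp with min" option, the sketch below (Python, not from the original post) computes a screen-space radius that grows as the surface gets closer and then caps it. The function name `ssao_radius_px`, the projection scale and the 32-pixel cap are all assumptions chosen for the example. The last lines just check the "111k times larger" figure.

```python
def ssao_radius_px(world_radius: float, view_depth: float,
                   pixels_per_unit_at_depth_1: float,
                   max_radius_px: float) -> float:
    """Project a world-space radius to pixels, then clamp it with min()."""
    projected = world_radius * pixels_per_unit_at_depth_1 / view_depth
    return min(projected, max_radius_px)

# Hypothetical numbers: 0.5-unit AO radius, 1000 px per unit at depth 1.
far_radius = ssao_radius_px(0.5, 100.0, 1000.0, 32.0)   # small footprint
near_radius = ssao_radius_px(0.5, 0.5, 1000.0, 32.0)    # would be 1000 px unclamped

print(far_radius, near_radius)

# Footprint ratio quoted in the post: 1000x1000 texels versus 3x3 texels.
ratio = (1000 * 1000) / (3 * 3)
print(round(ratio))
```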
  4. Yeah AFAIK AMD doesn't even support 24bit depth buffers in hardware. They might have some hack where they use the 24bit mantissa of a 32bit float buffer, but in any case it's basically emulation of an old legacy feature. The default depth format these days should be 32bit floating point. On a side note, you should combine your floating point depth buffer with a projection matrix that maps the far plane to 0 and the near plane to 1 (e.g. swap the near/far params that are being fed into your projection-matrix construction function) and use GEqual depth comparison instead of LEqual. 32bit floating point depth buffers, when combined with these reversed projection matrices, produce amazing precision. See here: https://developer.nvidia.com/content/depth-precision-visualized I'm not sure about Intel/NVidia for performance / memory use trade-offs... It may be that Intel uses different packing to AMD, etc. However, there's a massive quality difference between doing the "reversed float" format above and using the traditional 24-bit format. The old way results in huge amounts of z-fighting, forcing you to always be tweaking your near/far values to hide it, while the new way practically solves z-fighting.
  5. MMO development (Help)

    479/50000 = ~0.01. So you'd need to achieve 1c average monthly revenue per user. Or in other words, if 1% of your player base spends $1 per month in your game, that will pay the bill. That's totally achievable, assuming you actually have 50k players. What's more important is what the bills are now, when you have 5 players.
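The arithmetic above, written out as a quick check in Python. It assumes the 479 figure is a monthly bill in dollars, which is how the post treats it; the variable names are only for illustration.

```python
monthly_bill = 479.0   # dollars per month (taken from the post)
players = 50_000

# Average revenue needed per player per month to break even.
needed_per_player = monthly_bill / players
print(f"${needed_per_player:.4f} per player")  # just under one cent

# Alternative framing: 1% of players each spend $1 per month.
paying_players = players // 100
revenue = paying_players * 1.0
print(revenue, revenue >= monthly_bill)
```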
  6. Which GL function calls contain these massive stalls?
  7. I'm pretty new to GPU View too, so I'm not really sure what to look for. I looked through the capture that you PM'ed me, hoping to maybe find that some NVidia thread was busy while your game was stalled, or the GPU was busy with some kind of DMA command or something... but all I can understand from this is that the GPU is idling a lot, and your game's main thread is extremely busy. This is what a well-performing capture should look like though -- notice the HW queue and the game's device context are constantly full of queued up work. Have you tried adding manual timing code to your game, to try and locate exactly which functions are blocking the CPU? You say that if you disable some code, the performance issue is gone... but try timing different bits of code to see if you can find where the time is going.
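For the manual timing suggestion, one possible shape is a scoped timer that accumulates time per named section. The sketch below is in Python for brevity and is not from the original post; `update_scene` and `submit_draw_calls` are hypothetical stand-ins for whatever the game does each frame, and the sleep merely simulates a blocking call.

```python
import time
from contextlib import contextmanager

timings = {}  # section name -> accumulated seconds

@contextmanager
def timed(section: str):
    """Accumulate wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[section] = timings.get(section, 0.0) + elapsed

def update_scene():
    # Hypothetical cheap CPU work.
    sum(range(10_000))

def submit_draw_calls():
    # Hypothetical stand-in for an API call that blocks the CPU.
    time.sleep(0.01)

for _frame in range(3):
    with timed("update_scene"):
        update_scene()
    with timed("submit_draw_calls"):
        submit_draw_calls()

slowest = max(timings, key=timings.get)
print("slowest section:", slowest)
```

Wrapping progressively smaller regions with the same timer narrows down which call is actually stalling.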
  8. Have to rotate 180 and flip horizontal to work

    How are you constructing your view/projection matrices? Looks like you're using the opposite handedness than blender.
  9. Try profiling it on GPU View and seeing if there's anything in that data to explain what's going on.
  10. The rootsig creates a mapping between root-descriptor/descriptor-table layouts and the shader's bind slots (register assignments). You can have randomly scattered register assignments in your shader, but then have predictable/contiguous descriptor table layouts. I do shader register assignment via code generation from a simple data definition. So that's automatic from the shader author's point of view, but slots are being manually specified from D3D's point of view. When defining the input resources for a shader, a shader author must group them into "resource lists", which become the descriptor table groupings later on. As part of the shader compiler, I loop through all my possible shader programs (where a program is a set of PS/VS/*S), get each one's set of input resources, and keep a list of the unique sets. Each of these is used to generate a rootsig and instructions for building the tables used by that rootsig.
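The "keep a list of the unique sets" step could look roughly like the following Python sketch. It is an illustration, not the poster's compiler: the program names, resource names and the idea of identifying a layout by an integer id are all assumptions. Each inner tuple plays the role of one "resource list" (one descriptor table), and each unique outer tuple would get one generated rootsig.

```python
# Hypothetical shader programs, each declaring its resource lists in order.
programs = {
    "opaque_lit":      (("albedo", "normal"), ("shadow_map",)),
    "opaque_unlit":    (("albedo", "normal"),),
    "transparent_lit": (("albedo", "normal"), ("shadow_map",)),
}

unique_layouts = {}     # resource-list set -> layout id
program_to_layout = {}  # program name -> layout id

for name, resource_lists in programs.items():
    # Reuse the id if this exact set of resource lists was seen before.
    layout_id = unique_layouts.setdefault(resource_lists, len(unique_layouts))
    program_to_layout[name] = layout_id

print(len(unique_layouts), program_to_layout)
```

Here three programs share two layouts, so only two root signatures would need to be generated.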
  11. Yeah pretty much, I:

    • Have a bunch of different rendering systems (models, ui, particles, etc), which know how to draw different kinds of things. These will ideally pre-create a bunch of draw-items.
    • Each frame, I query those systems to fill some containers up with draw-items. There can be more than one container/queue -- e.g. one for opaque objects, one for transparent, one for each shadow-map, etc... The actual set of containers in use is defined by a "render pipeline" object.
    • Those queues/containers get sorted (each might be sorted differently).
    • Each individual container is then submitted alongside a "render pass" object, into a device context. They don't all have to be submitted at the same time, but generally all rendering submission does occur in the same big chunk of code.
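The flow described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding rather than the poster's engine: the class and queue names are invented, the "device context" is just a list that records submissions, and the specific sort orders (opaque front-to-back, transparent back-to-front) are a common convention assumed for the example.

```python
from dataclasses import dataclass

@dataclass
class DrawItem:
    name: str
    depth: float  # distance from the camera, used only for sorting here

class ModelSystem:
    """A rendering system that pre-creates its draw-items once."""
    def __init__(self):
        self.items = [
            ("opaque", DrawItem("rock", 5.0)),
            ("opaque", DrawItem("tree", 2.0)),
            ("transparent", DrawItem("window", 3.0)),
            ("transparent", DrawItem("smoke", 8.0)),
        ]

    def gather(self, queues):
        # Called each frame: fill the pipeline's queues with draw-items.
        for queue_name, item in self.items:
            queues[queue_name].append(item)

# The "render pipeline": which queues exist, how each is sorted,
# and which render pass each is submitted with.
pipeline = {
    "opaque":      {"pass": "opaque_pass",      "key": lambda i: i.depth},
    "transparent": {"pass": "transparent_pass", "key": lambda i: -i.depth},
}

systems = [ModelSystem()]
queues = {name: [] for name in pipeline}
for system in systems:
    system.gather(queues)

device_context = []  # records (render pass, draw-item name) in submission order
for name, cfg in pipeline.items():
    for item in sorted(queues[name], key=cfg["key"]):
        device_context.append((cfg["pass"], item.name))

print(device_context)
```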
  12. ;( Try it in a PC with a 144Hz monitor now..
  13. Wow. So edgy. I bet nobody gets you IRL. Good luck with the funding drive.
  14. 2.1. Once you go stateless, you won't go back.

    2.2. New features get added very rarely to D3D/GL/Vulkan. Adding a new GL extension/etc to the stateless API is a bit more code, but I wouldn't really say it's any more difficult.

    2.3. Why does this expose fewer capabilities than option #1? Anything that you can expose in option #1, you can also put into your stateless command architecture.

    My thoughts on this are here: http://www.goatientertainment.com/downloads/Designing a Modern GPU Interface.pptx
  15. Decentralized Application Marketplace

    Yesterday I found out about lbry.io and today spheris.io! It seems the blockchain revolution is on the way! What are the similarities / differences between your project and others like lbry? Do you think there will be client applications that speak both protocols? Yeah, this sounds useful for game developers as an additional marketplace to the current mainstream options.