
Member Since 14 Feb 2007

#5181680 Temporal coherence and render queue sorting

Posted by Hodgman on 19 September 2014 - 10:53 PM

How do you deal with sorting of transparent polygons that belong to the same sub-mesh?

Games usually just ignore this problem altogether!
Occasionally, I've seen people pre-sort their triangles for some small number of directions (e.g. 8) and generate 8 different index buffers for those orderings. When submitting the mesh, you compare the view direction to those 8, and use the index buffer offset of the closest match.

#5181528 Poor STL threads performance

Posted by Hodgman on 19 September 2014 - 07:25 AM

Creating and destroying threads are costly operations. You ideally want to have a small number of permanent threads (about the same number as you have hardware-threads) and keep them busy over the life of your app.

#5181457 Graphics without drivers?

Posted by Hodgman on 19 September 2014 - 12:04 AM

Run your program through a DOS emulator, then you can use all those old-school techniques ;)

#5181207 Why does division by 0 always result in negative NaN?

Posted by Hodgman on 18 September 2014 - 01:29 AM

You really can't trust floating point code to always behave the same, period, even with the same binary.

If you've told your compiler to obey the IEEE spec (in MSVC - "strict" FP mode), then yes, the same binary will always produce the same results... The cost of doing so is that it will be severely inefficient, as the compiler will emit instructions after every single FP operation to flush the result to memory and then reload it back into a register, which eliminates any accumulated processor-specific rounding issues.


It's a complete pain in the arse, but (godly?) devs in the past have managed to create games using the lock-step networking technique (that relies on every client computing the exact same game-states), while using floating point physics/gameplay (instead of the usual fixed-point workaround), supporting cross-platform multi-player including x86 and PPC clients :O 8)

#5181191 Screen Space Reflection Problem

Posted by Hodgman on 17 September 2014 - 10:10 PM

BTW, you can see the exact same artifact as in your original picture in Unreal Engine 4 ;)

#5180990 Deploy a DirectX 9.0 application through Wine on Linux

Posted by Hodgman on 17 September 2014 - 06:59 AM

Not an answer to your question, sorry, but if you decide to do a native port, you might be interested in Valve's free D3D9->OpenGL translation library.

#5180970 HD remakes, what does it involve?

Posted by Hodgman on 17 September 2014 - 05:10 AM

Pretty much.


You'll have a team of artists basically re-doing all the textures, and perhaps re-doing the models.

If you're unlucky, this is an equal amount of art-work to the original game (minus experimentation that didn't make it onto the disc).

If you're lucky, the original artists actually created their work at high resolution to begin with, but then shipped with reduced quality -- e.g. often you'll create 4k textures "just because", and then reduce them to 256px to get them to work on a crappy old console. In this situation, you don't have to recreate that texture, just re-export it!


However, you'll likely also need a small number of specialist programmers. The gameplay code will hopefully work as-is, but the game engine itself will need to be ported to the new platform. This probably requires re-writing the guts of the renderer, file-system, networking, platform-integration (Live/PS Network/Steam/etc), audio systems, etc...


While you're at it, you might take the time to get some gameplay programmers to fix some bugs, or to re-record/re-mix some audio, or to redo some cutscenes, etc...

If you completely redo the engine, you might've changed the way that the game is rendered - if so, you might go back through and re-do all the lighting, etc...


It really depends on the project as there's no standard practice for what's involved in these kinds of re-releases.

#5180967 Intrinsics to improve performance of interpolation / mix functions

Posted by Hodgman on 17 September 2014 - 04:46 AM

Some vector-based GPUs have a horizontal add instruction that does this. Writing x.r+x.g+x.b+x.a (or dot(x,(float4)1)) will cause FXC to emit a bunch of add instructions (or a dot), but the driver will optimize them into a single horizontal add.

Modern GPUs are scalar, not vector though, so a float4 is the same as a single float really... So I'm not sure if vector instructions like horizontal add exist. They might, because vendors were making a big deal about supporting SAD, etc...

#5180890 How do work at home game developer jobs work

Posted by Hodgman on 16 September 2014 - 09:04 PM

What Promit said^
I was going to say that this only occurs with companies too cheap to hire real employees, or employees that are experienced enough for people to trust them.

And yep, when using a VPN, it's exactly the same as if your PC is connected to their internal LAN (but slower).

Smaller companies who don't have a physical office at all, might instead rent a server in the cloud. In this case you'd probably connect to their file-sharing / version-repository servers directly via the net.

#5180387 BSP portal visibility

Posted by Hodgman on 15 September 2014 - 12:38 AM

Keep in mind that Quake went with BSP because it allowed them to sort the world's polygons and then render them in order without any need for a Z-buffer. At the time, implementing a Z-buffer in their software renderer was an issue -- today Z-buffers are ubiquitous :)

#5180346 LOD in modern games

Posted by Hodgman on 14 September 2014 - 06:17 PM

I thought the gpu merges pixels from different triangles until a warp/wavefront can be scheduled?
I thought they would do this as well, but apparently there must be some obstacles with the implementation of that approach...


This guy's done the experiments and found quite a big performance cliff when rasterizing less than 8x8 pixels. It's also interesting that the shape of small triangles matters (wide vs tall)!

#5180340 My teams all hate eachother.

Posted by Hodgman on 14 September 2014 - 05:32 PM

Maybe the staff's perceptions of each other are all correct, and the problem is that they lack the capacity for self-reflection required to temper their own ego and the tact to politely test others'.
i.e. They're young :P

I was in a team like this. My advice would be:
Give the programmers the freedom to design, as they're the ones implementing, so they're the ones who can most easily riff on design choices. Same with concept artists and visual design, and with 3D artists and props/items. Let the design be flexible enough to accommodate everyone - the game will likely be better for it, and the team will be more engaged.

Have everyone release their work into the project accompanied by a standard text file saying they grant the project unlimited license to use the work (removing the capacity for manipulative copyright shenanigans) - or better, use the MIT/BSD/WTFPL instead of your own custom made text file. Don't accept any Zips/etc where the text file is missing.

Stop offering payment if you don't have the cash up-front. Unless you've already formed a real company and have had your lawyer draft up a shareholder constitution, a schedule for issuing shares, and contributor agreements, then a promised profit-sharing scheme is NOT going to happen. If you are lucky enough to finish the game, you're going to have to do all of the above at release time, plus setting up bank accounts, shitting your pants over IRS forms, etc... And it's extremely likely that you will all get legally fucked over in the process.
It also makes anyone with any experience instantly see your project as a scam, dooming you to inexperienced contributors.
It's much healthier to admit that this is a fan/hobby/portfolio project only, with no money involved. If you want to show your appreciation to your team-mates, send them an unexpected gift instead. If you want to dangle a carrot, say that if the game is popular, you will form a studio to professionally create a sequel, with money involved that time.

#5180184 OmniDirectional Shadow Mapping

Posted by Hodgman on 13 September 2014 - 11:40 PM

6 matrices per frame is not much -- 384 bytes per light per frame. 100 lights at 60Hz add up to roughly 2.3MB/s, over a bus rated in GB/s :)

For comparison, an animated character might have 100 bone matrices supplied per frame.

But... Why do you need 6 matrices? The cube itself has one rotation and one translation value, which should be identical for every face, right?

Also, seeing as it's omnidirectional, do you even need to support rotation at all? A simple translation / light position value should be enough.

[edit] i.e. Subtracting the light position from the surface position gives you the direction from the light to the surface, which is the texture coordinate to use when sampling from the cube-map.

#5180144 Dynamic cubemap relighting (random thought)

Posted by Hodgman on 13 September 2014 - 07:09 PM

Cheap idea for relighting the increasingly popular cubemap/image-based lighting solution: just store the depth/normal/albedo of each cubemap face, then each frame relight N cubemap faces with the primary light and update the skybox. As long as you store/apply the baked roughness lookups in the mipmaps, and apply them to the final output cubemaps you use for lighting, you get a dynamically re-lightable image-based lighting system.

Here's an example implementation, coined "Deferred Irradiance Volumes".

Turns out this idea works pretty well :D

#5180064 DX12 - Documentation / Tutorials?

Posted by Hodgman on 13 September 2014 - 07:38 AM

So in essence, a game rated to have minimum 512MB VRAM (does DX have memory requirements?) never uses more than that for any single frame/scene?
You would think that AAA-games that require tens of gigabytes of disk space would at some point use more memory in a scene than what is available on the GPU. Is this just artist trickery to keep any scene below rated GPU memory?

Spare a thought for the PS3 devs with tens of GBs of disc space, and then just 256MB of GPU RAM :D I'm always completely blown away when I see stuff like GTA5 running on that era of consoles!
Ideally on PC, if you have 512MB of VRAM, yes, you should never use more than that in a frame. Ideally you should never use more than that, ever!
If you've got 1024MB in resources, but on frame #1 you use the first 50% of it, and on frame #2 you use the second 50%, it's still going to really hurt performance -- in between those two frames, you're asking D3D to memcpy half a gig out of VRAM, and then another half a gig into VRAM. That's a lot of memcpy'ing!
Game consoles don't do this kind of stuff for you automatically (they're a lot more like D3D12 already!), so big AAA games made for consoles are going to be designed to deal with harsh memory limits themselves. e.g. on a console that has 256MB of VRAM, the game will crash as soon as you try to allocate the 257th MB of it. There are no friendly PC runtimes/drivers that are going to pretend that everything's ok and start doing fancy stuff behind the scenes for you :D
The tricky part in doing a PC version is that you've got a wide range of resource budgets. On the PS3 version, you can just say "OK, we crash after 256MB of allocations, deal with it", and do the best you can while fitting into that budget. On PC, you need to do the same, but also make it able to utilize 512MB, or 700MB or 1GB, etc... The other hard part is that on PC, it's almost impossible to know how much memory any resources actually take up, or how much VRAM is actually available to you... Most people probably just make guesses based on the knowledge they have from their console versions.

that method would have to also be used -for example- when all of that larger-than-video-memory resource is being accessed by the GPU in the same, single shader invocation? Or that shader invocation would (somehow) have to be broken up into the subsequently generated command lists? Does that mean that the DirectX pipeline is also virtualized on the CPU?

I don't know if it's possible to support that particular situation? Can you bind 10GB of resources to a single draw/dispatch command at the moment?


I don't think the application will be allowed to use that DMA-based synchronisation method (or trick?) that you explained.

D3D12 damn well better expose the DMA command queues :D nVidia are starting to expose them in GL, and a big feature of modern hardware is that they can consume many command queues at once, rather than a single one as with old hardware.


Wait. That's how tiled resources already work

Tiled resources tie in with the virtual address space stuff. Say you've got a texture that exists in an allocation from pointer 0x10000 to 0x90000 (a 512KB range) -- you can think of this allocation being made up of 8 individual 64KB pages.
Tiled resources are a fancy way of saying that the entire range of this allocation doesn't necessarily need to be 'mapped', i.e. it doesn't all have to translate to a physical allocation.
It's possible that 0x10000 - 0x20000 is actually backed by physical memory, but 0x20000 - 0x90000 aren't actually valid pointers (much like a null pointer), and they don't correspond to any physical location.
This isn't actually new stuff -- at the OS level, allocating a range of the virtual address space (allocating yourself a new pointer value) is actually a separate operation to allocating some physical memory, and then creating a link between the two. The new part that makes this extremely useful is a new bit of shader hardware -- When a shader tries to sample a texel from this texture, it now gets an additional return value indicating whether the texture-fetch actually succeeded or not (i.e. whether the resource pointer was actually valid or not). With older hardware, fetching from an invalid resource pointer would just crash (like dereferencing an invalid pointer does on the CPU), but now we get error flags.
This means you can create absolutely huge resources, but then on the granularity of 64KB pages, you can determine whether those pages are actually physically allocated or not. You can use this so that the application can appear simple, and just use huge textures, but then the engine/driver/whatever can intelligently allocate/deallocate parts of those textures as required.