In the specific case of flat shading, there are other solutions, such as setting the interpolation mode of the normal variable to flat/no-interpolate, or omitting per-vertex normal data entirely and calculating the surface normal in the pixel shader using the screen-space derivatives of the position.
In an offline-quality renderer, you'd sample every pixel in the environment map, treating each one as a little directional light (using your full BRDF function), which is basically integrating the irradiance.
This is extremely slow, but correct... Even in an offline renderer, you'd optimise this by using importance sampling to skip most pixels in the environment map.
For a realtime renderer, you can 'prefilter' your environment maps, where you perform the above calculations ahead of time. Unfortunately the inputs to the above are, at a minimum, the surface normal, the view direction, the surface roughness and the spec-mask/colour... That's four input variables (some of which are multidimensional), which makes for an impractically huge lookup table.
So when prefiltering, you typically make the approximation that the view direction is the same as the surface normal and the spec-colour is white, leaving you with just the surface normal and roughness.
In your new cube-map, the pixel location corresponds to the surface normal and the mip-level corresponds to the roughness. For every pixel in every mip of this new cube-map, sample all (or many) of the pixels in the original cube-map, weighted by your BRDF (using the normal/roughness corresponding to that output pixel's position).
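To make that concrete, here's a heavily simplified sketch in Python -- a toy list of direction/radiance samples stands in for the cube-map texels, and the roughness-to-Phong-power mapping is a made-up assumption (real prefilters importance-sample a proper GGX lobe per mip level):

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prefilter(env_samples, normal, roughness):
    # env_samples: list of (direction, radiance) pairs standing in for
    # the texels of the source cube-map.
    # Map roughness to a Phong-style specular power (an assumption made
    # for illustration -- real implementations use a GGX lobe).
    spec_power = max(1.0, (1.0 - roughness) * 256.0)
    total, weight_sum = 0.0, 0.0
    for direction, radiance in env_samples:
        w = max(0.0, dot(normal, direction)) ** spec_power
        total += radiance * w
        weight_sum += w
    return total / weight_sum if weight_sum > 0.0 else 0.0

# Sanity check: a constant white environment should prefilter to that
# same constant, regardless of normal or roughness.
env = [(normalize((x, y, z)), 1.0)
       for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
print(prefilter(env, normalize((0, 0, 1)), 0.5))  # -> 1.0
```

In a real implementation the outer loops run over every output texel of every mip face, with the normal derived from the texel's direction and the roughness from the mip level.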
Well I was thinking about sampling input at some multiple of the screen refresh rate and some other timing sensitive thoughts I was mulling over. But I want the phase of the samples to be in sync with refresh.
Yeah there's no reliable way to do that, that I know of -- aside from writing a busy loop that eats up 100% CPU usage until the next vblank
On older systems, the video-out hardware will send the CPU an interrupt on vblanking, and the CPU will then quickly flip the buffer pointers around.
However, modern systems will not involve the CPU at all -- the CPU will queue up multiple frames' worth of drawing commands to the GPU, and the GPU will be responsible for waiting on vblanks, flipping pointers, and syncing on back/front buffer availability...
By isolating these details inside the device drivers, the OS is able to evolve in this way... instead of being stuck with the old interrupt model, wasting CPU time, for the sake of compatibility with old software
You probably just want to create a background thread, and put it to sleep in small intervals close to the refresh interval.
e.g. in doom 3, their input gathering thread is hard-coded to 16.66ms / 60Hz.
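For illustration, a minimal version of that pattern in Python (the `gather_input` loop and its sample list are hypothetical stand-ins for polling real input APIs -- and note the OS gives no guarantee that these wake-ups stay phase-locked to vblank, which is the problem discussed above):

```python
import threading
import time

samples = []

def gather_input(stop_event, interval=1.0 / 60.0):
    # Poll input at roughly the display refresh rate. The OS scheduler
    # makes no promise about phase relative to vblank -- this just gets
    # "close enough" timing without burning 100% CPU in a busy loop.
    while not stop_event.is_set():
        samples.append(time.monotonic())  # stand-in for reading devices
        time.sleep(interval)

stop = threading.Event()
t = threading.Thread(target=gather_input, args=(stop,), daemon=True)
t.start()
time.sleep(0.1)   # the main thread does a few "frames" of work
stop.set()
t.join()
print(len(samples))  # roughly 6 samples in 0.1s at ~60Hz
```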
BTW, there's no such thing as d3d units, except in one place -- Normalized Device Coordinates (aka NDC) -- the final coordinate system used by the GPU rasterizer, which is:
z = 1.0 -- The far plane.
z = 0.0 -- The near plane
x/y = -1.0 -- The left/top edge of the screen
x/y = 1.0 -- The right/bottom edge of the screen
That's the only coordinate system that's native to D3D (or the GPU). Everything else is defined by your code. Your transformation and projection matrices do the job of converting from your own coordinate systems into the above NDC coordinate system for plotting onto the screen (and the z-buffer). (side note: if you're using fixed-function D3D9, then you're just using built-in shaders that perform these standard matrix multiplications, to convert from your own coordinates to NDC coordinates)
This also has implications for precision: no matter what you do, your z values always end up in the 0 to 1 range, and x/y values in the -1 to +1 range... meaning there's a large number of possible 32-bit float values that are never used, so you're effectively working with a lot less than 32-bit precision.
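For the z part specifically, a standard D3D-style perspective projection shows the mapping into that 0-to-1 NDC range. This is just a minimal sketch with the two z-related matrix terms pulled out (clip-space w is the view-space z):

```python
def project_z(z_view, near, far):
    # The two z-related terms of the usual D3D perspective projection
    # matrix, followed by the perspective divide by w.
    z_clip = z_view * far / (far - near) - far * near / (far - near)
    w_clip = z_view
    return z_clip / w_clip

near, far = 0.1, 1000.0
print(project_z(near, near, far))  # -> 0.0 (the near plane)
print(project_z(far, near, far))   # -> ~1.0 (the far plane)
print(project_z(10.0, near, far))  # already ~0.99 at only 10 units out
```

That last line also hints at the precision problem: with a standard projection, most of the view distance gets squashed into a tiny slice of NDC z near 1.0.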
"Miniscule" is a relative term. However, if you consider $19/mo miniscule, I'll give it a try with your credit card.
$19/mo gives you a subscription for updates. A cancelled $19 subscription still lets you continue using the engine... So really, it's $19 per seat, per update you opt in to. Compared to what engines of this quality/capability used to cost, that is minuscule.
The real cost is in the 5% part. If you're making a low-budget console game, where you expect to make $10M in sales, that 5% works out to be half a million dollars... which is about the upper limit on what these kinds of engines used to cost. If you're making a big budget game, you'd just go directly to Epic and say "Hey, how about we scrap the 5% deal and just give you half a million dollars up-front".
Actually I'd also be interested to know what sorts of things people are needing from their engines that would make UE4 (Or even CryEngine) too restrictive, particularly from the hypothetical perspective of starting with a fresh codebase today.
Every console game I've worked on has involved custom graphics programming. There is no one true graphics pipeline which is optimal for every game. Different games want to put different attributes into their G-buffers, depending on the range of materials that the artists need. Different games will work better with forward vs deferred. Different games will have completely different requirements on shadows, post-processing, etc, etc... A good, flexible engine allows for the easy modification of its rendering pipeline to suit the trade-offs required for a particular game.
When I see engines claiming things like "Supports parallax mapping!", I read that as "We've hard-coded most of the rendering features, so it's going to be complex for you to customize!". IMHO, CryEngine fits into this category, which is why it's not a good choice if you want to do any graphics programming at all. The $10/mo subscription version of Cry doesn't even let you write shaders for new materials - you just get what you're given! In the full source version of Cry (which still follows the 'traditional' engine licensing model -- i.e. is expensive), then sure, you could modify the rendering code... if you dare to wade into that mess...
I haven't played with UE4 myself yet, but I get the feeling that the rendering code is a lot cleaner / more maintainable.
Back to the original question - having worked as a game-team graphics programmer on top of half a dozen engines in the past, I've based my own rendering engine on the parts of each of them that made my job easier. i.e. my own engine is very flexible when it comes to implementing rendering pipelines.
If I was starting on my game now, I'd be very tempted to use UE4... but at this point, there's the sunk cost of already having developed my own tech, so there's not much incentive to switch now.
Your game can use 64-bit coordinates internally and still render with 32/16-bit floats. On the game side, your camera is at x = one billion meters, your ship is at x = one billion and one meters. That's a world-to-camera transform and a model-to-world transform. You subtract those two and you get a model-to-view transform of +1 meters, which is easily represented as a float. On the GPU-side, you just make sure to always work with either model or view coordinate systems, rather than absolute world positions.
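A quick demonstration of why this works -- Python's `struct` is used here to emulate the truncation of a 64-bit double down to a 32-bit float:

```python
import struct

def to_f32(x):
    # Round a Python double down to 32-bit float precision.
    return struct.unpack('<f', struct.pack('<f', x))[0]

camera_x = 1_000_000_000.0   # one billion meters (64-bit on the game side)
ship_x   = 1_000_000_001.0   # one billion and one

# Stored directly as 32-bit floats, the two positions collide -- the
# float spacing out at 1e9 is 64 meters, so the 1m offset is lost:
print(to_f32(ship_x) == to_f32(camera_x))   # True -- the ship is "at" the camera!

# Subtract in 64-bit first, then truncate the *relative* position:
print(to_f32(ship_x - camera_x))            # 1.0 -- exact
```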
so is 100,000 still too big? What's the max level size in popular game engines? 50K?
Depends on your units. For example, say 1 unit = 1 meter, and you require your coordinates to have millimeter accuracy (e.g. anything smaller than 0.001 doesn't matter, but larger than that does matter). You can use a site like this to experiment with how floats work: http://www.h-schmidt.net/FloatConverter/IEEE754.html Let's try 5km and 50km.
5000.001 -> 0x459c4002
5000.0015 -> 0x459c4003
5000.002 -> 0x459c4004 -- notice here, incrementing the integer representation increases the float value by 0.5mm
50000.000 -> 0x47435000
50000.001 -> 0x47435000 -- can't resolve millimeter details any longer; our numbers are being rounded!
50000.004 -> 0x47435001
50000.008 -> 0x47435002 -- notice here, incrementing the integer representation now increases the float value by 4mm
So given my example requirements, 5km is ok for me, but 50km is not.
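You can reproduce that table programmatically as well; here's a small Python helper that shows the IEEE-754 bit patterns (`struct` round-trips a double through a 32-bit float):

```python
import struct

def f32_bits(x):
    # Bit pattern of x rounded to the nearest 32-bit float.
    return struct.unpack('<I', struct.pack('<f', x))[0]

# At ~5km the float spacing (ulp) is 2^-11 m, about 0.5mm:
print(hex(f32_bits(5000.001)))   # 0x459c4002
print(hex(f32_bits(5000.0015)))  # 0x459c4003

# At ~50km the ulp grows to 2^-8 m, about 4mm -- millimeter
# offsets vanish entirely:
print(hex(f32_bits(50000.000)))  # 0x47435000
print(hex(f32_bits(50000.001)))  # 0x47435000 -- rounded away
print(hex(f32_bits(50000.004)))  # 0x47435001
```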
You might also want to consider 64-bit fixed point coordinates for a space sim (and conversion to relative float positions for rendering as above).
In 1D, a linearly interpolated point value combines the 2 closest samples -- the ones immediately to the left/right of the point.
In 2D, it's the same, but doubling the procedure over the up/down axis, combining the (2x2=)4 closest pixels (up-left, up-right, down-left, down-right -- NOT up, down, left, right).
The 3D case is the same, but done on the layer of voxels under the point and the layer of voxels above the point, resulting in (4x2=)8 voxels being combined (not 6).
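A sketch of the 3D case, showing the seven pairwise lerps that combine the 8 surrounding voxels (4 along x, then 2 along y, then 1 along z):

```python
def lerp(a, b, t):
    return a + (b - a) * t

def trilerp(c, tx, ty, tz):
    # c[z][y][x]: the 2x2x2 block of voxels surrounding the point;
    # tx/ty/tz: the point's fractional position within that block.
    x00 = lerp(c[0][0][0], c[0][0][1], tx)
    x01 = lerp(c[0][1][0], c[0][1][1], tx)
    x10 = lerp(c[1][0][0], c[1][0][1], tx)
    x11 = lerp(c[1][1][0], c[1][1][1], tx)
    y0 = lerp(x00, x01, ty)   # collapse the lower layer
    y1 = lerp(x10, x11, ty)   # collapse the upper layer
    return lerp(y0, y1, tz)   # blend the two layers

# Corner values f(x,y,z) = x + y + z; trilinear interpolation of a
# linear function is exact, so the centre should give 1.5:
cube = [[[x + y + z for x in (0, 1)] for y in (0, 1)] for z in (0, 1)]
print(trilerp(cube, 0.5, 0.5, 0.5))  # -> 1.5
```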
I'd say that learning UML is still a good use of time.
Everyone will use back-of-a-napkin diagrams for stuff, but their informal diagrams will often incorporate UML-ish ideas. We all learn UML in beginner OO courses nowadays, which gives us all a shared experience that we can leverage for communication.
And yes, when writing code by myself, I'll often scribble simple relationship diagrams for the different components in the system so that the ideas can solidify in my mind before/as I write the code... But these will be informal diagrams, not strict UML.
That presentation is basically l33t speak for "how to fool the driver and hit no stalls until DX12 arrives".
What they do in "Transient buffers" is an effective hack that allows you to get immediate unsynchronized write access to a buffer and use D3D11 queries as a replacement for real fences.
That's a pretty dismissive way to sum it up.
I don't see why transient buffers should be implemented as a heap like in that presentation's "CTransientBuffer" -- it's much simpler to implement it as a ring buffer (what they call a "Discard-Free Temp Buffer").
Write-no-overwrite based ring buffers have been standard practice since D3D9 for storing transient / per-frame geometry. You map with the no-overwrite flag, the driver gives you a pointer to the actual GPU memory (uncached, write-combined pages) and lets the CPU stream geometry directly into the buffer with the contract that you won't touch any data that the GPU is yet to consume.
Even on the console engines I've worked on (where real fences are available), we typically don't fence per resource, as that creates a lot of resource tracking work per frame (which is the PC-style overhead we're trying to avoid). Instead we just fence once per frame so we know which frame the GPU is currently consuming.
Your ring buffers then just have to keep track of the GPU-read cursor for each frame, and make sure not to write any data past the read-cursor for the frame that you know the GPU is up to. We do it the same way on PC (D3D9/11/GL/Mantle/etc).
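Here's a rough CPU-side sketch of that bookkeeping -- the `RingBuffer` class and its API are made up for illustration; a real version would hand out pointers from `Map` with NO_OVERWRITE and use an actual per-frame fence/query instead of the `on_gpu_finished` callback:

```python
class RingBuffer:
    # Per-frame ring allocator: the CPU appends data at the write cursor
    # and space is only reclaimed once we know the GPU has finished the
    # frame that consumed it (one fence per frame, not per resource).
    def __init__(self, size):
        self.size = size
        self.write = 0        # CPU write cursor (byte offset)
        self.used = 0         # bytes the GPU may still be reading
        self.frame_bytes = 0  # bytes written during the current frame
        self.pending = []     # (frame_id, bytes) for in-flight frames

    def alloc(self, nbytes):
        # Wrapping wastes the tail padding, so count it as used too.
        pad = 0
        if self.write + nbytes > self.size:
            pad = self.size - self.write
            self.write = 0
        assert self.used + pad + nbytes <= self.size, "would overwrite GPU data!"
        offset = self.write
        self.write += nbytes
        self.used += pad + nbytes
        self.frame_bytes += pad + nbytes
        return offset

    def end_frame(self, frame_id):
        self.pending.append((frame_id, self.frame_bytes))
        self.frame_bytes = 0

    def on_gpu_finished(self, frame_id):
        # The single per-frame fence fired: everything belonging to
        # frames up to and including frame_id is now safe to reuse.
        while self.pending and self.pending[0][0] <= frame_id:
            self.used -= self.pending.pop(0)[1]

rb = RingBuffer(1024)
offset = rb.alloc(256)         # frame 0's transient geometry
rb.end_frame(frame_id=0)
rb.on_gpu_finished(frame_id=0) # the frame-0 fence fired
print(rb.used)                 # -> 0, the space is reusable again
```

The assert in `alloc` is where a real implementation would either stall waiting on the fence, or grow/fall back -- i.e. it's the "contract that you won't touch any data that the GPU is yet to consume".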
The other map modes are the performance-dangerous ones. Map-discard is OK sometimes, but can be costly in terms of internal driver resource-tracking overheads (especially if using deferred contexts), while read/write/read-write map modes should only ever be used on staging resources which you're buffering yourself manually to avoid stalls.
Create "Forever" buffers as needed, at the right size. You pretty much have to do this anyway, because they're immutable so you can't reuse parts of them / use them as a heap.
The recommendations for "long lived" buffers basically just reduce the driver's workload in terms of memory management (implementing malloc/free for VRAM is much more complex than a traditional malloc/free, because you shouldn't append allocation headers to the front of allocations like you do in most allocation schemes). In my engine I currently ignore this advice and treat them the same as forever buffers. The exception is when you need to be able to modify a mesh -- e.g. a character whose face is built from a hundred morph targets but then rarely changes afterwards -- in that situation, you need DEFAULT / UpdateSubResource.