When documenting a stable API, I personally prefer to make a basic diagram (nothing formal like UML) explaining the flow of data between the interfaces, a few key relationships, and the key processes. Just something that gives my users the general picture. Much friendlier.
Thanks for the explanation. While I wasn't focusing on that part, the transient buffers make more sense to me now from a synchronization standpoint. When you talk about creating a default buffer, do you mean I should try to have as much as possible of my non-dynamic streaming data stored within a single buffer, and the pool refers to the staging buffers?
Yes. Immutable if possible.
I would think that with level streaming it would be risky to impose a single fixed capacity on the live buffer(s), so pool-type management for the live buffers would be useful too. For example, when streaming in a new package, the loader knows exactly how much capacity it will require, and can grab however many buffers it needs from the pool of unused buffers. Likewise, as a level is streamed out, its buffers are no longer needed and are added back into the pool of unused buffers. Maybe I'm overcomplicating things. I definitely see the benefit of staging buffers for updating dynamic data, but I guess it's not as clear for the case of loading in a large amount of streaming data (not behind a loading screen).
The problem is that you're trying to build a car that runs over roads, can submerge into the ocean, can fly across the sky, and is also capable of travelling into outer space and even between galaxies (and God only knows what you'll find!).
You will want to keep everything together into one single buffer (or a couple of them) to reduce switching buffers at runtime while rendering.
From a streaming perspective, it depends on how you organize your data. For example, some games divide a level into "sections" and force the gameplay through corridors; while you run through these corridors, the game streams the data to the GPU (games like Tomb Raider and Castlevania: Lords of Shadow fit this use case). In this scenario, each "section" could be granted its own buffer. You already know the size required for each buffer. And when you page a buffer out, you know whether that can be permanent (i.e. can the player go back?), or you can use some heuristic (e.g. once the player is a certain distance away from that section, schedule the buffer for deletion, but don't do it if you don't need to, e.g. if you still have a lot of spare GPU RAM). You may even get away with immutable buffers in this case.
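As a rough illustration of that paging heuristic, here is a minimal sketch. Everything in it (`Section`, `EvictionPolicy`, `shouldEvict`, and the thresholds) is a hypothetical name made up for this example, not part of any real API, and a real engine would likely weigh more factors.

```cpp
#include <cstddef>

// Hypothetical per-section record; none of these names come from a real API.
struct Section {
    std::size_t bufferBytes;   // size of the section's GPU buffer, known ahead of time
    bool playerCanReturn;      // false once the level design locks the way back
    float distanceToPlayer;    // current distance from the player to the section
};

// Hypothetical tuning values.
struct EvictionPolicy {
    float maxKeepDistance;     // beyond this distance the section becomes a candidate
    std::size_t minFreeBytes;  // only evict when free GPU memory drops below this
};

// Returns true when the section's buffer should be scheduled for deletion.
bool shouldEvict(const Section& s, const EvictionPolicy& p, std::size_t freeGpuBytes) {
    if (!s.playerCanReturn)
        return true;                       // unreachable: never needed again
    if (s.distanceToPlayer <= p.maxKeepDistance)
        return false;                      // still close: keep it resident
    return freeGpuBytes < p.minFreeBytes;  // far away: evict only under memory pressure
}
```

The point of the last line is the "don't do it if you don't need to" part: a far-away section stays resident for as long as there is spare memory.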
Second, you can keep an adjustable pool of immutable/default buffers based on size and regions. Remember you're not going into the unknown depths of the ocean or into the unknowns of a distant galaxy. You know the level that is going to be streamed. You know its size in megabytes, in kilometers, its number of vertices, how it's going to be used, how many materials it needs etc. You know how each section gets connected with each section (e.g. if F can only be reached from A, put it in its own buffer, and the player is likely to not return to F very often once it has been visited).
You have a lot of data at your disposal.
Open world games are trickier, but it's the same concept: divide the region into chunks that have some logic behind them (e.g. spatial subdivision) and start from there. Open world games usually have a very low-poly model of the whole scene to use until the higher-quality data has been streamed.
My advice: algorithms are supposed to solve a problem. An engine solves problems. The answer on how to design your engine will be clearer if you approach a concrete problem instead of trying to solve a problem you know nothing about. Try to make a simple game. Even a walking cube moving across a cardboard city (open world) or pipe-land (corridor-based loading) should be enough.
Stop thinking about how to write the solution and start thinking about how to solve the problem. After that, how to write the solution will appear obvious.
That presentation is basically l33t speech for "how to fool the driver and hit no stalls until DX12 arrives".
What they do in "Transient buffers" is an effective hack that allows you to get immediate unsynchronized write access to a buffer and use D3D11 queries as a replacement for real fences.
Specifically, I'm working on implementing his "long-lived buffers" that are reused to hold streaming (static) geometry data. I've been unable to find much information on how best to implement it, however.
Create a default buffer. Whenever you need to update it, upload the data to a staging buffer (you should have a preallocated pool to avoid stalling if you create the staging buffer), then copy the subresource from staging to default. You're done.
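To make the flow concrete, here is a CPU-only sketch of it. It does not call D3D11 at all: `StagingPool`, `DefaultBuffer` and `upload` are hypothetical stand-ins, and plain `std::vector` storage plays the role of GPU memory. In real D3D11 code the first `memcpy` would roughly correspond to `Map`/`Unmap` on a `D3D11_USAGE_STAGING` buffer and the second to `CopySubresourceRegion` into the `D3D11_USAGE_DEFAULT` buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-only stand-in for a pool of preallocated staging buffers.
class StagingPool {
public:
    StagingPool(std::size_t count, std::size_t bytesEach)
        : blocks_(count, std::vector<std::uint8_t>(bytesEach)) {
        for (std::size_t i = 0; i < count; ++i) free_.push_back(i);
    }

    // Returns the index of a free block, or -1 if the pool is exhausted.
    int acquire() {
        if (free_.empty()) return -1;
        const std::size_t idx = free_.back();
        free_.pop_back();
        return static_cast<int>(idx);
    }

    void release(int idx) { free_.push_back(static_cast<std::size_t>(idx)); }

    std::vector<std::uint8_t>& block(int idx) {
        return blocks_[static_cast<std::size_t>(idx)];
    }
    std::size_t freeCount() const { return free_.size(); }

private:
    std::vector<std::vector<std::uint8_t>> blocks_;
    std::vector<std::size_t> free_;
};

// Stand-in for the long-lived default buffer.
using DefaultBuffer = std::vector<std::uint8_t>;

// Writes `size` bytes into `dst` at `dstOffset` by going through a staging block.
// Returns false if no staging block is free or the data does not fit.
bool upload(StagingPool& pool, DefaultBuffer& dst, std::size_t dstOffset,
            const void* src, std::size_t size) {
    if (dstOffset > dst.size() || size > dst.size() - dstOffset) return false;
    const int idx = pool.acquire();
    if (idx < 0) return false;
    std::vector<std::uint8_t>& staging = pool.block(idx);
    if (size > staging.size()) { pool.release(idx); return false; }
    std::memcpy(staging.data(), src, size);                     // CPU write into staging
    std::memcpy(dst.data() + dstOffset, staging.data(), size);  // staging -> default copy
    pool.release(idx);  // a real renderer would wait until the GPU copy has finished
    return true;
}
```

Because the pool is allocated up front, an update never has to create a buffer in the middle of a frame, which is the stall the preallocation is meant to avoid.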
You won't find much because there's not much more to it. Long-lived buffers assume you will rarely modify them, and as such they shouldn't be a performance bottleneck or a concern.
Usually you also have a lot of knowledge about the size you will need for the buffer. Even if you need to calculate it, you do this so infrequently that you should be able to afford the calculation, or at least cache the result.
The problem comes with buffers that you need to update very often (e.g. every frame).
In real life neutral countries exist because either:
1. The conflict hasn't expanded enough yet to affect them.
2. They're strong enough to repel any invasion if they get involved (they could even seriously unbalance the war if they took a side).
3. Most of the involved parties don't want anything from that country (e.g. why would Israel or Palestine want to take umm... Mexico?) or are emotionally attached to it (emotion != logic).
4. It's more beneficial to have them as an independent country than to have them take your orders. Maybe because their know-how is too advanced to be used properly if you invade them, or because their citizens could start small acts of terror during the occupation, or guerrilla-style fighting.
For a game, points 2 and 4 are the most interesting. Point 4 can actually be very fun and make the player go through a living hell.
Point 2 is easy. If you attack, you will be obliterated.
Point 4 is fun. You can attack, and you may win, but you pay the consequences until you release that land: random sabotage, slower resource gathering or unit building, critical unit-making buildings randomly exploding, inability to develop certain technologies. Allow the player to develop the technologies or gather the goods the country offered when it was neutral, but at a higher price (or at a slower rate), etc.
Point 3 is possible if the game has a story. Get the player to love a civilization enough that most players will feel bad about invading it and prefer working alongside it. But this is really hard to execute well.
Since OpenGL 3.x, Khronos has used a MAJOR.MINOR version numbering system. A change in the major number means the hardware needs to be significantly upgraded (e.g. going from a GeForce GTX 280 to a GeForce GTX 480, or from a Radeon HD 4850 to a Radeon HD 7850, which means going from DX10-class hardware to DX11-class, or from GL3 to GL4). A change in the minor number means that 99% of the time a driver upgrade is all you will need.
If your hardware supports OpenGL 4.0, then it's almost certain that just updating the drivers will be enough to get 4.3, or even 4.5 for that matter (though there's always the risk that the vendor never releases a driver that supports 4.3 and goes straight to 5.x whenever it comes out).
As for the Intel HD 4000: Intel is usually behind when it comes to OpenGL drivers. Their current version is at 4.0; however, they expose the most important 4.3 functionality through extensions (GL_ARB_multi_draw_indirect, GL_ARB_sync, GL_ARB_shading_language_420pack, GL_ARB_conservative_depth). They're missing compute shaders (GL_ARB_compute_shader) and Shader Storage Buffer Objects (GL_ARB_shader_storage_buffer_object); the latter is the only one where I doubt whether the HW can truly support it. However, that's not a reason not to buy the book.
My recommendation is to go buy the book. The differences will be slim (if SSBOs are even in the book) because most of what applies to 4.3 is provided by Intel's 4.0 drivers (+ extensions).
Will the code examples from the book (OpenGL superbible) work on my machine?
Most of them, yes. You may have to edit the initialization routine so that it asks for a 4.0 context instead of a 4.3 one (which will obviously fail as soon as you launch the program and initialize OGL). Samples that use features that are not provided through extensions (like SSBOs and compute shaders) will obviously fail, but the rest of the samples will work.
What I think is confusing you is that we typically refer to the diffuse N * L formula (also known as N dot L, dot( N, L )) where N is the surface normal and L is the light's direction; when it is actually N * -L (notice the negative sign).
It's not that the direction becomes the position or something like that. Strictly speaking the formula is N * -L, but we often refer to it as just N * L because we tend to look at it from the perspective of "the ray that goes from the surface towards the light"; in other words, the opposite of the direction the light actually travels in.
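A small sketch may make the sign convention easier to see. It assumes L is stored as the direction the light travels in (from the light towards the surface) and that both vectors are unit length; `Vec3`, `negated` and `diffuse` are names invented for this example.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 negated(const Vec3& v) { return Vec3{-v.x, -v.y, -v.z}; }

// n: unit surface normal.
// lightTravelDir: unit direction the light travels in (from the light towards the surface).
// Returns the diffuse factor, clamped to zero for surfaces facing away from the light.
float diffuse(const Vec3& n, const Vec3& lightTravelDir) {
    return std::max(0.0f, dot(n, negated(lightTravelDir)));
}
```

With a floor whose normal is (0, 1, 0) and a light shining straight down, (0, -1, 0), negating L gives (0, 1, 0) and the factor is 1. Without the negation the dot product would be -1 and the floor would come out unlit.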
This is a very common source of confusion among people just starting with lighting equations.
Most tutorials don't go farther than how to emit a basic billboard particle.
Because that's all there is to it. Just smoke and mirrors.
The key is a good system that can emit lots of controlled billboards. By "controlled" I mean how many particles are emitted per second, of which type (e.g. size, material), their rate of growth per second, colour randomization, where they get emitted, whether they follow a predetermined path or are attracted by some force (like gravity), etc.
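For a feel of what such a system looks like, here is a very small sketch covering only a few of those controls: emission rate, start size, growth rate, lifetime, and gravity as the single force. All names (`Particle`, `EmitterSettings`, `Emitter`) and the fixed start velocity are assumptions made for the example; rendering, materials and colour are left out entirely.

```cpp
#include <algorithm>
#include <vector>

struct Particle {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    float size;        // billboard size
    float age;         // seconds since emission
};

// Hypothetical emitter settings mirroring some of the controls listed above.
struct EmitterSettings {
    float particlesPerSecond;
    float startSize;
    float growthPerSecond;  // size gained per second
    float gravityY;         // a single "affector": constant acceleration on Y
    float lifetime;         // seconds before a particle is removed
};

class Emitter {
public:
    explicit Emitter(const EmitterSettings& s) : settings_(s) {}

    void update(float dt) {
        // Affect and age the particles that already exist.
        for (Particle& p : particles_) {
            p.vy += settings_.gravityY * dt;
            p.x += p.vx * dt;
            p.y += p.vy * dt;
            p.z += p.vz * dt;
            p.size += settings_.growthPerSecond * dt;
            p.age += dt;
        }
        // Remove expired particles.
        particles_.erase(
            std::remove_if(particles_.begin(), particles_.end(),
                           [this](const Particle& p) { return p.age >= settings_.lifetime; }),
            particles_.end());
        // Emit new ones; the accumulator carries fractional particles across frames.
        pending_ += settings_.particlesPerSecond * dt;
        while (pending_ >= 1.0f) {
            // Assumed start state: at the origin, moving up at 1 unit per second.
            particles_.push_back(
                Particle{0.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, settings_.startSize, 0.0f});
            pending_ -= 1.0f;
        }
    }

    const std::vector<Particle>& particles() const { return particles_; }

private:
    EmitterSettings settings_;
    std::vector<Particle> particles_;
    float pending_ = 0.0f;
};
```

The accumulator is what keeps the emission rate independent of the frame rate: at 8 particles per second and a 0.25 s step, each update emits exactly two.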
The rest is just really good artists knowing how to take advantage of it.
Google "particle system emitter affector" for ideas on how to implement your own (e.g. a quick search returns these interesting links).
There are a few exceptions though, e.g. for thunder/lightning effects you're better off writing code that creates a chain/path of connected billboards (each billboard slightly reoriented) that randomly splits into 2 or 3 paths at certain points. Then repeat until the desired length is reached (like this, but in 3D).
When we say "advanced particle effects" we actually mean voxels, fluids, and other very compute-intensive stuff, which isn't what you're asking for.
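One possible way to generate such a path is sketched below. It only produces the segments; stretching and orienting a billboard along each segment is left out. The names (`Point`, `Segment`, `BoltParams`, `growBolt`, `makeBolt`), the choice of growing along -Y, and the rule that a branch gets half of the remaining steps are all assumptions made for the example, and each joint spawns at most one extra branch here.

```cpp
#include <random>
#include <vector>

struct Point { float x, y, z; };
struct Segment { Point a, b; };  // one billboard would be stretched along each segment

// Hypothetical parameters for the sketch.
struct BoltParams {
    int steps;          // segments in the main path
    float stepLength;   // distance travelled along -Y per segment
    float jitter;       // maximum sideways offset per segment
    float splitChance;  // probability of starting a branch at each joint
};

// Grows one path of `steps` segments from `start`, appending to `out`.
void growBolt(Point start, int steps, const BoltParams& p, std::mt19937& rng,
              std::vector<Segment>& out) {
    std::uniform_real_distribution<float> offset(-p.jitter, p.jitter);
    std::uniform_real_distribution<float> chance(0.0f, 1.0f);
    Point cur = start;
    for (int i = 0; i < steps; ++i) {
        // Each segment continues downwards with a small random sideways offset.
        const Point next{cur.x + offset(rng), cur.y - p.stepLength, cur.z + offset(rng)};
        out.push_back(Segment{cur, next});
        // Occasionally start a shorter branch from the new joint.
        const int remaining = steps - i - 1;
        if (remaining > 1 && chance(rng) < p.splitChance)
            growBolt(next, remaining / 2, p, rng, out);
        cur = next;
    }
}

std::vector<Segment> makeBolt(const BoltParams& p, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<Segment> segments;
    growBolt(Point{0.0f, 0.0f, 0.0f}, p.steps, p, rng, segments);
    return segments;
}
```

Regenerating the bolt with a new seed every few frames is a cheap way to make it flicker.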
From what I can see, it is failing because what DX9 did was decompress, generate mips, and compress again, which is a lossy operation (theoretically it can be made lossless for mip 0 by replacing the binary data of mip 0 in the recompressed stream with the data from the original BC1; however, mips generated from BC1 sources will still be of lower quality than mips generated from the original sources).
Most likely D3D11 pushes you towards greater quality by having you generate the mips from the source material first and then compress. If the DDS is already compressed and you want to pay the price, decompress it first.
I loved L. Spiro's whole post, but I have something to correct:
Sorting 2 smaller queues is faster than 1 big one.
This is a half truth.
Sorting can take:
Best Case: O(N)
Avg. Case: O( N log( N ) )
Worst Case: O( N^2 )
1. In the best case, N/2 + N/2 = N; so in theory it doesn't matter whether it's split or not. But there is the advantage that the two containers can be sorted in separate threads. So it's a win.
2. In the average case, 2 * (N/2 log(N/2)) > N log(N); having one large container should be faster than sorting two smaller ones (though there remains to be seen whether threading can negate the effect up to certain N)
3. In the worst case, 2 * (N/2)^2 < N^2; which means it's much better to sort two smaller containers than a large one.
In the end you'll have to profile as it is not a golden rule.
Spiro's suggestion of using temporal coherence assumes that most of the time you can get O(N) sorting using insertion sort; thus most likely having two smaller containers should be better (if you perform threading).
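To see why temporal coherence helps, here is a plain insertion sort that also counts how many element shifts it performs. It is only an illustration (the `insertionSort` name and the use of `float` keys are choices made for the example); on input that is already sorted it does no shifts at all, and on nearly sorted input it does only a handful.

```cpp
#include <cstddef>
#include <vector>

// Sorts `v` in ascending order and returns how many element shifts were needed.
std::size_t insertionSort(std::vector<float>& v) {
    std::size_t shifts = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        const float key = v[i];
        std::size_t j = i;
        // Move larger elements one slot to the right until key fits.
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
            ++shifts;
        }
        v[j] = key;
    }
    return shifts;
}
```

A queue sorted last frame with one pair swapped this frame costs a single shift, while a fully reversed queue of 5 elements costs 10, which is the O(N) versus O(N^2) gap in miniature.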
Update: Stupid algebra mistake. See lunkhound's post. Avg case is better when dividing and conquering.
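For reference, the corrected algebra behind the update, written out (constant factors hidden by the O notation are ignored, as in the list above):

```latex
% Average case: splitting is cheaper, since for N > 0
2 \cdot \frac{N}{2} \log\frac{N}{2} = N(\log N - \log 2) = N \log N - N \log 2 < N \log N

% Worst case: splitting halves the cost
2 \cdot \left(\frac{N}{2}\right)^{2} = \frac{N^{2}}{2} < N^{2}
```

So in both the average and the worst case the two half-sized containers come out ahead on paper, and only the best case is a tie.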
The C++ standard does not mandate that the IEEE standard should be used for floating point.
Case in point, the PS2 did not follow it and anything divided by 0 returned 0, even 0 / 0 (which was a nice property for normalizing vectors and quaternions without having to check for null length).
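On IEEE hardware you don't get that property for free (a non-zero value divided by 0 gives infinity and 0 / 0 gives NaN), so the usual approach is to check the length explicitly. A minimal sketch, where `Float3`, `safeNormalize` and the epsilon value are assumptions made for the example:

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// Returns v scaled to unit length, or the zero vector when v has (near) zero length.
Float3 safeNormalize(const Float3& v, float epsilon = 1e-12f) {
    const float lenSq = v.x * v.x + v.y * v.y + v.z * v.z;
    if (lenSq <= epsilon)
        return Float3{0.0f, 0.0f, 0.0f};  // reproduce the "x / 0 == 0" result explicitly
    const float invLen = 1.0f / std::sqrt(lenSq);
    return Float3{v.x * invLen, v.y * invLen, v.z * invLen};
}
```

The check also keeps the division out of undefined-behaviour territory, since the divisor can never be zero when the division is reached.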
Perhaps it's unfortunate that the C++ std says "undefined behavior", instead of "implementation defined". But that's how it is.
If it were implementation defined, I could rest assured that MSVC, GCC & Clang would compile my code fine on an x86 machine, because x86 follows the IEEE standard. But unfortunately, it's UB, not ID.
In the real world though, I would be very mad if the compiler optimized my UB code away without any warning, because the number of UB scenarios in the C++ standard is gigantic, and mistakes like this happen all the time.
It's the everlasting struggle between compiler writers who want to take advantage of UB for optimization and are very picky/elitist about following the Std, and the average programmer who wants to develop a program that isn't broken by technicalities like this.