I really don't understand why you say make a UBO for the XY positions of the particles. Why not just another VBO? Is it because of a GPU memory placement thing? Or because I would need a VAO for it, whereas I can get away without one if I use a UBO?
Because if you use a VBO, you would need 4 vertices per particle, but the UBO can just hold one entry (the position) per particle.
Your understanding of VBOs and UBOs is correct. However, this is a hack that works really well. And it works well because GPUs are getting more and more similar to CPUs: VBOs and UBOs are both just memory being fetched, and the API puts arbitrary restrictions on them that are no longer necessary.
By binding a null VBO and drawing 4 vertices (1 particle), you're basically telling the API "draw 4 vertices of nothing" (i.e. iterate the vertex shader 4 times with no vertex data). But the vertex shader can fetch data from a different location (the UBO), using gl_VertexID to determine both the index into the UBO and the quad corner to generate.
glMapBuffer maps to the client's address space the entire data store of the buffer object currently bound to target
Interesting wordage there... glMapBuffer() "maps" to the client's address space? At first, I thought glMapBuffer() actually returned a pointer to the GPU's memory, but now it sounds like glMapBuffer()'s doing behind-the-scenes client-side copying, depending on the driver. Is my suspicion correct?
What the paragraph "maps to the client's address space" says is that the memory can be accessed by the client in its own address space.
This could be either because through virtual memory the address actually translates directly to GPU memory (yay!), or as you say, the driver allocated some memory CPU side and will copy it later to the GPU (ouch).
The wording is carefully crafted (cryptic!) like that because OpenGL has a "server-client" architecture. Back in the 80's, a system would issue commands via OpenGL, and the commands would be carried over a network to a rendering workstation/farm.
Obviously, you can't map server memory directly from the client when the data travels through an ethernet cable; so the driver would allocate its own memory and send the commands later.
But when dealing with modern systems, the wording also allows virtual memory to directly map to GPU memory, which is what you want.
What actually happens heavily relies on the GL implementation; it's like Promit says.
Your best bet is glMapBufferRange instead of glMapBuffer. GL_MAP_UNSYNCHRONIZED_BIT increases the chances of getting a pointer directly mapped to GPU memory, but you then have to synchronize all access yourself (see ARB_sync and apitest).
Using GL_MAP_PERSISTENT_BIT increases the chances a lot more, since that's the whole point of persistent mapping.
In the 90's the first bottleneck was rasterizing a triangle. Once GPUs became better at it, the next most expensive operation was transform and lighting, which at that time was done on the CPU and sent to the GPU every frame.
That's why HW TnL (Hardware Transform and Lighting) was invented: it kept the vertices always in the GPU, and the math was done entirely on the GPU. Later this would evolve into what we now know as vertex shaders.
I have a hunch that book could be really, really old.
If you read or re-read the slides I posted, that is the least recommended approach, as their benchmarks showed it's the slowest version. You should benchmark on your own to verify their results.
Their recommendation was to set:
No vertex buffer.
1 static buffer for your indices [6 * MaxNumOfParticlesPerDraw uint16; filled with zeroes]
1 dynamic UBO or TBO for your particle position [2 floats(x, y) * max number of particles].
Use gl_VertexId in the vertex shader to construct the quad.
Use glDrawElements. If you use glDrawArrays, you can avoid the index buffer (but that usually comes with the overhead of having to switch between arrays and elements inside the GPU or the driver, since most geometry is indexed).
An easy old method for ambient lighting is: DiffuseColor + (AmbientFactor * DiffuseColor * AmbientColor).
Another method is to have an upper color and a bottom color.
The best method is global illumination, but it still costs a lot.
Is it better to stick to the old way of using an ambient factor?
True. To be more accurate on the question, I mean: what is the best option nowadays for ambient lighting that is both physically correct and real-time?
I don't get it. The first two options you mention are real-time, but they are VERY far from being what you call "physically correct"*, unless we assume light never bounces off other surfaces; which almost never happens in real life.
First, there are more fake methods. Like...
Adding additional lights that are positioned relative to the camera (Hollywood movies do this a lot. Keyword here is "three-point lighting setup").
Adding additional lights that are placed in a controlled environment (usually a cinematic or some small indoor level).
There's also a technique which dynamically places point lights where the light hits the geometry, thus simulating the bounces (works only for directional, spot and area lights). The "Leo" demo from AMD showcases this: draw the scene from the light's perspective, and use a UAV to store the fake point lights' positions.
Use IBL (Image Based Lighting) to loosely approximate GI. It can give very convincing results.
The best method is global illumination, but it still costs a lot.
That is veeeery broad.
GI can be:
Raytraced (or path traced). Often not suitable for real time, although the PowerVR guys say otherwise. Intel also has a demo from around 2008 that didn't take off, but I can't find it now. It was all running on the CPU.
Baked. Depends on your definition of "real time". It is baked, but you can move the camera in real time. Depending on how much data you bake, you may even be able to move the geometry, but not the lights.
Light Propagation Volumes. Crysis 3 uses them.
Voxel Cone Tracing.
Screen space. Gives terrible results to be used as a generic solution, but it is good enough for distant geometry (Crysis uses SSGI for distant geometry and LPV for close geometry).
* We actually reserve the term "physically correct" for something else. Being physically correct means the math obeys certain properties from the real world, like the Fresnel effect, and that the amount of light coming out of a surface can't be higher than the amount of incoming light (unless the material generates light from another source, like fluorescence or a lit cigarette).
GI is just about the number of bounces that are taken into account by the math.
You'll hit a roadblock because there's no universal solution that will always work.
When the computer is directly connected to the internet (e.g. dial-up modems, broadband USB modems, Ethernet in bridged mode), the winsock option is what you need to get the IP address, since the OS knows the external IP address.
When the computer is behind a NAT (e.g. Ethernet or wifi in most router configurations), the OS only knows the local IP address, because THAT's the computer's IP address.
What you would like to know is the router's IP. You could ask the router for its address (e.g. via UPnP), but the router could refuse to tell it, or lie. Furthermore, the router could itself be behind another router.
You would have to ask the second router for its IP address; that is, assuming you somehow learnt that there are 2 routers in the configuration (maybe because router 1 told you, or by analyzing packet streams, or through UPnP, assuming it actually works).
And router 2 could be behind another NAT, router 3...
You could also try UDP hole punching, which exploits a loophole in NAT implementations that lets you directly communicate with external clients. But it doesn't work with all routers (it's a hack!).
That's why an external website is often the best bet. The external website will only see the IP address of the last router in the chain and report that address to the client. Even that can fail because, if the user is behind a proxy, the external website will see the proxy's IP.
Over a few frames things settle down and the driver is no longer allocating new blocks of memory but is instead just handing back blocks of memory that had previously been used. So in other words it's not necessary to do your own multi-buffering, because the driver itself is automatically multi-buffering for you behind-the-scenes.
At this stage it's worth highlighting that this buffer update model is well-known and widely-used in D3D-land (where it's called "discard/no-overwrite") and has existed since D3D8, if not earlier; i.e. it has close on 15 years of real-world usage behind it. So it's not some kind of voodoo magic that you may not be able to rely on; it's a well-known and widely-understood usage pattern that driver writers anticipate and optimize around.
Both DX12 and GL4 are moving away from this pattern and moving towards an explicit low level access memory management. With fences, unsynchronized access, and persistent mapping.
Drivers may optimize for the discard/map-no-overwrite pattern, but the higher-level app has much more information than the driver on how to deal with memory and access hazards. Driver optimizations can only go so far. But with great power comes great responsibility.
I'm sort of having trouble understanding how mixing these could be bad. Not saying that is what you mean, but it is implied that way to me. I would think this would act as an extra safety net to the whole unsynchronized methodology.
It's not that it's bad (btw, mixing both doesn't act as a safety net). The thing is that you needlessly create a problem for yourself when you need to render with VAOs.
The VAO saves the VBO that is bound.
If you use three VBOs, either you modify the state of the VAO every frame, or have three VAOs (one per VBO). With just one VBO but using different ranges, you need one VAO, and no need to modify the VAO in any frame.
I basically see using glMapBufferRange + the unsynchronized flag as 'Put this data in the VBO right now, I do not care how or what you do, just put it in there'.
Which could lead to things not drawing right if you accidentally map to a VBO that is being used in drawing.
That's the best thing that can happen. The worst thing that can happen is full system crash (BSOD, or not even that, DOS-style lockup needing a hard reset). Depends on GPU architecture and Motherboard (Bus).
You must synchronize.
Note that for dynamic content (and assuming you will be discarding the whole contents every frame), you just need one fence per frame. You don't need one fence per buffer.
If I use Round Robin with 3 VBOs or more and they all get mapped with glMapBufferRange + the unsynchronized flag, I would think that the only way it would fail is if my GPU is falling behind really really badly or something is seriously wrong.
If you don't use round robin, it's the same thing, because in the example I gave you, you would be writing to a region that the GPU is not using right now.
Remember, it's not that the VBO is in use by the GPU while you're writing from the CPU. What's important is that the dword (4 bytes) you're writing to from the CPU is not currently being read by the GPU (to avoid a crash), and that the region of memory you're writing to has already been read by all the GPU commands issued so far (to avoid a race condition causing graphics corruption).
Is mixing those two methods just overkill?
Yes, because you gain nothing from mixing them, and complicate yourself by having more VBOs and more VAOs.
There are two types of interest rates: simple interest and compound interest. The example you gave is an example of compound interest. This is because after a year, the interest becomes part of the capital, and starts generating interest on its own. Back in the Middle Ages this was forbidden because the Church considered it usury, which is a sin (TBH, I don't think it's so far off...).
Simple interest's formula is of the form f = C * (1 + i * N)
Where C is the original capital, i the interest rate, and N is time (could be in days, months, years; if you change N from i.e. years to months, you will need to adjust the i by dividing it by 12).
Compound interest's formula is of the form f = C * (1 + i)^N
This answers your question. If you change N's unit of measurement (e.g. years to months), adjusting i is a bit more tricky. You actually need to do i' = (1 + i)^(1/12) - 1
Knowing that compound interest is the result of interest getting capitalized is very important. You may notice that while N is smaller than 1, simple interest is bigger than compound's. In real life (and this depends on legislation: whether interests are considered becoming part of the capital day by day, or only after the whole year has passed), often compound interest is just a piece-wise linear interest function.
During the first 12 months, the capital may grow as C * (1 + i * N), where N is in months and i is the monthly rate; the second year it grows as C2 * (1 + i * N), where C2 = C * (1 + 12 * i) (in other words, C2 is the money you got after 1 year had passed)
This can be expressed as f = C * (1 + i)^N, but only as long as N is a natural number (and not a real number; unless the legislation considers the capitalization to happen day by day).
Well, that's enough for this post. Compound interest in real life situations can get very complex; and is in fact the subject of study of an entire quadrimester. You've got enough keywords now. Google is your friend.
When using the shared layout, you have to query OpenGL for the offsets of each element.
For example there could be hidden padding for alignment reasons. This padding depends on each GL driver. It could even change between driver versions.
std140 layout avoids the need to query the offsets, as there are rules on how the memory is laid out.
But it has very conservative memory layout rules so that it works on every possible hardware, and these rules are insane. Most importantly, they're so hard to follow that many driver implementations get them wrong (e.g. a vec3 always gets promoted to a vec4; and four aligned consecutive floats should be contiguous, but I've seen drivers wrongly promote each float to a vec4).
I prefer std140 because it avoids querying and the offsets are known beforehand, while my GLSL declarations are all of type vec4 or mat4, with #define macros where a variable's name is important for readability. Example:
layout(std140) uniform ubContext
{
    vec4 times; //times.z and times.w are not used (padding)
} Context;

#define currentTime Context.times.x
#define cosTime Context.times.y
This way you don't need to query, the offsets are known beforehand, and you're safe from most driver implementation problems.