Matias GoldbergMember Since 02 Jul 2006
Posted by Matias Goldberg on 22 October 2014 - 09:33 AM
Just switch the hDC (1st argument) in the wglMakeCurrent/glXMakeCurrent call but use the same context (2nd argument).
It works **much** better than having two contexts.
The tricky part though, is VSync. You probably want to call swapbuffers once, not twice (i.e. one per window). This tends to be driver specific (i.e. do it wrong and the framerate will be 30fps, do it right, and framerate will be 60fps). You'll have to do some experimentation.
For this method to work, both windows must have the exact same pixel format and antialiasing options (they can have different resolutions though), otherwise the wglMakeCurrent call will fail when you try the second hDC.
Posted by Matias Goldberg on 21 October 2014 - 05:58 PM
Just curious, why implement it when there's already great software that does that? RenderDoc for GPU, Visual Studio for CPU; I can't see how rolling your own can be better in any scenario with such advanced tools available.
These tools, while great, have some level of inaccuracy. e.g. the CPU profilers use "sampling based profiling", which is basically a statistical collection of where the program spends most of its time.
Statistics are averages and have standard error. Furthermore, these tools often don't work outside the dev environment (the tool is not available to the user, VTune costs money, the PDBs need to tag along, etc.). Not to mention these tools may have trouble hooking up to your app if you do something weird (DRM, some problematic device driver, the program running behind a virtual machine, etc.).
They also don't tell you how long a specific component takes unless it's statistically relevant, which is important when you're trying to build a frame budget.
Another reason is that not all platforms can use these tools; and while it's great to have them on PC, it's not so great when you have to deal with other devices where these profilers either don't exist or have poor support.
Posted by Matias Goldberg on 21 October 2014 - 10:28 AM
1. Implement it yourself. Measure the time taken between each frame using a high resolution timer like QueryPerformanceCounter or rdtsc. You can measure the frame rate by comparing the timestamp from the last time the function was called against the current measure, then save the current value to compare in the next frame. Alternatively, use the profiler pattern, where you surround a given function with calls like beginProfile()/endProfile() so you know how long a specific function or module takes.
2. Use third party profilers like CodeAnalyst, VTune, PerfStudio, nSight, Intel GPA, PIX and the Visual Studio graphics debugger. Each of them has different compatibility when hooked up to your application, may have vendor specific features (e.g. CodeAnalyst can work on non-AMD machines but many profiling methods will be unavailable), or may diagnose the wrong place as a hotspot (rare).
Ultimately a good dev will use a combination of all of the above and not just one tool.
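Option 1 above could be sketched like this with std::chrono (a portable stand-in for QueryPerformanceCounter/rdtsc); the class and function names here are made up for illustration, not from any particular engine:

```cpp
#include <chrono>
#include <cstdio>
#include <string>

// RAII wrapper for the beginProfile()/endProfile() pattern: the scope's
// duration is measured and reported when the object is destroyed.
class ScopedProfile
{
public:
    explicit ScopedProfile( const char *name ) :
        mName( name ), mStart( std::chrono::steady_clock::now() ) {}
    ~ScopedProfile()
    {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>( end - mStart ).count();
        std::printf( "%s took %.3f ms\n", mName.c_str(), ms );
    }
private:
    std::string mName;
    std::chrono::steady_clock::time_point mStart;
};

// Frame timer: compare against the last timestamp, then save the current
// one for the next frame's comparison. Returns seconds elapsed.
double measureFrameDelta( std::chrono::steady_clock::time_point &lastFrame )
{
    const auto now = std::chrono::steady_clock::now();
    const double dt = std::chrono::duration<double>( now - lastFrame ).count();
    lastFrame = now;
    return dt;
}
```

You'd then wrap a module in `ScopedProfile p( "renderQueue" );` or feed `1.0 / measureFrameDelta( last )` into an FPS display.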
Posted by Matias Goldberg on 18 October 2014 - 11:54 AM
Looks like you've got an LCD/LED monitor connected over VGA.
These monitors need to calibrate themselves to the analog signal. The monitor will remember a few calibrations for given combinations of resolution and refresh rate. Probably your monitor saw that the frequency changed (i.e. 60 vs 75 Hz, or 59.9 vs 60 Hz) when you switched GPUs, and thus got the calibration wrong.
There should be an "auto" button on your monitor that runs the calibration procedure, which usually takes 1 to 5 seconds. You'll see the screen stretching itself until it aligns properly.
Run the "auto" procedure while you've got plenty of colour on the screen. If you run it while the screen is mostly black, the calibration will go wrong (i.e. 10% of the screen will end up "outside").
If there's no "auto" button, consult the manual of your monitor. It could be an OSD option, or you may have to hold a particular key combination.
Posted by Matias Goldberg on 16 October 2014 - 01:59 PM
Nope. That has hardly changed.
It is possible that newer hardware has some smart workarounds for these issues.
Branches are commonly not analyzed in cycles but rather in terms of something called "divergence" or "input coherence".
GPUs work in parallel. Lockstep, as has been said. Threads are grouped, launched together, and must execute the same instructions.
When one thread inside that group needs to take a different branch than the rest of the threads in its group, we say we have a "divergence".
The more divergence you have, the bigger the negative impact of branches, as both sides of the branch must be executed by all threads, only for the wrong results to be discarded (masked out) later.
When all threads in one group follow one branch while another group follows a different branch, all is well. No divergence happens, and we say that the input is coherent, homogeneous, or that it follows a nice pattern.
Of course, even if the data is coherent, some divergence may still happen for some groups. But the key here is whether the data is coherent enough that the performance improvement of skipping work for most of the groups outweighs the performance drop caused by the groups that ended up diverging.
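To make the divergence cost concrete, here's a toy CPU-side model (not real GPU code; the fixed group size and uniform per-branch cost are simplifications): a group pays for every branch side that at least one of its threads takes.

```cpp
#include <cstddef>
#include <vector>

// Toy cost model: threads execute in lockstep groups of `groupSize`.
// If every thread in a group takes the same branch, the group pays for
// one branch; if the group diverges, it pays for both branches, because
// the wrong side is executed anyway and its results are masked out.
std::size_t branchCost( const std::vector<bool> &takesBranchA,
                        std::size_t groupSize, std::size_t costPerBranch )
{
    std::size_t total = 0;
    for( std::size_t i = 0; i < takesBranchA.size(); i += groupSize )
    {
        bool sawA = false, sawB = false;
        for( std::size_t j = i; j < i + groupSize && j < takesBranchA.size(); ++j )
        {
            if( takesBranchA[j] ) sawA = true;
            else                  sawB = true;
        }
        total += ( ( sawA ? 1u : 0u ) + ( sawB ? 1u : 0u ) ) * costPerBranch;
    }
    return total;
}
```

With 64 threads in groups of 32, a coherent input (first 32 take A, last 32 take B) costs half as much as a fully divergent one (alternating A/B), which is exactly the effect described above.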
Posted by Matias Goldberg on 13 October 2014 - 11:47 PM
I hope that post clears up your doubts.
Posted by Matias Goldberg on 13 October 2014 - 03:50 PM
+1 to all that has been said against UML.
When documenting a stable API, I personally prefer to make a basic diagram (not formal like UML) explaining the flow of data between the interfaces, a few key relationships, and the key processes. Just something that gives my users the general picture. Much friendlier.
Posted by Matias Goldberg on 13 October 2014 - 03:34 PM
Thanks for the explanation. While I wasn't focusing on that part, the transient buffers make more sense to me now from a synchronization standpoint. When you talk about creating a default buffer, do you mean I should try to have as much as possible of my non-dynamic streaming data stored within a single buffer, and the pool refers to the staging buffers?
Yes. Immutable if possible.
I would think with level streaming, it would be risky to implement a single fixed capacity on the live buffer(s), so pool-type management for the live buffers would be useful too. For example, when streaming in a new package, the loader knows exactly how much capacity it will require, and can grab how ever many buffers it needs from the pool of unused buffers. Likewise as a level is streamed out, the buffers are no longer needed and are added back into the pool of unused buffers. Maybe I'm over complicating things. I definitely see the benefit of staging buffers for updating dynamic data, but I guess it's not as clear for the case of loading in a large amount of streaming data (not behind a loading screen).
The problem is that you're trying to build a car that runs over roads, can submerge into the ocean, fly across the sky, is also capable of travelling into outer space; and even intergalactic travel (and God only knows what you'll find!).
You will want to keep everything together into one single buffer (or a couple of them) to reduce switching buffers at runtime while rendering.
From a streaming perspective, it depends how you organize your data. i.e. some games divide a level into "sections" and force the gameplay to go through corridors, and while you run through these corridors, start streaming the data to the GPU (gameplay like Tomb Raider or Castlevania: Lords of Shadow fits this use case). In this scenario, each "section" could be granted its own buffer. You already know the size required for each buffer. And if you page the buffer out, you know whether it can be permanent (i.e. can the player go back?) or whether to use some heuristic (i.e. after a certain distance from that section, schedule the buffer for deletion, but don't do it if you don't need to, i.e. you've still got a lot of spare GPU RAM). You may even get away with immutable buffers in this case.
Second, you can keep an adjustable pool of immutable/default buffers based on size and regions. Remember you're not going into the unknown depths of the ocean or into the unknowns of a distant galaxy. You know the level that is going to be streamed. You know its size in megabytes, in kilometers, its number of vertices, how it's going to be used, how many materials it needs etc. You know how each section gets connected with each section (e.g. if F can only be reached from A, put it in its own buffer, and the player is likely to not return to F very often once it has been visited).
You have a lot of data at your disposal.
Open world games are trickier, but it's the same concept (divide the region into chunks that have some logic behind them, i.e. spatial subdivision, and start from there). Open world games usually have a very low poly model of the whole scene to use until the higher quality data has been streamed in.
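To make the pool idea concrete, here's a hypothetical sketch where GPU buffers are reduced to just their size in bytes so only the bookkeeping is visible; a real version would hold ID3D11Buffer pointers or GL buffer names, and the class name is invented:

```cpp
#include <cstddef>
#include <map>

// Pool of per-section buffers keyed by size. Streaming in a section
// acquires a buffer at least as big as the section needs (reusing a
// free one when possible); streaming the section out releases it back.
class SectionBufferPool
{
public:
    // Returns the size of the buffer handed out: an existing free one
    // when a large-enough buffer is available, else a newly "created"
    // buffer sized exactly as requested.
    std::size_t acquire( std::size_t bytes )
    {
        // smallest free buffer that is still large enough
        for( auto it = mFree.lower_bound( bytes ); it != mFree.end(); ++it )
        {
            if( it->second > 0 )
            {
                --it->second;
                return it->first;
            }
        }
        return bytes;
    }
    void release( std::size_t bufferSize ) { ++mFree[bufferSize]; }

private:
    std::map<std::size_t, std::size_t> mFree; // buffer size -> free count
};
```

Because you know each section's size up front, the pool rarely has to create anything mid-game; most acquires are satisfied by buffers released when earlier sections were paged out.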
My advice, algorithms are supposed to solve a problem. An engine solves problems. The answer on how to design your engine will be clearer if you approach the problem instead of trying to solve a problem you know nothing about. Try to make a simple game. Even a walking cube moving across cardboard city (open world) or pipe-land (corridor-based loading) should be enough.
Stop thinking on how to write the solution and start thinking on how to solve the problem. After that, how to write the solution will appear obvious.
Posted by Matias Goldberg on 13 October 2014 - 12:35 PM
That presentation is basically l33t speech for "how to fool the driver and hit no stalls until DX12 arrives".
What they do in "Transient buffers" is an effective hack that allows you to get immediate unsynchronized write access to a buffer and use D3D11 queries as a replacement for real fences.
Specifically, I'm working on implementing his "long-lived buffers" that are reused to hold streaming (static) geometry data. I've been unable to find much information on how best to implement it, however.
Create a default buffer. Whenever you need to update it, upload the data to a staging buffer (you should have a preallocated pool to avoid stalling if you create the staging buffer), then copy the subresource from staging to default. You're done.
You won't find much because there's not much more to it. Long-lived buffers assume you will rarely modify them, and as such they shouldn't be a performance bottleneck nor a concern.
Usually you also have a lot of knowledge about the size you will need for the buffer. Even if you need to calculate it, you do so infrequently enough that you should be capable of calculating it on the spot, or at least caching it.
The problem is buffers that you need to update very often (i.e. every frame).
Posted by Matias Goldberg on 11 October 2014 - 04:48 PM
Similar to what wodinoneeye said.
In real life neutral countries exist because either:
- The conflict hasn't expanded yet enough to affect them.
- They're strong enough to repel any invasion if they get involved (they could even seriously imbalance the war if they take side).
- Most of the involved parties don't want anything of that country (i.e. why would Israel or Palestine want to take umm... Mexico?) or are emotionally attached to them (emotion != logic).
- It's more beneficial to have them as an independent country than to have them take your orders. Maybe because their know-how is too valuable and can't be used appropriately if you invade them, or their citizens could start small acts of terror during the occupation, or guerrilla style fighting.
For a game, points 2 and 4 are the most interesting. Point 4 can actually be very fun and make the player go through a living hell.
Point 2 is easy. If you attack, you will be obliterated.
Point 4 is fun. You can attack, and you may win. But you pay the consequences until you release that land back: random sabotage, slowdown of your resource gathering or slower building of units, critical unit-making buildings randomly exploding, inability to develop certain technologies. Allow the development of technologies or the gathering of the goods they offered while neutral, but at a higher price (or at a slower rate), etc.
Point 3 is possible if the game has a story. Get the player to actually love a civilization well enough that most players will feel bad about invading it and prefer working alongside them. But this is really hard to execute well.
Posted by Matias Goldberg on 08 October 2014 - 09:30 PM
Since OpenGL 3.x, Khronos has adopted a version numbering system of MAJOR.MINOR.
A change in major number means the hardware needs to be significantly upgraded (i.e. like going from a GeForce 280 to a GeForce 480, or from a Radeon HD 4850 to a Radeon HD 7850; which is going from DX10.1 hardware to DX11 or GL3 to GL4).
A change in minor number means that 99% of the time a driver upgrade is all that you will need.
If your hardware supports OpenGL 4.0, then it's almost certain that just updating the drivers will be enough to get 4.3 (though there's always the risk that the vendor never releases a driver that supports 4.3 and goes straight to 5.x whenever it comes out), or even 4.5 for that matter.
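The MAJOR.MINOR rule of thumb above could be sketched like this (the function and enum are invented for illustration; real code would query the driver and check extensions, and the rule is a heuristic, not a guarantee):

```cpp
#include <utility>

// Given the GL version the hardware currently exposes and the version
// you want: same major usually means a driver update suffices; a higher
// major usually means new hardware.
enum class UpgradePath { AlreadySupported, DriverUpdate, NewHardware };

UpgradePath upgradeNeeded( std::pair<int, int> has, std::pair<int, int> wants )
{
    if( wants.first < has.first ||
        ( wants.first == has.first && wants.second <= has.second ) )
        return UpgradePath::AlreadySupported;
    if( wants.first == has.first )
        return UpgradePath::DriverUpdate;   // e.g. 4.0 -> 4.3
    return UpgradePath::NewHardware;        // e.g. 3.3 -> 4.0
}
```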
As for the Intel HD 4000; Intel is usually behind when it comes to OpenGL drivers. Their current version is at 4.0; however they expose the most important 4.3 functionality through extensions (GL_ARB_multi_draw_indirect, GL_ARB_sync, GL_ARB_shading_language_420pack, GL_ARB_conservative_depth).
They're missing compute shaders (GL_ARB_compute_shader) and Shader Storage Buffer Objects (GL_ARB_shader_storage_buffer_object); only the latter is where I have my doubts whether the HW can truly support it; however it's not a reason to not buy the book.
My recommendation is to go buy the book. The differences will be slim (if SSBOs are even in the book) because most of what applies to 4.3 is provided by Intel's 4.0 drivers (+ extensions).
Will the code examples from the book (OpenGL superbible) work on my machine?
Most of them, yes. You may have to edit the initialization routine so that it asks for a 4.0 context instead of a 4.3 one (which will obviously fail as soon as you launch the program and initialize OGL). Samples that use features not provided through extensions (like SSBOs and compute shaders) will obviously fail, but the rest of the samples will work.
Posted by Matias Goldberg on 03 October 2014 - 12:11 PM
To answer the OP's question, it's a direction.
What I think is confusing you is that we typically refer to the diffuse N * L formula (also known as N dot L, i.e. dot( N, L )), where N is the surface normal and L is the light's direction; when it is actually N * -L (notice the negative sign).
It's not that the direction becomes the position or anything like that. Strictly speaking the formula is N * -L, but we often refer to it as just N * L (because we tend to look at it from the perspective of "the ray that goes from the surface towards the light"; in other words, the opposite of the real direction the light travels).
This is a very common source of confusion among people just starting with lighting equations.
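A minimal sketch of that sign convention (Vec3, dot and lambert here are illustrative helpers, not from any particular math library):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot( const Vec3 &a, const Vec3 &b )
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Lambert term with the sign made explicit: L is the direction the light
// travels, so we flip it to get the surface-to-light direction before
// dotting with the normal, and clamp negative values to zero.
float lambert( const Vec3 &N, const Vec3 &L )
{
    return std::max( 0.0f, dot( N, Vec3{ -L.x, -L.y, -L.z } ) );
}
```

With a surface normal pointing up and a light shining straight down (L = (0, -1, 0)), `lambert` gives full brightness; with the light travelling upward from below, the clamp gives zero.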
Posted by Matias Goldberg on 03 October 2014 - 11:57 AM
Most tutorials don't go farther than how to emit a basic billboard particle.
Because that's all there is to it. Just smoke and mirrors.
The key is in a good system that can emit lots of controlled billboards. And by "controlled" I mean how many particles are emitted per second, of which type (i.e. size, material), the rate of growth per second, colour randomization, where they get emitted, and whether they follow a predetermined path or are attracted by some force (like gravity), etc.
The rest is just really good artists knowing how to take advantage of it.
There are a few exceptions though, i.e. for thunder/lightning effects you're better off writing code that creates a chain/path of connected billboards (each billboard slightly reoriented) that randomly splits into 2 or 3 paths at certain points. Then repeat until the desired length is reached. (like this, but in 3D)
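A rough 2D sketch of that chain-and-split idea (growBolt, the jitter amount and the 20% split chance are all made-up values for illustration; real code would orient a camera-facing billboard along each output segment and use a proper RNG instead of rand()):

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

struct Point2 { float x, y; };

// Grow a lightning path segment by segment: jitter the direction so the
// bolt zig-zags, and occasionally recurse to spawn a shorter side branch.
void growBolt( Point2 start, Point2 dir, int segmentsLeft,
               std::vector<std::pair<Point2, Point2>> &outSegments )
{
    while( segmentsLeft-- > 0 )
    {
        dir.x += ( std::rand() % 100 - 50 ) * 0.004f; // slight reorientation
        Point2 end{ start.x + dir.x, start.y + dir.y };
        outSegments.push_back( { start, end } );

        if( std::rand() % 100 < 20 ) // ~20% chance: split into a side path
            growBolt( end, Point2{ dir.x * 0.5f, dir.y * 0.5f },
                      segmentsLeft / 2, outSegments );
        start = end;
    }
}
```

Each entry of `outSegments` then becomes one billboard in the chain.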
When we say "advanced particle effects" we actually mean voxels, fluids, and other very compute intensive stuff, which isn't what you're asking for.
Posted by Matias Goldberg on 30 September 2014 - 09:28 AM
Most likely D3D11 forces you to greater quality by first generating the mips from the source material, then compressing. If the DDS is already compressed and you want to pay the price, decompress it first.
Posted by Matias Goldberg on 29 September 2014 - 09:18 AM
Just FYI: clearing the backbuffer with clear color = (0.0f, 0.125f, 0.3f, 1.0f), doing a screen print, opening Paint, ctrl-V and sampling the color yields:
DXGI_FORMAT_B8G8R8A8_UNORM and DXGI_FORMAT_R8G8B8A8_UNORM, same clear color: Blue 76 (blue 77 with DXGI_FORMAT_R8G8B8A8_UNORM).
That implies that the bit pattern for integer formats created from the floating clear color may not produce the same color.
This is a linear vs sRGB problem.
Green => 255 * 0.125 ^ (1/2.2) = 99.09
Blue => 255 * 0.3 ^ (1/2.2) = 147.52
Which is really close to the 99 and 149 values you got. I'm simplifying as gamma = 2.2; sRGB is actually a piecewise function (a short linear segment near black, then a power curve).
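For reference, here is the transfer function both ways: the gamma 2.2 approximation used in the arithmetic above, and the exact piecewise sRGB encode (the function names are mine, not from any API):

```cpp
#include <cmath>

// Gamma 2.2 approximation of sRGB encoding (linear -> display value in 0..1).
float srgbEncodeApprox( float linear )
{
    return std::pow( linear, 1.0f / 2.2f );
}

// Exact piecewise sRGB encode: a linear segment near black, then a
// 2.4 power curve with a small offset.
float srgbEncodeExact( float linear )
{
    if( linear <= 0.0031308f )
        return linear * 12.92f;
    return 1.055f * std::pow( linear, 1.0f / 2.4f ) - 0.055f;
}
```

255 * srgbEncodeApprox(0.125f) ≈ 99.1 and 255 * srgbEncodeExact(0.3f) ≈ 148.9, matching the 99 and 149 values discussed above; the UNORM formats skip this encode entirely, which is why they give 0.3 * 255 ≈ 76 instead.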