mind in a box

Community Reputation

887 Good

About mind in a box

  • Rank
    Advanced Member
  1. mind in a box

    Disable Ubuntu shortcuts

    Thanks for the input, guys! We figured it would probably be best to leave everything as it is for now. Most Linux users will know how to change the bindings if they don't use them, or they can simply rebind the in-game action that sits on the Alt key.
  2. On Ubuntu, when capturing keyboard input with GLFW, some keys like Left Alt are not passed to my window. Instead, some desktop functionality is triggered; in the case of the Alt key, the Dash search opens.

     How do I run a windowed game on Ubuntu that can use every key on the keyboard, without telling the user to turn off Ubuntu's keyboard shortcuts altogether at game start?
  3. Hello everyone!

     We just noticed that the same C++ program performs far better when compiled with gcc on Linux than with Visual Studio 2015 on Windows.

     The test program calculates some integrals over and over again and was used in one of my programming assignments, where I work under Linux on my laptop, which doesn't have a very fast CPU. The program finishes in about 0.9 seconds there, while it takes a whopping 8-9 seconds on my much stronger Windows machine!

     Another problem is that our game project runs much faster on my friend's Linux machine than on mine. He still gets ~140 fps in some heavy physics scenes, while I'm down to ~20.

     I'm very confused about this, and I can't seem to find any more compile flags which would improve the situation. The flags I used on Windows are (these come straight from a compiler benchmark, since I was desperate): /arch:SSE2 /Ox /Ob2 /Oi /Ot /Oy /fp:fast /GF /FD /MT /GS- /openmp, while the gcc build uses: -O3 -ffast-math -fopenmp -funroll-loops -march=native.

     The same phenomenon could be observed on another friend's Windows machine, which has an even better CPU than mine. It's really weird that my tiny laptop can outperform these computers. I mean, I should be able to get the same code up to roughly the same speed on both operating systems, right?

     Are there any more magic compiler flags to set, or other pitfalls I should look out for?

     Thanks in advance!
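For anyone wanting to reproduce a comparison like this, here is a minimal sketch of such a timing harness. The integrand, bounds, and iteration counts are placeholders, not the original assignment's code; the point is only to time an identical numeric workload under both compilers:

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

// Trapezoidal rule over [a, b] with n steps; sin(x) stands in for the
// assignment's real integrand.
double integrate(double a, double b, int n) {
    double h = (b - a) / n;
    double sum = 0.5 * (std::sin(a) + std::sin(b));
    for (int i = 1; i < n; ++i)
        sum += std::sin(a + i * h);
    return sum * h;
}

// Runs the workload repeatedly and reports wall-clock milliseconds, so the
// same measurement can be taken on both operating systems.
double benchmarkMs(int runs) {
    auto t0 = std::chrono::steady_clock::now();
    volatile double result = 0.0;  // keep the optimizer from removing the loop
    for (int run = 0; run < runs; ++run)
        result = integrate(0.0, 3.14159265358979323846, 100000);
    auto t1 = std::chrono::steady_clock::now();
    (void)result;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Building exactly this with the flag sets quoted above and printing `benchmarkMs(100)` on each machine isolates the compiler from everything else (the integral of sin over [0, pi] is 2, which doubles as a correctness check).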
  4. Thanks for the reply! This flag is exactly what I needed. However, after doing some research, it doesn't seem possible to set it using CMake without some very ugly hacks. It turns out, though, that you can compile GLEW yourself from just a few .h/.c files, and someone on my team came up with a CMakeLists.txt doing exactly that, which solved the problem.
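For reference, a minimal sketch of what such a CMakeLists.txt fragment can look like, assuming GLEW's sources are vendored under extern/glew (the paths and target names here are hypothetical):

```cmake
# Build GLEW from source as part of the project instead of linking glew32.lib.
# GLEW_STATIC must be PUBLIC so that consumers of the render library also see
# the static declarations instead of the __declspec(dllimport) ones.
add_library(glew_static STATIC extern/glew/src/glew.c)
target_include_directories(glew_static PUBLIC extern/glew/include)
target_compile_definitions(glew_static PUBLIC GLEW_STATIC)

target_link_libraries(renderlib PUBLIC glew_static)
```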
  5. Hi everyone!

     I have been trying out some OpenGL code lately and got some weird behavior when building one of my test projects.

     I use GLEW in a static library which holds all the code for wrapping the different graphics APIs. While this library and the test projects defined in the same CMakeLists.txt build and link correctly, external projects which use the library fail at the linker stage.

     Linking to the renderlib from an external project, which links only to that renderlib and nothing else, makes the linker complain that it can't find glew32.lib. Which, of course, is true: everything that needs any code from GLEW should already be statically linked into the renderlib's .lib file.

     Could it be that the compiler strips some static linkage of the GLEW lib because of some optimization, and then requires any program using the renderlib to link to GLEW itself? Or am I just missing something here, and this is completely normal behavior?

     Any help is much appreciated!
  6. Yeah, I should be CPU-limited here.

     However, I did some more tests. I can't replicate this behavior using my own simple shaders. Intel GPA seems to be doing something more when switching to "Simple Pixel Shaders" than just replacing all pixel shaders with the simple version. Maybe they also block setting the shader resources or something; I'll have to dig deeper into that.
  7. Hi everyone!

     I've been profiling my application lately, and I found something weird which I hope one of you can explain to me.

     In my scene, I have a large world mesh and about 5000 decoration objects, all using the same pixel shader, which simply does one texture lookup. All in all, I get about 1000 draw calls from this and about 105 fps on my laptop. There are no post-process effects or similar techniques.

     The weird thing is: when I switch to a very simple pixel shader (using Intel GPA), my FPS counter ramps up to ~150, even when there are next to no pixels drawn on screen at all! (A small health bar in the corner, but that's it.) Applying a 1x1 scissor rect doesn't give the same effect.

     How can switching a pixel shader improve performance by so much when there aren't even any rendered pixels on screen?

     Thanks in advance!
  8. mind in a box

    About dynamic vertex pulling

      My meshes aren't using the same vertex counts as of yet, and I would like to get around this if possible.

      Is this really true? Don't you have to specify the index value of the first vertex? Wouldn't you just need to set that value to 0 for the first draw call and to 1000 for the second? I'm pretty sure that's how it works, at least for the indices. You are probably talking about the offsets you can set while binding the buffers to the IA?

      Won't SV_InstanceID be filled with the value I passed in the draw call, regardless of whether a second vertex buffer is bound? I guess I will have to test this, but it would save me the overhead of reading the same value as SV_InstanceID out of a buffer.
  9. mind in a box

    About dynamic vertex pulling

      I am done with collecting all the buffers into one single big buffer, and it works pretty well, the packing at least. However, I am not quite sure how I would use different world matrices without actually switching at least one constant buffer, since I can't use the instance ID to figure out what I am currently rendering.

      My idea would be to simply use DrawInstanced, passing only a single instance (maybe more) to render, and setting the start instance to the index at which my object's instance data sits in a big structured buffer bound to the vertex shader. That way I can access the instance data using SV_InstanceID.

      Would that be an appropriate solution, or do you maybe have a better idea? The engine I'm currently working on unfortunately isn't far enough along for me to test this now.

      I never really thought about this. I actually don't need scale for most of the objects I'm working with, so that is going to be a really nice optimization!

      Thanks for your answers!
  10. mind in a box

    About dynamic vertex pulling

    Thanks for the reply!

      Not quite; the point of the technique is to reduce draw calls further than instancing can. It enables you to render lots of different geometry with different textures using only one single DrawInstanced call. Basically it IS DrawInstanced with these two parameters, but without any of the overhead coming from the draw calls.

      Right, I totally forgot about that!

      Sure, but even copying 15000 world matrices to a buffer would take a lot of time. I think a better approach would be to use this only for static geometry and work with indices into a pre-filled buffer containing all of the instance information.

      Thanks for the matrix optimization tip as well; I didn't know that!
  11. Hi everyone!

     I'm looking forward to implementing the technique described in this article, which looks fine by itself, but I have some questions about the implementation details.

     First of all, the general idea seems to be that you have one big vertex buffer and one big index buffer to work with. You put every mesh you want rendered in there and store the offsets and index counts in another data structure, which goes together with the instance data into another buffer. Then all you need to do is issue a call to something like DrawInstanced with the maximum number of indices any mesh in the buffer has, and walk the instance-data buffer to get the actual vertex data from the buffers. If a mesh uses fewer indices than we told the draw call, the article says one should just use degenerate triangles and keep an eye on the vertex counts.

     Now, the article gives us a scenario about rendering a forest with different types of trees and LOD levels.

     #1: Why even bother with LODs when we draw everything with the same vertex/index count anyway? Idea: use multiple instance buffers covering different ranges of vertex/index counts and spend more draw calls, instead of wasting time drawing overhead vertices for the simple LOD levels.

     The next problem is the updating of the instance buffer. Since we of course want frustum culling, or moving objects if we are drawing a huge forest, we would need to do that every frame. The article suggests keeping a CPU copy of the data in the buffer and, if something changes, just copying everything over again.

     #2: Wouldn't that have a huge impact on performance if we have to copy thousands of matrices to the GPU every frame? Also, I'm pretty sure you would hit a GPU sync point when doing this the naive way. Idea: I haven't looked too deeply into them yet, but couldn't you update a single portion of the buffer using a compute shader, or just do the full frustum culling on the GPU? If not, are the map modes other than WRITE_DISCARD, where the existing data stays, worth a shot for updating only single objects? Or do I just throw this into another thread, use double-buffering to help with the sync points, and forget about it?

     The last question is regarding textures. I assume that in the article the textures are all the same size, which makes it easy to put them all into a texture array, as the author does.

     #3: But I don't know much about the textures I have to work with, other than that they are all sized by a power of two. I'm using D3D11 at the moment, so texture arrays are as far as I can go. The next problem is that my textures can be dynamically streamed in and out. Idea: make texture arrays of different sizes and estimate how many slots we would need for each size. For example, pre-allocate a texture array with 100 slots of size 1024² and, if we ever break that boundary or a texture gets cached out, allocate more/fewer slots and copy the old array over. Slow, but it would work. Then use the shader registers for the different arrays to access them. The other thing I could do is to allow this kind of rendering technique only for static level geometry and try to keep the textures for it in memory the whole time.

     Does anyone have better solutions/ideas for these problems than mine, or can you give me some other useful input about this technique?

     Thanks in advance!
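The packing step described above can be sketched independently of the graphics API. This is my own illustration (all names are made up), showing how per-mesh offsets are recorded and how index counts are padded up to the per-batch maximum by repeating an index, which produces degenerate (zero-area) triangles:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// Per-mesh record stored alongside the instance data.
struct MeshEntry {
    uint32_t baseVertex;   // offset into the big vertex buffer
    uint32_t startIndex;   // offset into the big index buffer
    uint32_t indexCount;   // real index count, before padding
};

struct PackedGeometry {
    std::vector<Vertex>   vertices;  // one big vertex buffer
    std::vector<uint32_t> indices;   // one big index buffer
    std::vector<MeshEntry> entries;

    // Append a mesh; pad its index range to maxIndexCount by repeating the
    // last index, which yields degenerate triangles the GPU rejects cheaply.
    void addMesh(const std::vector<Vertex>& v,
                 const std::vector<uint32_t>& idx,
                 uint32_t maxIndexCount) {
        assert(!idx.empty() && idx.size() <= maxIndexCount);
        MeshEntry e;
        e.baseVertex = (uint32_t)vertices.size();
        e.startIndex = (uint32_t)indices.size();
        e.indexCount = (uint32_t)idx.size();
        vertices.insert(vertices.end(), v.begin(), v.end());
        indices.insert(indices.end(), idx.begin(), idx.end());
        indices.resize(indices.size() + (maxIndexCount - idx.size()),
                       idx.back());  // degenerate padding
        entries.push_back(e);
    }
};
```

The shader would then read `entries[instanceID]` from a structured buffer and fetch vertices manually; the exact fetch path depends on the API and is not shown here.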
  12. mind in a box

    Modern Renderer Design

    Absolutely not. The render queue is nothing but a very small set of integers (or a single 64-bit integer if possible) which contains the data needed for sorting. That means a shader ID, texture ID, any small ID numbers that you want to include for sorting, and the fractional bits of a normalized (0-1) float for depth.

     Sorry for hijacking this, but I have a question related to it. First: in my engine, which uses a similar design to imoogiBG's DrawCall structure, my rendering queue is a std::vector which holds the DrawCalls pushed during the current frame. These have their sorting key stored inside them and implement the <-operator.

     Now, I thought about putting the key and a pointer to the structure it represents into a pair and using that for sorting, for better cache locality. Since you explicitly say the queue should only store the keys as integers and nothing else, should one instead go with two lists per queue, one containing the states and one the keys? After doing a sort which doesn't move the data but rather generates a set of new indices, you could then apply those indices to the list of DrawCalls?

     Then, secondly: I have read a lot about the stateless approach, and everywhere I only saw people talking about storing the ID values of the shaders, buffers, or textures. That makes sense for OpenGL, but how would I access the objects for something like D3D? As said, my current structure for D3D11 stores the pointers to the different ID3D11* objects a draw call needs. Of course I could put everything into a hashmap, but I don't see why I should do that when I could store the pointers by having a class that understands the current API and pulls the data out of my generic buffer and shader structures.

     Thanks in advance!
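The two-list idea in the first question can be sketched like this (a toy illustration, not engine code): keep the 64-bit keys contiguous so the sort only touches a cache-friendly array, sort an index array by key, and only then walk the fat DrawCall array in that order:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct DrawCall { /* shaders, textures, buffers, ... */ int payload; };

// Returns the submission order for the frame without ever moving the
// (comparatively large) DrawCall structs themselves.
std::vector<uint32_t> sortOrder(const std::vector<uint64_t>& keys) {
    std::vector<uint32_t> order(keys.size());
    std::iota(order.begin(), order.end(), 0);       // 0, 1, 2, ...
    std::stable_sort(order.begin(), order.end(),
                     [&](uint32_t a, uint32_t b) { return keys[a] < keys[b]; });
    return order;
}
```

Usage would be something like `for (uint32_t i : sortOrder(keys)) submit(calls[i]);`, where `keys[i]` is the sort key of `calls[i]`.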
  13. mind in a box

    Slow BSP-Tree visibility checking

    I made sure this stays allocated. It's a std::vector<GVobObject*>. I think the slow part is actually calling the virtual methods of the objects so they can register their instances, since there are so many of them!
  14. Hi everyone!

     I'm currently reworking my culling strategy, and I've come to a point where I just don't know what to do from here. My current project is to improve the renderer of an older game (an open-world RPG), and it's working very nicely so far, just not as fast as it could. (Using D3D11 and C++, if that matters.)

     The game basically has a static world mesh and objects stored in a BSP tree, so I already have that. The BSP tree is built around the world mesh and breaks the world down into small AABBs of about 5 meters of ingame space, which then contain lists of the objects residing in that leaf. Bigger objects may be registered in multiple leaves, as all leaves are roughly the same size.

     That leads me to the following approach: walk the BSP tree and check visibility to cut off branches. When I reach a leaf, iterate over the objects in all its lists (outdoor, indoor, small, etc.). Check a flag to see whether an object has been drawn before (the big-object issue) and register it in the renderer. After that, copy all instance IDs to the GPU, where I remap each ID to a static world-matrix list in a structured buffer.

     If I initialize everything only once, putting every object into the render list, then I can draw my whole world at about 200 fps, which is nice. If I enable the culling, I am down to ~100 fps while not even drawing a quarter of the objects.

     Using profiling, I have found out that most of the time is spent in the function that iterates through the object lists of a BSP leaf. It basically only does this, about 5 times for the different lists:

     if(nodeDistance < vobOutdoorDist)
     {
         for(auto it = node->Vobs.begin(); it != node->Vobs.end(); it++)
         {
             GVobObject* vob = (*it);
             if(!vob->IsAlreadyCollectedFromTree())
             {
                 // Just draw
                 vob->DrawVob();
                 vob->SetCollectedFromTreeSearch();
                 BspDrawnVobs.push_back(vob);
             }
         }
     }

     Internally, the DrawVob method only pushes an index value to a std::vector after the first time it has been called, and the loop stays slow even if I completely remove the code inside DrawVob, so it's nothing in there causing the slowdown.

     It would be great if someone could tell me how this kind of scene is usually handled efficiently.
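One common way to handle the "already collected" flag without resetting it on every object each frame is to replace the boolean with a frame counter: the check becomes a plain integer compare, and nothing has to be cleared between frames. A minimal sketch of that idea, with made-up names standing in for the real GVobObject machinery:

```cpp
#include <cstdint>
#include <vector>

struct Vob {
    uint32_t lastCollectedFrame = 0;  // 0 = never collected
};

struct Collector {
    uint32_t currentFrame = 0;
    std::vector<Vob*> drawnVobs;

    void beginFrame() {               // O(1): no per-object flag reset
        ++currentFrame;
        drawnVobs.clear();
    }

    // Returns true only the first time a vob is seen this frame, even if it
    // is registered in several BSP leaves (the big-object case).
    bool collect(Vob* v) {
        if (v->lastCollectedFrame == currentFrame)
            return false;             // duplicate within this frame
        v->lastCollectedFrame = currentFrame;
        drawnVobs.push_back(v);
        return true;
    }
};
```

This does not remove the virtual-call cost itself, but it does remove the end-of-frame pass over BspDrawnVobs that clears the flags, and it keeps the hot-loop test branch-predictable.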
  15. Thanks for the thread, I didn't even know I had this problem in my engine as well! The text looks so much better now :)
