Everything posted by hgoel0974

  1. I've been implementing CPU-based voxel ray casting on a BSP tree; each leaf node ends up holding about 30 voxels. The performance, however, is terrible. I ran the program through the Visual Studio Community Edition CPU profiler and got the following output: http://i.imgur.com/SbZIJCn.png Since it just says that '[External Code]' is taking most of the CPU time, I'm not sure how to proceed. I can't identify any serious performance issues myself: I find it hard to believe the recursion is the problem, and my ray-AABB intersection tests don't appear to be slow either. My code (yes, I know the filename says Octree): VoxelOctree.cs, GI_IntersectionTests.cs, GIWorld.cs. The ray casting and tree traversal code is in the RayCast function in VoxelOctree.cs. Any suggestions on how to approach figuring this out would be appreciated.
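     For reference, the ray-vs-AABB test in question is the usual branchless slab test. Here is a generic sketch of it using System.Numerics types rather than the classes from the linked files, so all names are illustrative:

     using System;
     using System.Numerics;

     static class RayAabb
     {
         // Slab test. invDir is 1/dir per component, precomputed once per ray so the
         // per-node work is only subtractions, multiplies and min/max.
         public static bool Intersects(Vector3 origin, Vector3 invDir,
                                       Vector3 boxMin, Vector3 boxMax, out float tEntry)
         {
             Vector3 t0 = (boxMin - origin) * invDir;
             Vector3 t1 = (boxMax - origin) * invDir;
             Vector3 tMin = Vector3.Min(t0, t1);
             Vector3 tMax = Vector3.Max(t0, t1);

             float tNear = Math.Max(tMin.X, Math.Max(tMin.Y, tMin.Z));
             float tFar  = Math.Min(tMax.X, Math.Min(tMax.Y, tMax.Z));

             tEntry = tNear;
             return tFar >= Math.Max(tNear, 0f); // miss if the slabs don't overlap in front of the ray
         }
     }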
  2. Sorry for the late reply, today has been a busy day at university. I made the changes you recommended: switching the ElementAt call out for a direct array access and moving the BoundingBox 'new' call outside of the loop. These two changes have indeed helped performance, but it still isn't anywhere near where I thought voxel ray casting would be. The profiler now gives the following output, and the GIWorld.Update call doesn't even show up on the list; I'm looking into how to get the External Code portions to show more detail. http://imgur.com/dLwm01m Additionally, it seems there's a bug in the actual ray casting code, so the results are incorrect. I was hoping to use CPU voxel ray casting to do an additional low-resolution indirect lighting bounce and then upload the result to the GPU for compositing, which is beginning to seem like a not-so-great idea.
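     To make those two changes concrete, here is a rough, self-contained sketch; Box, Voxel and the test delegate are hypothetical stand-ins, not the types from the linked files:

     using System;
     using System.Linq;
     using System.Numerics;

     // Hypothetical stand-in types.
     class Box { public Vector3 Min, Max; }
     struct Voxel { public Vector3 Min, Max; }

     static class LoopHoistExample
     {
         // Before: ElementAt() goes through LINQ on every access and a Box is
         // allocated for every voxel, which adds up in a hot traversal loop.
         static void Slow(Voxel[] voxels, Action<Box> test)
         {
             for (int i = 0; i < voxels.Length; i++)
             {
                 var box = new Box { Min = voxels.ElementAt(i).Min, Max = voxels.ElementAt(i).Max };
                 test(box);
             }
         }

         // After: plain array indexing and a single Box whose fields are updated in place,
         // so the loop produces no per-iteration garbage.
         static void Fast(Voxel[] voxels, Action<Box> test)
         {
             var box = new Box();
             for (int i = 0; i < voxels.Length; i++)
             {
                 box.Min = voxels[i].Min;
                 box.Max = voxels[i].Max;
                 test(box);
             }
         }
     }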
  3. Sorry about that, just noticed and fixed the link. EDIT: Ah, a minute too late. EDIT: The BoundingBox class doesn't actually do any calculations on the data, but I'll give your suggestion a try in the morning.
  4. I was playing around with some shader code and ended up with something that seemed to work rather well as a reflection shader. However, it has artifacts near the camera caused by the transformation by the projection matrix; if I remove that transformation the artifact goes away, but the reflections are no longer perspective correct. I'm not sure how to deal with this.

     Code:

     #version 430 core
     in vec2 UV;
     layout(location = 0) out vec4 reflection;

     uniform sampler2D colorMap;
     uniform sampler2D normData;
     uniform sampler2D specularData;
     uniform sampler2D worldData;
     uniform sampler2D envMap;
     uniform sampler2D depthMap;

     uniform vec3 EyePos;
     uniform mat4 View;
     uniform mat4 Projection;

     void main(){
         vec3 n = 2.0 * texture2D(normData, UV).xyz - 1.0;
         n = normalize(n);

         vec3 worldCoord = texture2D(worldData, UV).rgb;

         vec3 v = EyePos - worldCoord;
         v = normalize(-v);

         float depth = texture2D(depthMap, UV).r;

         vec4 vis = vec4(0);
         vec3 refNorm = (Projection * View * vec4(reflect(v, n), 0)).rgb;
         refNorm = 0.5 * refNorm + 0.5;
         vis = texture2D(colorMap, refNorm.xy);

         reflection = vis;
     }

     Without the projection matrix: http://i.imgur.com/FUxqgdy.png

     With the projection matrix: http://i.imgur.com/Q2jsnxC.png

     The artifact mostly gets hidden once I start moving faster, but it'd still be nice to get rid of it. Just had a thought while writing this question: could the artifact be due to missing information?
  5. I'm sorry, I should have been clearer: I was referring to the circle-like area around the ship where the reflection is suddenly different. The suggestion to change the filter type gave me the idea to check whether it was caused by missing information by disabling texture wrapping, and that did eliminate the issue, sort of; it replaced the circle with stretching, which works better.
  6. hgoel0974

    Vulkan is Next-Gen OpenGL

    Well, since I couldn't wait to play around with Vulkan but the AMD drivers seem to be broken at the moment, I generated some quick C# bindings for it. They're partly unusable because of the way the tool translates types, but now it's just a matter of a bunch of cleanup work. https://github.com/himanshugoel2797/Vulkan.NET Also, AMD has released a new version of their Vulkan drivers, which claims to fix the issues with the latest SDK version. EDIT: Said drivers still aren't working for me; applications start but freeze on launch.
  7. I'm working on a voxel engine and have some of the basics done, with everything drawing decently fast. However, there are two issues I'm having trouble coming up with a solution for.

     The first is stutters during chunk generation. I do chunk generation on a separate thread, and once the vertices are ready I tell the main thread to pull them into VBOs. This sounds like it should work fine, but the pulling-into-VBOs part (glBufferData) causes a slight stutter even though I'm uploading on average only about 2000 vertices per frame (each vertex is also packed into a single int using the Int2101010Rev format, which yielded a massive performance increase; a short packing sketch follows this post). I tried using persistent mapped buffers, but they cause the framerate to drop to 60, which is unacceptable considering that I have nothing but dot-product lighting going on. I tried it (separately) with the coherent bit and with the unsynchronized bit, but neither had much of an effect. So my first question is: how can I reduce those stutters to the point that they are no longer noticeable?

     The second is increasing the size of my vertex batches. Profiling in CodeXL, I see that my batches are far too small compared to the recommended size of ~40k vertices; mine are closer to the 1k mark (about 20% of vertices in about 50% of batches). Increasing that should help performance and would also reduce draw calls, but I'm unsure how to do it given the way I generate my data:
     1. Fill an array with the voxel IDs.
     2. Loop over the array and build a mesh using 'greedy' meshing, generating faces relative to the chunk.
     3. Calculate the normal of each face and place it in the appropriate list.
     4. Finally, combine the vertex information for all the faces into one list, keeping track of which normal is at which offset.
     5. Put this data into a VBO.
     6. In the vertex shader, use the normal grouping information passed in as uniforms along with gl_VertexID to determine which normal to pass along; a world transform positions the chunk properly.

     As you can see, because of this I can't just reduce my draw calls, since at least two things depend on the current call. The normals can be worked around by batching groups of faces with the same normal, but the world transform remains an issue. I think I might be able to do something using MultiDrawIndirect, but my problem with that is that I have never understood how to determine which draw call I'm in on hardware that doesn't support gl_DrawID. I have read about the trick of using a base instance, but I don't see how that's correct, since a vertex attribute isn't considered dynamically uniform and thus won't be usable as an index into a UBO. So my second question is: how do I increase the size of my batches and reduce my draw calls?

     The relevant code (mainly Chunk.cs and BlockManager.cs) is at https://github.com/himanshugoel2797/Messier-Game/tree/BruteForce/Messier/Engine and the master branch has the persistent-mapping-based approach implemented.
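     Regarding the Int2101010Rev packing mentioned above, a minimal sketch of the idea, assuming chunk-local integer coordinates in the 0..1023 range (the attribute setup comment is only indicative of the OpenTK call):

     // Pack a chunk-local position into one 32-bit value: 10 bits per axis plus 2 spare bits.
     static uint PackVertex(int x, int y, int z, int w = 0)
     {
         return (uint)(x & 0x3FF)
              | ((uint)(y & 0x3FF) << 10)
              | ((uint)(z & 0x3FF) << 20)
              | ((uint)(w & 0x3) << 30);
     }

     // The matching attribute declaration would look roughly like:
     // GL.VertexAttribPointer(0, 4, VertexAttribPointerType.Int2101010Rev, true, sizeof(uint), 0);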
  8. hgoel0974

    When you realize how dumb a bug is...

    Not exactly a bug, but today I was working on a greedy isosurface extractor and ended up with an issue where I couldn't determine the normals. I spent an entire day on it: first trying the cross product of two edges, which didn't give me the final direction properly, then going for weird tricks with the slopes, attempting to detect triangle winding, and trying to get the direction from the center of the voxel, until I realized just how much I was overthinking it. The one unused axis for each face is the normal direction, and all I needed to do was check whether the adjacent voxels were empty... As much as I liked having solved the issue, it's sad it took so long to realize something so simple.

    Edit: And another bit of stupidity:

    for (; (a[d] == Side - 1) ? a[d] >= 0 : a[d] < Side - 1; a[d] += (a[d] == Side - 1) ? -1 : 1)
    {
        if (VoxelMap[MaterialMap[this[a]]].Visible)
        {
            done = true;
        }
        if (done) break;
    }

    I wonder why that loop is taking so long....
  9. hgoel0974

    When you realize how dumb a bug is...

    So I just spent almost a full 12 hours trying to figure out why the logarithmic depth buffer wouldn't work with the tessellation shader. It worked fine without the tessellation stage; with it, it looked as if the far clip plane were being set to 100. So naturally I kept looking for problems with the depth buffer setup, but found nothing.

    Then, just a few minutes ago, I looked at the function that calculates the tessellation level. This is what it was:

    float GetTessLevel(float Distance0, float Distance1)
    {
        float AvgDistance = (Distance0 + Distance1) / 2.0;
        return 10 - clamp(AvgDistance / 10, 1, 10);
    }

    See it? These kinds of bugs are, IMO, the most painful: they're simple but not at all obvious.

    The bug is in the return statement. A tessellation level of 0 doesn't draw anything, which looks exactly like far clipping. The correct version would be:

    float GetTessLevel(float Distance0, float Distance1)
    {
        float AvgDistance = (Distance0 + Distance1) / 2.0;
        return 10 - clamp(AvgDistance / 10, 0, 9);
    }
  10. I'm working on a 3D game engine. So far it uses MultiDrawElementsIndirect for rendering with no fallback, assuming a minimum of OpenGL 4. I'm now considering bindless textures so that I can have individual textures per object, but I'm not sure that's a good idea since the extension isn't core; how much support can I expect for it, considering it's about three years old? Also, for anyone who has tried it: is it worth the work, or am I better off using texture arrays? (I want to avoid texture arrays because every layer has to be the same size, and not all of my textures will be, which seems very wasteful.) I would go ahead and implement both, but I'd prefer not to do more work than necessary yet.
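     If it comes down to supporting both paths, the decision can also be deferred to runtime by checking for the extension when the context comes up. A sketch with OpenTK-style calls (assuming a current GL context; the namespace and enum names are from OpenTK, so treat them as indicative):

     using OpenTK.Graphics.OpenGL4;

     static bool HasExtension(string name)
     {
         // Core-profile friendly: walk the indexed extension strings.
         int count = GL.GetInteger(GetPName.NumExtensions);
         for (int i = 0; i < count; i++)
         {
             if (GL.GetString(StringNameIndexed.Extensions, i) == name)
                 return true;
         }
         return false;
     }

     // bool useBindless = HasExtension("GL_ARB_bindless_texture");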
  11. Sorry for the late reply. I thought about those points after reading your post and concluded that you were right: I shouldn't rely on GL4 features, since I can't expect everyone to have them. The reason I'm using a custom engine is that I prefer to do everything myself. I ended up switching back to the normal GL3 backend for my engine and am now just focusing on making a game and optimizing the engine accordingly.
  12. @Hodgman, thanks for the links, they'll be a nice resource, so bookmarked. @vlj, I did read that it needs hardware support, although I hadn't thought about the fact that it would introduce an overhead. @cozzie, yeah, I've been trying to make sure I don't just go for the latest extensions. Bindless textures seemed like the easiest way to do what I needed; the next most viable option was texture arrays plus a lot of management code, which is what I ended up going with. I'm still working on it, though.

     Basically, I specify a maximum resolution a texture can have (the dimensions of a single layer of the texture array) and assign each texture an ID based on its layer. This, along with offset and size information, is written into a uniform buffer and accessed in the shader, which has much the same effect as bindless textures (a sketch of the per-texture record follows this post).

     Currently I'm upgrading my shading language library to handle uniform buffers, and while the GLSL output looks valid, I guess passing a flat variable from the vertex shader doesn't count as a dynamically uniform expression? How else can I get the per-draw index over to the fragment shader in a way that's still considered dynamically uniform?
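     For what it's worth, a sketch of the kind of per-texture record I mean (names are mine, not the engine's); one vec4-sized entry per texture keeps the std140 side trivial, e.g. uniform TextureTable { vec4 entries[256]; } in the shader:

     using System.Runtime.InteropServices;

     [StructLayout(LayoutKind.Sequential)]
     struct TextureTableEntry
     {
         public float Layer;    // array slice the texture was packed into
         public float OffsetU;  // where the texture starts inside that slice, in UV units
         public float OffsetV;
         public float Scale;    // textureSize / maxResolution, applied to the incoming UVs
     }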
  13. I should have been clearer: by offset I meant baseVertex. But yeah, I found the problems (one of which was the assumption that baseVertex needed to be multiplied by 3, the others being a bunch of bugs in the code that manages memory in the persistent mapped buffer) and got it all working again.
  14. I've been trying to implement the AZDO model in my game engine. It has been going nicely, but I'm having trouble getting any sort of drawing working and I'm not sure what the problem is. I've checked all the buffers to see whether they have the necessary data and even tried drawing with DrawElements instead of MultiDrawElementsIndirect. All the buffers seem to have the right data and all of them are bound, but the window still remains black. I think I might be doing something wrong with the way the buffers are bound before the draw, but no matter what I try I can't figure out exactly where the problem lies.

     My engine (specifically the 'Kokoro' folder): https://github.com/himanshugoel2797/Akane-and-Kokoro-Game-Engines

     The 'game' it's running: https://github.com/himanshugoel2797/Akane-and-Kokoro-Game-Engines/blob/master/Sandbox/TestA.cs

     How it works: each thread has a local command buffer (Kokoro/Sinus/SinusManager), and once per frame the command buffer is pushed to a main thread where all the commands are dispatched to the GPU.

     The Model.cs file maintains two static vertex arrays, each consisting of 4 persistently mapped buffer objects (indices, vertices, normals, UVs); the vertex arrays are in Kokoro/OpenGL/PC/VertexArrays.cs and the buffer objects in Kokoro/OpenGL/PC/GPUBufferLL.cs. When a mesh is loaded or generated, it's allocated into one of the vertex arrays depending on how often it's updated.

     In the render loop, calling Model.Draw buffers a draw command into a DrawIndirect buffer object (another instance of GPUBufferLL; the relevant code is in Kokoro/OpenGL/PC/GraphicsContextLL.cs). Additionally, a shader is bound for the one draw call made (for now only one; I'd like to get drawing working before reorganizing the shader pipeline for AZDO). The shader is a standard fullscreen-quad shader (Kokoro/ShaderLib/FrameBuffer.cs). It's written in a custom shading 'language', which is basically C# with a bunch of tricks to generate the shader code at runtime. It should be fairly easy to understand, but I'll post the GLSL output at the end of this post.

     At the end of the draw loop, on calling SwapBuffers, the draw commands (the struct data) are uploaded to the GPU, a MultiDrawElementsIndirect call is issued, and then the backbuffer is swapped.
     The MultiDrawElementsIndirect code from Kokoro/OpenGL/PC/GraphicsContextLL.cs:

     #region Multidraw
     static object locker = new object();
     static GPUBufferLL MDIBuffer = new GPUBufferLL(Engine.UpdateMode.Dynamic, Engine.BufferUse.Indirect, 1024); //Allocate 1kb to the indirect buffer
     static List<MDIEntry> MDIEntries = new List<MDIEntry>();
     static int EntryCount = 0;
     static int EntryOffset = 0;

     internal static void Draw()
     {
         Engine.Model.staticBuffer.Bind(); //Eventually this should be replaced by one call for static buffers and one for dynamic buffers
         SinusManager.QueueCommand(() =>
         {
             //GL.DrawElements(BeginMode.Triangles, 6, DrawElementsType.UnsignedInt, 0);
             GL.MultiDrawElementsIndirect(All.Triangles, All.UnsignedInt, IntPtr.Zero, EntryCount, 0);
         });
     }

     //Submit all the current draw calls and clear the draw list
     internal static void SubmitDraw()
     {
         lock (locker)
         {
             //Build the array to submit to the GPU
             uint[] buf = new uint[MDIEntries.Count * 5];
             for (int i = 0; i < MDIEntries.Count; i++)
             {
                 buf[i * 5] = MDIEntries[i].count;
                 buf[i * 5 + 1] = MDIEntries[i].instanceCount;
                 buf[i * 5 + 2] = MDIEntries[i].first;
                 buf[i * 5 + 3] = MDIEntries[i].baseVertex;
                 buf[i * 5 + 4] = MDIEntries[i].baseInstance;
             }

             EntryCount = MDIEntries.Count;
             Console.Write(EntryCount); //TODO Remove this console output later
             MDIEntries.Clear();

             if (buf.Length > 0) MDIBuffer.BufferData(buf, 0, buf.Length);
         }
     }

     internal static void AddDrawCall(uint first, uint count, uint baseVertex)
     {
         lock (locker)
         {
             //Append the draw call data to the MDIBuffer
             MDIEntries.Add(new MDIEntry()
             {
                 baseInstance = 0,
                 baseVertex = baseVertex,
                 count = count,
                 first = first,
                 instanceCount = 1
             });
         }
     }
     #endregion

     Vertex shader:

     #version 330 core
     layout(location = 0) in vec3 VertexPos;
     layout(location = 2) in vec2 UV0;
     uniform sampler2D ColorMap;
     uniform sampler2D LightingMap;
     uniform sampler2D NormalMap;
     uniform mat4 World;
     uniform mat4 View;
     uniform mat4 Projection;
     uniform float ZNear;
     uniform float ZFar;
     out vec2 UV;

     void main(){
         gl_Position.xyz = VertexPos;
         gl_Position.w = 1;
         UV = UV0;
     }

     Fragment shader:

     #version 330 core
     layout(location = 0) out vec4 Color;
     uniform sampler2D ColorMap;
     uniform sampler2D LightingMap;
     uniform sampler2D NormalMap;
     uniform mat4 World;
     uniform mat4 View;
     uniform mat4 Projection;
     uniform float ZNear;
     uniform float ZFar;
     in vec2 UV;

     void main(){
         Color = texture2D(ColorMap, UV);
     }

     It's probably a bit much to ask for help with something that's spread across so many different files, but I'm out of ideas as to what I could be doing incorrectly.
  15. hgoel0974

    When you realize how dumb a bug is...

      lol, I had something very similar happen to me, only it was between glMultiDrawArraysIndirect and glMultiDrawElementsIndirect, and there wasn't a segfault because C#.

      As a general rule, the moment you decide to copy and paste is probably the moment to take the code and isolate it into its own function, because that is most likely the point at which you can tell that a piece of code is worth making reusable. If you haven't needed to repeat the code yet, though, it's probably better not to waste the time doing it.

      Of course, that's easier said than done. If you're the only person touching the code, it's not really a problem, since you'll most likely remember that rule consistently; the problem is trusting other programmers to do the same.

      I don't forget how my code works, I just forget which code I have. When I look at the code, I instantly remember how it works and all its quirks; the problem is remembering that I had the code in the first place. I usually look over my engine's file tree quickly before I start working. As for copy-pasting bits of code, I would usually separate it into its own function, but it didn't feel reasonable here since it was just a single loop that filled an array with data extracted from Assimp's representation.

      I create a const variable (since C# doesn't have #define-style values) when I think a number might be too specific to remember later why it was there, or else I add a comment. But something like i += 3, in my opinion, didn't need it, because it would always be obvious why the 3 was there.
  16. hgoel0974

    When you realize how dumb a bug is...

    Once, when I was working on a 3D model loader for my game engine, I kept getting messed-up normals on anything but a sphere or a plane. I looked through everything and began suspecting that they were getting messed up when exporting from Blender; I looked it up, and it turned out there had once been a bug like that. I tried another format (since the loader was just Assimp.NET), but nothing changed. Then, after 3-4 days of trying different things and fixing possible issues, I noticed:

    for (int i = 0; i < Normals.Count * 2; i += 2)
    {
        ....
    }

    :| It felt so stupid that I suspected sabotage from my roommates, lol. Now that I think about it, I probably forgot to change the values when I copied the loop for the UVs. (And yes, I later realized how horrible the idea of looping over everything like that was, so I've since written a proper custom model loader.)

    Also, a few days ago I was having trouble getting MultiDrawIndirect working. I spent an entire day on it and finally gave up and went to sleep. Then, as soon as I woke up, it hit me: I was using Model q = new Quad(0, 0, 1, 1); to test the drawing, but with the way everything was set up, it wouldn't necessarily be the first thing I saw when I launched the game. I remembered that I had a FullScreenQuad class (which was what I intended all along), so I changed it to Model q = new FullScreenQuad(); and suddenly it worked! It did make me question my skill as a programmer, though, not even remembering parts of my own game engine.

    Another time, I was having issues with the buffer object memory manager. I came up with all sorts of complicated ways to get around the issue, then noticed that I could just remove one check to get the same effect. That was yet another wasted day.
  17. hgoel0974

    Vertex Shader Iteration

    Why not move the update code to a fragment shader that renders to a texture, then use vertex texture fetch for the positions? Each frame, bind the previous frame's particle framebuffer as a texture, render the updated data into a second framebuffer, then use vertex texture fetch to draw; the next frame, bind the second framebuffer as the texture and use the first as the render target, and repeat. Basically, ping-pong the particle data between the two framebuffers. This post describes what I mean in detail, and as you can see, the performance isn't bad either: http://www.hackbarth-gfx.com/2013/03/17/making-of-1-million-particles/
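    To make the ping-pong concrete, a bare-bones sketch with OpenTK-style calls; the FBO/texture setup and the two shader helpers are hypothetical and omitted:

    using OpenTK.Graphics.OpenGL4;

    class ParticlePingPong
    {
        int[] fbo = new int[2];     // two framebuffers, each with a position texture attached
        int[] posTex = new int[2];  // the textures holding per-particle positions
        int readIdx = 0, writeIdx = 1;

        public void Frame()
        {
            // 1. Update pass: read last frame's positions, write the new ones.
            GL.BindFramebuffer(FramebufferTarget.Framebuffer, fbo[writeIdx]);
            GL.BindTexture(TextureTarget.Texture2D, posTex[readIdx]);
            DrawFullScreenQuadWithUpdateShader();   // hypothetical helper

            // 2. Draw pass: vertex texture fetch from the texture just written.
            GL.BindFramebuffer(FramebufferTarget.Framebuffer, 0);
            GL.BindTexture(TextureTarget.Texture2D, posTex[writeIdx]);
            DrawParticlesWithVtfShader();           // hypothetical helper

            // 3. Swap roles for the next frame.
            int tmp = readIdx; readIdx = writeIdx; writeIdx = tmp;
        }

        void DrawFullScreenQuadWithUpdateShader() { /* omitted */ }
        void DrawParticlesWithVtfShader() { /* omitted */ }
    }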
  18. hgoel0974

    Github DDos Attack two days ago?

    Heard about the attack a few days ago when I was trying to push my code to GitHub and it wouldn't go through, which made me suspicious because GitHub hasn't really ever had trouble connecting before. I was extremely annoyed to read that they were being DDoSed. Luckily, I tried again a few minutes later and the push went through. Still pretty annoying, and I don't even want to begin to imagine the chaos at GitHub HQ as they tried to figure out how to deal with it all. It constantly makes me think about how the internet could really use a redesign based on modern security techniques, but at the same time it reminds me that something like that won't happen any time soon.
  19. hgoel0974

    What Are You Working On

    I've been working on my 3D and 2D game engines, a C#-based shading language, and related tools for the past 4 months. I had rendering working nicely, but then I decided to switch to a multithreaded game loop and the AZDO model, so I'm still getting the ubershader generator and texture arrays set up. I've also been working on designs for my survival horror game, with some quick renderings (in Blender): http://i.imgur.com/QL0Xw4q.jpg I'd love to get results like that in my game engine, but that's still far off. I've also been thinking about sphere trees for compute shader physics and, from the same data, distance-field global illumination, but I'm not sure whether that will be feasible. Other than that, I'm doing on-and-off research on game AI and procedural art and music.

    Actually, I've been working on the game engines for almost 2 years, but I wasn't satisfied with my engine design until a few months ago (as a result I made a lot of rewrites). Oh, and when I'm not doing any of the above, I'm watching anime!
  20. I'm now trying to draw multiple meshes, and it feels like the offset for each mesh is always assumed to be zero (so only the first created mesh is drawn). As far as I can tell, though, all the commands are being submitted correctly.

     The structure I am using is:

     struct MDIEntry
     {
         public uint count;
         public uint instanceCount;
         public uint first;
         public uint baseVertex;
         public uint baseInstance;
     }

     If I have two meshes, one with 250 indices and 400 vertices and the other with 100 indices and 200 vertices, the structs should be something like:

     meshA = new MDIEntry { count = 250, instanceCount = 1, first = 0, baseVertex = 0, baseInstance = 0 }

     and

     meshB = new MDIEntry { count = 100, instanceCount = 1, first = 250, baseVertex = 400 * 3, baseInstance = 0 }

     Is this correct? I'm beginning to think it might just be a driver issue, because no matter what I set the baseVertex value to, nothing changes.
  21. I got it to work now; turns out I just needed to sleep on it. I'm not sure exactly where the problem was, but I realized that I wasn't synchronizing properly when using persistent mapping (I was issuing the fence immediately after writing to the buffer rather than after I was done using the data in it). I also found out that I was creating the wrong primitive (new Quad(0, 0, 1, 1) instead of new FullScreenQuad(), which is embarrassing considering that I wrote the engine). Additionally, there were mistakes where a command-buffer command would actually write to the command buffer rather than executing the command, causing commands to get out of order and occasionally skipped.
  22. When working with persistently mapped buffers in C# and OpenTK, you end up needing unsafe operations and something like the following:

     WaitSyncStatus waitReturn = WaitSyncStatus.TimeoutExpired;
     while (waitReturn != WaitSyncStatus.AlreadySignaled && waitReturn != WaitSyncStatus.ConditionSatisfied)
     {
         waitReturn = GL.ClientWaitSync(syncObj, ClientWaitSyncFlags.SyncFlushCommandsBit, 1);
     }

     //Write the data
     unsafe
     {
         fixed (uint* SystemMemory = &data[0])
         {
             uint* VideoMemory = (uint*)mappedPtr.ToPointer();
             for (int i = offset; i < ((length == -1) ? data.Length : length); i++)
                 VideoMemory[i] = SystemMemory[i - offset]; // simulate what GL.BufferData would do
         }
     }

     // lock the buffer:
     GL.DeleteSync(syncObj);
     syncObj = GL.FenceSync(SyncCondition.SyncGpuCommandsComplete, WaitSyncFlags.None);

     I'm thinking it might be faster to invoke some native code here to read the data directly into the buffer, for instance by fread-ing into it. Would that perhaps be faster than the way I'm currently forced to do it? Just looping over the array feels like the worst possible way to do it.
  23. hgoel0974

    Writing to Persistent Mapped Buffers

    I guess I should have researched this further. I ended up using Marshal.Copy for the version that works on a float[], but for the uint[] version (because, for some reason, .NET doesn't have an unsigned overload of Marshal.Copy) I can't figure out what to go with. I thought of using RtlMoveMemory via P/Invoke, but that would be Windows-specific, which is something I'd like to avoid. I did find Buffer.MemoryCopy, which would be what I'm looking for, but even though I'm using .NET 4.5 it doesn't appear to be available.
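    For the record, Buffer.MemoryCopy only shipped with .NET Framework 4.6 (and .NET Core), which would explain why it isn't visible on 4.5. Since unsafe code is already in play, the uint[] case can also be handled by pinning the array and copying through pointers; a sketch (names are illustrative):

    using System;

    static unsafe class MappedBufferCopy
    {
        // Copy 'count' uints from 'data' into a persistently mapped buffer at 'mappedPtr'.
        public static void Upload(uint[] data, IntPtr mappedPtr, int count)
        {
            fixed (uint* src = data)
            {
                uint* dst = (uint*)mappedPtr.ToPointer();

                // On .NET 4.6+ this is a single bulk copy:
                Buffer.MemoryCopy(src, dst, (long)count * sizeof(uint), (long)count * sizeof(uint));

                // On 4.5, a plain pointer loop does the same job, just less conveniently:
                // for (int i = 0; i < count; i++) dst[i] = src[i];
            }
        }
    }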
  24. Although this isn't particularly specific to OpenGL, it's in the context of my game engine, which currently only targets OpenGL, so...

     I was thinking about an approach to the rendering system similar to how I think command buffers in the upcoming APIs are supposed to work. My game engine has a multithreaded game loop where each thread handles specific functions (one for updating physics, another for animations, another for general updates, and another for world management; all loops are fixed timestep). The problem is that, being in C# and OpenTK, rendering is tied to the WinForms Paint function, which bugs me; I'd like to be able to make everything totally independent. So I came up with the following idea:

     Have a render queue that a separate render thread writes its render commands to. The commands on the render queue (up to and including SwapBuffers) are then executed on every refresh. If the system is having trouble keeping up and a frame is missed, the commands for that frame are dropped from the queue. With this setup I think it should take out a lot of the complexity of dealing with variable rendering timesteps, while also making it easy to transition to Vulkan and DX12 when they're released. A rough sketch of what I mean follows this post.

     But before I go in and start adjusting the API to work like this, I wanted to know whether there are any immediately obvious flaws in such a system that I'm overlooking.
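     A minimal sketch of that queue, assuming a ConcurrentQueue of complete frames (all names are illustrative and error handling is omitted):

     using System;
     using System.Collections.Concurrent;
     using System.Collections.Generic;

     class RenderQueue
     {
         readonly ConcurrentQueue<List<Action>> frames = new ConcurrentQueue<List<Action>>();

         // Called by the thread that records render commands, once per logical frame.
         public void Submit(List<Action> frameCommands)
         {
             frames.Enqueue(frameCommands);
         }

         // Called by the thread that owns the GL context, once per refresh.
         public void ExecuteLatest()
         {
             List<Action> frame = null;
             List<Action> next;

             // If we've fallen behind, drop everything but the newest complete frame.
             while (frames.TryDequeue(out next))
                 frame = next;

             if (frame == null)
                 return; // nothing new this refresh; the previous image stays on screen

             foreach (var cmd in frame)
                 cmd(); // the last command in the list is expected to be SwapBuffers
         }
     }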
  25. Yeah, that's how I'm doing it.