Popular Content

Showing content with the highest reputation on 09/25/18 in all areas

  1. 1 point
  If you are interested in learning the free, cross-platform, Lua-powered Defold game engine, this tutorial series is a perfect place to start. It covers all aspects of working with Defold, starting from creating your first project, through sprites, animation, input, music, physics, particles, scene loading, GUIs and much more. If you prefer video, there is one of those as well, covering the same material as the text tutorial. If you've never heard of Defold, it's a cross-platform (Linux, Windows, Mac), Lua-powered, mobile-friendly 2D game engine, made available by King (the Candy Crush people). It's free to download and use.
  2. 1 point
  The past few days I have been playing with the Godot engine with a view to using it for some gamedev challenges. Last time, for Tower Defence, I used Unity, but updating it has broken my version, so I am going to try a different 'rapid development' engine. So far I've been very impressed by Godot: it installs down to a small size, isn't bloated and slow with a million files, and it is very easy to change versions of the engine (I compiled it from source to have the latest version as an option). Unfortunately, after working through a couple of small tutorials, I get the impression that Godot suffers from the same frame judder problem I had to deal with in Unity. Let me explain (tipping a hat to Glenn Fiedler's article).

Some of the first games ran on fixed hardware, so timing was not a big deal: each 'tick' of the game was a frame that was rendered to the screen. If the screen rendered at 30fps, the game ran at 30fps for everyone. This approach was used on PCs for a while, but the problem was that some hardware was faster than other hardware, and some games ran too fast or too slow depending on the PC. Clearly something had to be done to enable games to deal with different CPU speeds and refresh rates.

Delta Time?

The obvious answer was to sample a timer at the beginning of each frame, and use the difference (delta) in time between the current frame and the previous one to decide how far to step the simulation. This is great, except that things like physics can produce different results when you give them shorter and longer timesteps; for instance, a long pause while jumping, due to a hard disk whirring, could give enough time for your player to jump into orbit. Physics (and other logic) tends to work best, and be simpler, when given fixed regular intervals. Fixed intervals also make it far easier to get deterministic behaviour, which can be critical in some scenarios (lockstep multiplayer games, recorded gameplay etc.).
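The orbit-jump problem can be demonstrated with a toy integrator. This is a hypothetical sketch (the drag model and the numbers are made up, not from the article): a rule applied once per step makes the outcome depend on the step size, which is exactly the non-determinism variable delta time introduces.

```cpp
#include <cassert>

// Toy body: per-step drag makes the result depend on the step size.
struct Body { double pos = 0.0; double vel = 10.0; };

void step(Body& b, double dt) {
    b.vel *= (1.0 - 0.1 * dt); // drag applied once per step: not dt-invariant
    b.pos += b.vel * dt;
}

// Advance 'steps' ticks of length 'dt' and return the final position.
double simulate(int steps, double dt) {
    Body b;
    for (int i = 0; i < steps; ++i) step(b, dt);
    return b.pos;
}
```

Covering the same 1 second as one 1.0s step or as ten 0.1s steps gives different final positions (roughly 9.0 vs 9.47 here), which is why fixed, regular intervals are preferred.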
Fixed Timestep

If you know you want your gameplay to have a 'tick' every 100 milliseconds, you can calculate how many ticks should have completed by the start of any frame.

// some globals
iCurrentTick = 0

void Update()
{
    // Assuming our timer starts at 0 on level load:
    // (ideally you would use a higher resolution than milliseconds, and watch for overflow)
    iMS = gettime();

    // ticks required since start of game
    iTicksRequired = iMS / 100;

    // number of ticks that are needed this frame
    iTicksRequired -= iCurrentTick;

    // do each gameplay / physics tick
    for (int n = 0; n < iTicksRequired; n++)
    {
        TickUpdate();
        iCurrentTick++;
    }

    // finally, the frame update
    FrameUpdate();
}

Brilliant! Now we have a constant tick rate, and it deals with different frame rates. Provided the tick rate is high enough (say 60fps), the positions when rendered look kind of smooth. This, ladies and gentlemen, is about as far as Unity and Godot typically get.

The Problem

However, there is a problem, and it can be illustrated by taking the tick rate down to something that could be considered 'ridiculous', like 10 ticks per second or fewer. The problem is that frames don't coincide exactly with ticks. At a low tick rate, several frames will be rendered with dynamic objects in the same position before they 'jump' to the next tick position. The same thing happens at high tick rates: if the tick rate does not exactly match the frame rate, you will get some frames that have 1 tick, some with 0 ticks, some with 2. This appears as a 'jitter' effect. You know something is wrong, but you can't put your finger on it.

Semi-Fixed Timestep

Some games attempt to fix this by running as many fixed timesteps as possible within a frame, then a smaller timestep to make up the difference to the delta time. However, this brings back many of the same problems we were trying to avoid by using a fixed timestep (especially the lack of deterministic behaviour).
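The tick bookkeeping above can be lifted into a small, testable helper (the function and parameter names here are mine, not from the article):

```cpp
// Given the elapsed time in ms since level load and the last tick already
// executed, return how many 100ms ticks must run this frame, and advance
// the tick counter.
int ticks_required(long long elapsed_ms, long long& current_tick) {
    long long target = elapsed_ms / 100; // ticks since start of game
    int ticks = static_cast<int>(target - current_tick);
    current_tick = target;
    return ticks;
}
```

Note how some frames get 2 ticks, some 1, and some 0, depending on where the frame boundary falls relative to the 100ms tick grid.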
Interpolation

The established solution commonly used to deal with both these extremes is to interpolate, usually between the current and previous values for position, rotation etc. Here is some code:

// ticks required since start of game
iTicksRequired = iMS / 100;

// remainder
iMSLeftOver = iMS % 100;

// ... gameplay ticks

// finally, the frame update
float fInterpolationFraction = iMSLeftOver / 100.0f;
FrameUpdate(fInterpolationFraction);

.....

// very pseudocodey, just an example translate for one object
void FrameUpdate(float fInterpolationFraction)
{
    // where pos is a Vector3 translate
    m_Pos_render = m_Pos_previous + ((m_Pos_current - m_Pos_previous) * fInterpolationFraction);
}

The more astute among you will notice that if we interpolate between the previous and current positions, we are actually interpolating *back in time*. We are in fact going back by exactly 1 tick. This results in smooth movement between positions, at the cost of a 1-tick delay. 'This delay is unacceptable!' you may be thinking. However, the chances are that many of the games you have played have had this delay, and you have not noticed. In practice, fast twitch games can set their tick rate higher to be more responsive. Games where this isn't so important (e.g. RTS games) can reduce processing by dropping the tick rate. My Tower Defence game runs at 10 ticks per second, for instance, and many networked multiplayer games will have low update rates and rely on interpolation and extrapolation. I should also mention that some games attempt to deal with the 'fraction' by extrapolating into the future rather than interpolating back a tick. However, this can bring in a new set of problems, such as lerping into colliding situations, and snapping.

Multiple Tick Rates

Something which doesn't get mentioned much is that you can extend this concept and have different tick rates for different systems.
You could, for example, run your physics at 30tps (ticks per second) and your AI at 10tps (an exact multiple, for simplicity). Or use tps to scale down processing for far away objects.

How do I retrofit frame interpolation to an engine that does not support it fully?

With care, is the answer, unfortunately. There appears to be some support for interpolation in Unity for rigid bodies (Rigidbody.interpolation), so this is definitely worth investigating if you can get it to work; I ended up having to support it manually (ref 7) (if you are not using the internal physics, the internal mechanism may not be an option). Many people have had issues dealing with jitter in Godot, and I am as yet not aware of support for interpolation in 3.0 / 3.1, although there is some hope of allowing interpolation from the Bullet physics engine in the future.

One option for engine devs is to leave interpolation to the physics engine. This would seem to make a lot of sense (avoiding duplication of data, a global mechanism); however, there are many circumstances where you may not wish to use physics but still want interpolation (short of making everything a kinematic body). It would be nice to have internal support of some kind, but if this is not available, to support this correctly you should explicitly separate the following:

1. transform CURRENT (tick)
2. transform PREVIOUS (tick)
3. transform RENDER (where to render this frame)

The transform depends on the engine and object, but it will typically be things like translate, rotate and scale, which would need interpolation. All of these should be accessible from the game code, as they all may be required, particularly 1 and 3. 1 would be used for most gameplay code, and 3 is useful for frame operations like following a player with a camera.
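The three-transform separation can be sketched like this. It is a minimal 1D version with hypothetical names (a real node would hold full translate/rotate/scale and interpolate each):

```cpp
#include <cassert>

// Keep tick state and render state explicitly separate.
struct InterpolatedNode {
    double curr = 0.0;   // 1. transform CURRENT (gameplay reads/writes this)
    double prev = 0.0;   // 2. transform PREVIOUS (last tick's value)
    double render = 0.0; // 3. transform RENDER (what the camera should follow)

    // Called once per gameplay tick with the new position.
    void tick(double new_pos) { prev = curr; curr = new_pos; }

    // Called once per frame with the fraction of a tick elapsed (0..1).
    void frame(double fraction) { render = prev + (curr - prev) * fraction; }
};
```

Gameplay code keeps reading `curr`; only the renderer (and frame-rate-bound things like a follow camera) should ever look at `render`.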
The problem that exists today in some engines is that in some situations you may wish to manually move a node (for interpolation), and this in turn throws off the physics etc., so you have to be very careful shoehorning these techniques in.

Delta Smoothing

One final point to totally throw you. Consider that typically we have been relying on a delta (difference) in time measured from the start of one frame (as seen by the app) to the start of the next frame (as seen by the app). However, in modern systems the frame is not actually rendered between these two points. The commands are typically issued to a graphics API but may not actually be rendered until some time later (consider the case of triple buffering). As such, the delta we measure is not the time difference between the 2 rendered frames; it is the delta between the 2 submitted frames. A dropped frame may, for instance, show very little difference in the delta between the submitted frames, but double the delta between the rendered frames.

This is somewhat of a 'chicken and egg' problem. We need to know how long the frame will take to render in order to decide what to render, and where; but in order to know how long the frame will take to render, we need to decide what to render, and where! On top of this, a dropped frame 2 frames ago could cause an artificially high delta in later submitted frames if they are capped to vsync! Luckily, in most cases the solution is to stay well within performance bounds and keep a steady frame rate at the vsync cap. But in any situation where we are on the border of dropping frames (perhaps a high refresh monitor?), it becomes a potential problem. There are various strategies for dealing with this, for instance smoothing delta times, or working with multiples of the vsync interval, and I would encourage further reading on this subject (ref 3).
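One of the strategies mentioned, working with multiples of the vsync interval, might look like this. This is a simplified sketch under my own assumptions (real implementations typically also smooth and clamp the result):

```cpp
#include <cmath>

// Snap a measured frame delta to the nearest whole multiple of the vsync
// interval, so jittery timer readings around 16.7ms all become exactly one
// interval, and a genuinely dropped frame becomes exactly two.
double snap_delta(double delta_ms, double vsync_ms) {
    double multiple = std::round(delta_ms / vsync_ms);
    if (multiple < 1.0) multiple = 1.0; // never step by less than one interval
    return multiple * vsync_ms;
}
```

This trades a little accuracy on genuinely irregular frames for completely stable deltas in the common steady-state case.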
References

1. https://gafferongames.com/post/fix_your_timestep/
2. http://www.kinematicsoup.com/news/2016/8/9/rrypp5tkubynjwxhxjzd42s3o034o8
3. http://frankforce.com/?p=2636
4. http://fabiensanglard.net/timer_and_framerate/index.php
5. http://lspiroengine.com/?p=378
6. http://www.koonsolo.com/news/dewitters-gameloop/
7. https://forum.unity.com/threads/case-974438-rigidbody-interpolation-is-broken-in-2017-2.507002/
  3. 1 point
    Both Unity and Godot have a fixed update and a frame update. But after a bit of investigation, I couldn't get the interpolation working properly in Unity for my game, so I implemented it manually. Admittedly I am not very experienced with Unity, and they do seem aware of the issue (maybe some of the articles I read were out of date?). I also wasn't using their internal physics, so perhaps that influenced me; maybe their interpolation only works with the internal physics. I will correct this in the post though, well spotted, thank you. Unfortunately I can no longer test, as I don't have Unity working any more. In my tests so far in Godot, and from reading the issues on GitHub, it doesn't appear that they have a built-in solution for this yet; there are several topics in the issue tracker about it (and attempts to fix some issues with a hysteresis modification). One guy has suggested that Bullet, which is available in Godot, does offer interpolation, and perhaps Godot can make the right calls to Bullet and get the interpolation for free for rigid bodies etc.
  4. 1 point
    I think you sound a bit like Boris Karloff...lol...nice deep voice for a spooky intro if you ask me..😎
  5. 1 point
    Unity does fixed timestep physics (and has a fixed and a variable rate loop for your own code), lets you configure the step sizes, and lets you choose whether to interpolate per physics object....
  6. 1 point
    I was wondering how to do this and now I know. Great blog entry.
  7. 1 point
    Last week was a modelling one. There aren't a whole lot of new mechanics, but it was still a productive week nevertheless.

Custom Font

Firstly, I've previously talked about creating a custom font to display some of the GUI icons. Well, with the use of FontForge, we were able to add vector art to a TrueType font. For those who don't know, FontForge is an open source font-making app. It's pretty advanced and can actually make good looking fonts. There's a pretty steep learning curve though. If you're using it on Windows, you'll also need to fiddle around with the settings; otherwise, it can run with a whole lot of hiccups and crashes.

With FontForge, you can import vector graphics right into the font. Because I already had SVG files of most of my icons, it was just a matter of a couple of clicks and everything was imported. However, it wasn't a piece of cake: although imported, the graphics still needed to be properly aligned so they could be properly displayed.

With FontForge you can export the custom font to different file formats. Thankfully, we can export to TrueType, which is exactly the type of font file Unity uses. Once the TrueType file is created, we can use Unity's dynamic font rendering to display our GUI icons at any resolution we need, without rescaling and re-rendering any texture. However, there's a big drawback: fonts are always monochromatic. If we want a coloured icon, we have no option besides a traditional bitmap texture. (Colour fonts do exist... however, their support isn't really widespread.) But anyway, here's how it looks:

New rooms

Secondly, there are also two new rooms. Both of these rooms are linked to crystals. Now that the player can know the number of crystals they currently have, those rooms can safely be integrated and tested without any hiccups.

The Temple

When visiting a temple, players can donate their crystals to gain back health points.
To do this, the player simply needs to interact with the box at the back of the room while holding a crystal. Temples are modelled after Japanese Shinto temples/shrines. I've taken some liberties here and there, but the overall theme is Japanese. They are also much more open compared to other rooms. When the sun is right, the lighting can be quite charming.

The Pawnshop

The pawnshop isn't finished yet, but it's functional nevertheless. The player can exchange their crystals for a small amount of money by interacting with the yet-to-be-modelled cash register. Once finished, the pawnshop will have some fancy props here and there, much like a typical pawnshop: things like guitars, jewellery and, of course, crystals. But for now, the room is kinda bland and boring...

Minor updates

There are also some new models and code refactors. For one, the diner now has a nice sign up on its roof. Aside from that, there aren't a whole lot of minor differences.

Next week

Like I've stated before, a lot of rooms can be added to the game now that most gameplay mechanics are coded. And there's still a whole lot of rooms to implement. Of course, I still need to model the pawnshop and add its details. There might be some polishing to do on many gameplay mechanics, and maybe a refactor or two. There's a lot of work ahead of me.
  8. 1 point
    The problem

You’re building a game-world that is big, so big in fact that not all of it can be loaded into memory at once. You also don’t want to introduce portals or level loading; you want the player to have an uninterrupted experience. For true continuous streaming, a typical scenario would be something like this:

- The world is partitioned into tiles (quad-tree)
- When the camera moves, tile-data is read from disk and pre-processed in the background
- We need to render meshes for each tile

There can be more than 1000 tiles in the AOI, more than 100 different meshes and up to 10000 instances per mesh on one tile. How do we improve from a worst case of 1,000,000,000 draw calls to a best case of 1 draw call?

Introduction

To focus on the render-data preparation specifically, I assume the reader is familiar with the following concepts:

- Instanced mesh rendering
- Compute shaders
- AOI (Area Of Interest)
- Quad-tree tile-based space partitioning

For an introduction I recommend this blog entry on our website: http://militaryoperationshq.com/dev-blog-a-global-scope/

I will use OpenGL to demonstrate details because we use it ourselves and because it is the platform-independent alternative. The technique, however, can be adapted for any modern graphics API that supports compute shaders.

The solution

The solution is to do the work on the GPU. This is the type of processing a GPU is particularly good at. The diagrams below show the memory layout. Each colour represents a different type of instance data, stored non-interleaved: for example, position, texture-array layer-index or mesh scale-factor. Within each instance-data-type (colour) range, a sub-range (grey) is used for storing data for instances of a particular mesh. In this example, there are 4 different meshes that can be instanced. Within the sub-range, there is room to store instance data for a “budget” amount of instances. After loop-stage step 4, we know exactly where to store instance data of each type (pos, tex-index, scale, etc.)
for a particular mesh-type. In this example, the scene contains no mesh-type 2 instances and many mesh-type 3 instances.

Prepare once at start-up

1. Load all mesh data for the models you want to be able to show into one buffer.
2. Prepare GL state by creating a Vertex Array Object containing all bindings.
3. Create a command-buffer containing indirect-structures, one structure for each mesh that you want to be able to render.
4. Fill the indirect-structure members that point to (non-instance) mesh vertex data.

Steps for one new tile entering the AOI

1. Read geometry from disk.
2. Rasterize geometry into a material-map.
3. Generate instance-points covering the tile. Select a grid-density and randomise points inside their grid-cell to make it look natural if you’re doing procedural instancing. Whole papers have been written about this topic alone.
4. Sample from the material-map at each grid-point to cull points and decorate data.
5. Store the result in a buffer per tile.
6. Keep the result-buffer of a tile for as long as it is in the AOI.

Steps 1, 2, 3 and 4 may well be replaced by simply loading points from disk if they are pre-calculated offline. In our case we cover the entire planet, so we need to store land-use data in vector form and convert it into raster data online, to keep the install size manageable.

Steps for each loop

This is where things get interesting.

1. Do frustum and other culling of the tiles, so you know which tiles are visible and contain meshes that need rendering.
2. Clear the instance-count and base-instance fields of the indirect-structures in the command buffer. Run a simple compute shader for this: if you were to map the buffer or use glBufferData to allow access from the CPU, you would introduce an expensive upload and synchronisation, which we want to prevent.
3. Run a compute shader over the tile-set in view to determine which meshes to render. Just count instances per mesh in the instance-count member of the indirect-structure. This may require sampling from the material-map again, or doing other calculations to pick a mesh LOD or reflect game-state. It may very well require procedural math to “randomly” spawn meshes. This all depends on your particular game requirements.
4. Fill in the base-instance member of the indirect-structures by launching a single compute-shader instance.
5. Run a compute shader to prepare the render data. Do the calculations that determine which mesh to select, again. Claim a slot in the vertex-attributes buffer and store the render data. Since at this point we know exactly how many instances of each mesh will need rendering (all counts and offsets), we know in what range a particular mesh instance needs to store its instance data. The order within the range for a particular mesh doesn’t matter.

The important, maybe counterintuitive, thing here is that we do all the calculations that determine which mesh to instance twice. We don’t store the results from the first pass. It would be complicated, memory consuming and slow to remember which mesh instance of which tile ends up at which vertex-data location in the render buffer, just so we can look up an earlier stored result. It may feel wasteful to do the same work twice, but that is procedural thinking. On the GPU it is often faster to recalculate something than to store and read back an earlier result. Now everything is done on the GPU, and we only need to add some memory barriers to make sure data is actually committed before the next step uses it.

Atomic operations

Note that steps 3 and 5 of the loop-stage require the compute shader to use atomic operations. They are guaranteed not to conflict with other shader instances when writing to the same memory location.

Instance budget

You need to select a budget for the maximum number of meshes that can be drawn at once. It defines the size of the instance-data buffer. This budget may not cover certain extreme situations, which means we need to make sure we do not exceed the budget.
Step 4 updates the base-instance of the indirect-structure. At that point, we can detect whether we exceed the budget. We can simply force instance-counts to zero when we reach the budget, but this would cause potentially very visible mesh instances to pop in and out of the render-set each loop. To solve this, sort the indirect-structures (representing meshes) in the command-buffer from high to low detail; this is only needed once at start-up. That way, the first meshes to be dropped are low LOD and should have the least impact. If you’re using tessellation to handle LOD, you’ll have to solve this differently, or make sure your budget can handle the extreme cases.

Ready to render

We now have one buffer containing all the render data needed to render all instances of all meshes on all tiles, in one call. We simply do a single render call using the buffer that contains the indirect-structures. In fact, we render all possible meshes: if, in the current situation, some meshes do not need to be rendered, their instance-count in the indirect-structure will be zero and they will be skipped with very little to no overhead.

How it used to be

In a traditional scenario, we might have filled an indirect-structure buffer with only the structures for the meshes that need rendering, then copied all the render data into a vertex-attribute buffer, in an order matching the order of the indirect-structures in the command buffer, which requires a sorting step. Next, an upload of this render data to the GPU is required. Since the upload is slow, we would probably need to double, or even better triple, buffer this to hide transfer and driver stages, so we don’t end up waiting before we can access/render a buffer.

Summary

Key points:

- Preprocess and store intermediate data on the GPU
- Make the mesh-instance render order fixed, and always render all meshes
- Use a 2-pass approach: first count instances, so we can predict the memory layout the second time
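The base-instance update with the budget clamp (loop step 4) is essentially an exclusive prefix sum; here it is sketched on the CPU for clarity, with plain vectors standing in for the GL buffers (the function name is mine, not from the article):

```cpp
#include <cassert>
#include <vector>

// Exclusive prefix sum over per-mesh instance counts, zeroing the count of
// any mesh whose instances would start beyond the budget. Mirrors what the
// single-invocation update compute shader does on the command buffer.
void update_base_instances(std::vector<unsigned>& counts,
                           std::vector<unsigned>& bases,
                           unsigned budget) {
    unsigned running = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        if (running >= budget) counts[i] = 0; // over budget: drop this mesh
        bases[i] = running;
        running += counts[i];
    }
}
```

Because the structures are sorted from high to low detail, the meshes dropped by the clamp are always the low-LOD ones at the end.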
Benefits

- No upload of render data each render-loop
- No need to refill/reorder the command-buffer (indirect-structures) every loop
- No sort needed to pack vertex-data for mesh instances
- No need for double/triple buffering

Improvements

The most performance is gained by selecting the best design for your software implementation. Nevertheless, there is some low-hanging fruit to be picked before going into proper profiling and tackling bottlenecks.

Memory allocation

You should use glBufferStorage to allocate GPU memory for buffers. Since we never need to access the GPU memory from the CPU, we can do this:

glBufferStorage(GL_ARRAY_BUFFER, data_size, &data[0], 0);

The last parameter tells the GL how we want to access the memory. In our case we simply pass 0, meaning we will only access it from the GPU. This can make a substantial difference depending on vendor, platform and driver, as it allows the implementation to make performance-critical assumptions when allocating GPU memory.

Storing data

This article describes vertex-data that is stored non-interleaved. We are in massively parallel country now, not OO. Without going into details, the memory access patterns make it more efficient if, for example, all positions of all vertices/instances are packed together. The same goes for all other attributes.

Frustum Culling

In this example, tile frustum-culling is done on the CPU. A tile, however, may contain many mesh instances, and it makes good sense to do frustum culling for those as well. It can easily be integrated into step 3 and performed on the GPU.

Launching compute shaders

The pseudo-code example shows how compute shaders are launched for the number of points on each tile. This means the number of points needs to be known on the CPU. Even though this is determined in the background, downloading this information requires an expensive transfer/sync. We can instead store this information on the GPU and use a glDispatchComputeIndirect call that reads the launch size from GPU memory.
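The indirect-structure used throughout corresponds to OpenGL's DrawElementsIndirectCommand record: five tightly packed uints, which is why the compute shaders in the appendix index the raw uint array with a stride of 5. A sketch of the layout:

```cpp
#include <cassert>
#include <cstdint>

// The record glMultiDrawElementsIndirect consumes, one per renderable mesh.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // index count of the mesh (set once at start-up)
    std::uint32_t instanceCount; // cleared, then atomically counted each loop
    std::uint32_t firstIndex;    // offset into the shared index buffer
    std::uint32_t baseVertex;    // offset into the shared vertex buffer
    std::uint32_t baseInstance;  // filled by the update pass (loop step 4)
};

static_assert(sizeof(DrawElementsIndirectCommand) == 5 * sizeof(std::uint32_t),
              "layout must match the stride-5 indexing used in the shaders");
```

Offsets 1 (instanceCount) and 4 (baseInstance) are the two fields the loop-stage shaders rewrite; the other three stay fixed after start-up.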
A trend

This article shows how more work can be pushed to the GPU. We took this notion to the extreme by designing an engine from the ground up that runs completely on the GPU; the CPU only does I/O, user interaction and starting GPU jobs. You can read more about our “Metis Tech” on our blog page: http://militaryoperationshq.com/blog/

The main benefits are the lack of large data up/downloads, and profiting from the huge difference in processing power between GPU and CPU. At some point, this gap will become a limiting bottleneck. According to NVidia, the GPU is expected to be 1000x more powerful than the CPU by 2025! (https://www.nextplatform.com/2017/05/16/embiggening-bite-gpus-take-datacenter-compute/)

Appendix A - Pseudo C++ code

//! \brief Prepare render data to draw all static meshes in one draw call
void prepare_instancing(
      const std::vector<int32_t>& p_tiles_in_view // Points per tile
    , const std::vector<GLuint>& p_point_buffer
    , int32_t p_mesh_count
    , GLuint p_scratch_buffer
    , GLuint p_command_buffer
    , GLuint p_render_buffer
    , GLuint p_clear_shader
    , GLuint p_count_shader
    , GLuint p_update_shader
    , GLuint p_prepare_shader)
{
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, p_command_buffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, p_scratch_buffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, p_render_buffer);

    // 2. Clear instance base and count
    glUseProgram(p_clear_shader);
    glDispatchCompute(p_mesh_count, 1, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // 3. Count instances per mesh
    glUseProgram(p_count_shader);
    for (size_t l_tile_index = 0; l_tile_index < p_tiles_in_view.size(); ++l_tile_index)
    {
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, p_point_buffer[l_tile_index]);
        glDispatchCompute(p_tiles_in_view[l_tile_index], 1, 1);
        glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    }

    // 4. Update instance base
    glUseProgram(p_update_shader);
    glDispatchCompute(1, 1, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

    // 5. Prepare render data
    glUseProgram(p_prepare_shader);
    for (size_t l_tile_index = 0; l_tile_index < p_tiles_in_view.size(); ++l_tile_index)
    {
        glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, p_point_buffer[l_tile_index]);
        glDispatchCompute(p_tiles_in_view[l_tile_index], 1, 1);
        glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    }

    glUseProgram(0);
    glBindBuffersBase(GL_SHADER_STORAGE_BUFFER, 0, 4, nullptr);
}

//! \brief Render all instances of all meshes on all tiles in one draw call
void render_instanced(
      GLuint p_vao
    , GLuint p_command_buffer
    , GLuint p_render_shader
    , int32_t p_mesh_count) // Number of different meshes that can be shown
{
    glBindVertexArray(p_vao);
    glUseProgram(p_render_shader);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, p_command_buffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, p_mesh_count, 0);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);
    glUseProgram(0);
    glBindVertexArray(0);
}

Appendix B - Compute-shader pseudo-code

//****************************************************************************
//! \brief 2. p_clear_shader: Clear counts and offsets
//****************************************************************************

// The local workgroup size
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

// Input: Contains the indirect-structs for rendering the meshes
layout(std430, binding=0) buffer p_command_buffer
{
    uint p_indirect_structs[];
};

// IO: Contains uints for counting point-instance-data per mesh, and claiming slots
layout(std430, binding=2) buffer p_scratch_buffer
{
    uint p_instance_counts[]; // Globally for all tiles. Size = number of mesh variants
};

void main()
{
    uint l_invocation_id = gl_GlobalInvocationID.x;
    p_indirect_structs[l_invocation_id * 5 + 1] = 0; // 5 uints, the second is the instance-count
    p_instance_counts[l_invocation_id] = 0;
}

//****************************************************************************
//! \brief 3. p_count_shader: Count instances
//****************************************************************************

// The local workgroup size
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

// Output: Contains the indirect-structs for rendering the meshes
layout(std430, binding=0) buffer p_command_buffer
{
    uint p_indirect_structs[]; // Globally for all tiles
};

layout(std430, binding=1) buffer p_point_buffer
{
    uint p_point_data[];
};

void main()
{
    uint l_invocation_id = gl_GlobalInvocationID.x;

    //! \note What p_point_data contains is application specific. Probably at least a tile-local position.
    uint l_data = p_point_data[l_invocation_id];

    //! \todo Use data in p_point_data to determine which mesh to render, if at all.
    uint l_mesh_index = 0;

    atomicAdd(p_indirect_structs[l_mesh_index * 5 + 1], 1); // Count per instance (second uint is the instance-count)
}

//****************************************************************************
//! \brief 4. p_update_shader: Update instance base
//****************************************************************************

// The local workgroup size
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

// Input: Contains the indirect-structs for rendering the meshes
layout(std430, binding=0) buffer p_command_buffer
{
    uint p_indirect_structs[];
};

uniform uint g_indirect_struct_count = 0;
uniform uint g_instance_budget = 0;

void main()
{
    uint l_invocation_id = gl_GlobalInvocationID.x;

    // This compute-shader should have been launched with 1 global instance!
    if (l_invocation_id > 0)
    {
        return;
    }

    // Update base-instance values in DrawElementsIndirectCommand
    uint l_index = 0;
    uint l_n = 0;
    p_indirect_structs[l_index * 5 + 4] = 0; // First entry is zero

    for (l_index = 1; l_index < g_indirect_struct_count; ++l_index)
    {
        l_n = l_index - 1; // Index of the indirect-struct before this one
        uint l_base_instance = p_indirect_structs[l_n * 5 + 4] + p_indirect_structs[l_n * 5 + 1];

        // If the budget is exceeded, set the instance count to zero
        if (l_base_instance >= g_instance_budget)
        {
            p_indirect_structs[l_index * 5 + 1] = 0;
            p_indirect_structs[l_index * 5 + 4] = p_indirect_structs[l_n * 5 + 4];
        }
        else
        {
            p_indirect_structs[l_index * 5 + 4] = l_base_instance;
        }
    }
}

//****************************************************************************
//! \brief 5. p_prepare_shader: Prepare render data
//****************************************************************************

// The local workgroup size
layout(local_size_x = 1, local_size_y = 1, local_size_z = 1) in;

// Input: Contains the indirect-structs for rendering the meshes
layout(std430, binding=0) buffer p_command_buffer
{
    uint p_indirect_structs[];
};

// Input: Contains point data
layout(std430, binding=1) buffer p_point_buffer
{
    uint p_point_data[];
};

// IO: Contains mesh-counts for claiming slots
layout(std430, binding=2) buffer p_scratch_buffer
{
    uint p_instance_counts[]; // Globally for all tiles. Size = g_indirect_struct_count
};

// Output: Contains render data
layout(std430, binding=3) buffer p_render_buffer
{
    uint p_render_data[];
};

uniform uint g_indirect_struct_count = 0;
uniform uint g_instance_budget = 0;

void main()
{
    uint l_invocation_id = gl_GlobalInvocationID.x;

    uint l_data = p_point_data[l_invocation_id];

    //! \todo Use data in p_point_data to determine which mesh to render, if at all. Again.
    uint l_mesh_index = 0;

    // This should never happen!
    if (l_mesh_index >= g_indirect_struct_count)
    {
        return;
    }

    // Only process meshes that have an instance count > 0
    if (p_indirect_structs[l_mesh_index * 5 + 1] == 0)
    {
        return;
    }

    // Reserve a slot to copy the instance data to
    uint l_slot_index = atomicAdd(p_instance_counts[l_mesh_index], 1);

    // From mesh-local to global instance-index
    l_slot_index += p_indirect_structs[l_mesh_index * 5 + 4];

    // Make sure not to trigger rendering for more instances than there is budget for
    if (l_slot_index >= g_instance_budget)
    {
        return;
    }

    //! \todo Write any data you prepare for rendering to p_render_data using l_slot_index
}

PDF format: Efficient instancing in a streaming scenario.pdf [Wayback Machine Archive]
  9. 1 point
    Consider a worst case of roughly 12 vertices against 12 vertices. That's 144 loops. However, this will operate on memory inside a very small region of L1 cache. Pretty much, it's going to be extremely fast.
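For concreteness, the loop count works out like this (a trivial sketch; a real inner body would compare the two vertices rather than just count):

```cpp
// Every vertex of one hull tested against every vertex of the other:
// the iteration count is simply the product of the two vertex counts.
int pair_tests(int verts_a, int verts_b) {
    int tests = 0;
    for (int a = 0; a < verts_a; ++a)
        for (int b = 0; b < verts_b; ++b)
            ++tests; // a real check would compare vertex a against vertex b here
    return tests;
}
```

With both hulls at 12 vertices this is 12 x 12 = 144 iterations, all over data that fits comfortably in L1.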
  10. -1 points
    I'm looking for a composer for a 3D adventure game I'm planning to develop in the future. I have a composer as of now, but frankly he isn't that good, so when I saw this I decided to ask whether you would be willing to help me if I gave you some inspiration songs from other games. Thanks in advance! Dalton