Dukus

Members
  • Content count

    7
  • Joined

  • Last visited

Community Reputation

241 Neutral

About Dukus

  • Rank
    Newbie

Personal Information

  • Interests
    |programmer|
  1. If you need to break the rendering up into more jobs, here are a few ideas.

    For culling, if you have a spatial subdivision (quadtree/octree), you can create a job per node of a certain size, say anything over N meters becomes a separate job.

    For sorting, if you know certain things have certain properties and rendering order, such as opaque, transparent, z-sorted, etc., you can bin them together at cull time and then sort each bin separately as its own job.

    For drawing, you can certainly take the sorted lists and break them up into N jobs. I'd just make sure each sub-list is big enough that you don't take a GPU hit from having to set all render state at the beginning of it, since you don't know how render state will end up between each small command buffer that results.
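A rough sketch of the binning and job-splitting ideas above (the `DrawItem` fields and pass names are illustrative, not from any particular engine):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical draw item: pass is chosen at cull time, sortKey later.
enum class Pass { Opaque, Transparent };

struct DrawItem {
    Pass pass;
    uint64_t sortKey;  // e.g. state bits, or depth for transparents
};

// Bin visible items by pass as they come out of culling, so each bin
// can then be sorted as its own job.
void BinAtCullTime(const std::vector<DrawItem>& visible,
                   std::vector<DrawItem>& opaque,
                   std::vector<DrawItem>& transparent) {
    for (const DrawItem& item : visible)
        (item.pass == Pass::Opaque ? opaque : transparent).push_back(item);
}

// Split a sorted list into at most maxJobs contiguous [begin, end) chunks
// for drawing. Each chunk pays for re-setting render state at its start,
// so chunks should stay large enough that this overhead is negligible.
std::vector<std::pair<size_t, size_t>> SplitIntoJobs(size_t count,
                                                     size_t maxJobs) {
    std::vector<std::pair<size_t, size_t>> ranges;
    if (count == 0 || maxJobs == 0) return ranges;
    size_t jobs = std::min(maxJobs, count);
    size_t chunk = (count + jobs - 1) / jobs;  // round up
    for (size_t begin = 0; begin < count; begin += chunk)
        ranges.push_back({begin, std::min(begin + chunk, count)});
    return ranges;
}
```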
  2. I'd just break up the rendering into jobs per render target - generally you'll have the main scene and N shadow maps, maybe a reflection, maybe a cube map to render. Each of these has its own Cull, Sort, Draw that is fairly independent. Each can be its own job in the thread system, and they just need to be linked together at the end so all the dependent shadow maps / textures are available at the right time.

    If they need to be linked throughout the main scene render (i.e. shadow map reuse), you just do some up-front work to determine what the command buffers / render target textures are, so you can reference them as needed before they are fully generated and without syncing.
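One way to sketch that linking step, assuming a hypothetical `PassJob` description per render target (a real engine would hand these to its job system with dependency handles rather than compute an order serially):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-render-target pass: each runs its own Cull/Sort/Draw
// and only consumes its dependencies' outputs (shadow maps, etc.).
struct PassJob {
    std::string name;
    std::vector<size_t> deps;  // indices of passes whose output we consume
};

// Returns an order in which passes can be submitted so every pass's
// dependencies come first (naive topological sort; assumes no cycles,
// which holds for the handful of passes in a frame).
std::vector<size_t> SubmitOrder(const std::vector<PassJob>& passes) {
    std::vector<size_t> order;
    std::vector<bool> done(passes.size(), false);
    while (order.size() < passes.size()) {
        for (size_t i = 0; i < passes.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (size_t d : passes[i].deps) ready = ready && done[d];
            if (ready) { done[i] = true; order.push_back(i); }
        }
    }
    return order;
}
```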
  3. That was many years ago, but there are people that play games and only upgrade every 10 years. There are still DX9-level cards (or just DX9 systems), and while shader model 3.0 can do vertex fetch, it's generally slow under the wrong conditions. There are also a lot of Intel integrated GPUs that aren't all that fast but can still push triangles if you use simple shaders and few textures. Really, I think it depends on the target market for a game. As an indie dev, if I don't absolutely need the GPU performance/feature, I don't see a reason to limit the market. But that's a bit off topic for this thread.

    As for normals on terrain, I like to use straight unique texture maps. Just put the world-space normal in textures at high resolution and you're done - either precomputed and streamed, or generated at runtime and cached. This makes the terrain appear to have perfect detail in the distance. Or if you're tiling the normal map, a tangent basis can be generated from the low-res mesh at whatever LOD. You'd have to interpolate it at LOD edges to make sure you don't get severe lighting pops, but you'd be doing that with position anyway, so it should be easy.
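Generating those world-space normals can be sketched as central differences over a heightmap (the row-major grid layout and cell size are assumptions):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Derive a world-space normal per texel from a heightmap (row-major,
// width w, height rows) using central differences, clamping at edges.
std::array<float, 3> HeightmapNormal(const std::vector<float>& heights,
                                     int w, int rows, int x, int y,
                                     float cellSize) {
    auto at = [&](int cx, int cy) {
        cx = std::max(0, std::min(w - 1, cx));
        cy = std::max(0, std::min(rows - 1, cy));
        return heights[cy * w + cx];
    };
    // Slope along x and z (central difference over two cells)
    float dx = (at(x + 1, y) - at(x - 1, y)) / (2.0f * cellSize);
    float dz = (at(x, y + 1) - at(x, y - 1)) / (2.0f * cellSize);
    // Normal of the surface y = h(x, z) is (-dh/dx, 1, -dh/dz), normalized.
    float nx = -dx, ny = 1.0f, nz = -dz;
    float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    return {nx / len, ny / len, nz / len};
}
```

Run this per texel into a high-resolution texture (or a chunk-sized tile at runtime) and sample it directly in the pixel shader, with no tangent basis needed.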
  4. I personally like terrain on the CPU, just sending vertex buffers to the GPU as needed - mostly for backwards compatibility with old cards that can't do tessellation or even vertex fetch, and because I don't like having multiple render paths.

    Another issue is gigantic terrain. If the entire height map doesn't fit in a texture, you're loading chunks from disk or generating them procedurally, usually at the LOD required, not at high resolution. You can make this work with GPU tessellation or texture fetch, but it requires an additional index/offset/scale when fetching to get the right chunk at the right LOD. In this case I think the CPU is easier, and I wouldn't go the GPU route unless performance demanded it. I'd hazard a guess that drawing buildings, trees, and grass is the slow thing and greatly outweighs the draw calls of CPU terrain, and I'd optimize accordingly.

    Another thing I like about CPU terrain is simple occlusion culling. By rendering front to back, I keep track of the 'horizon' and throw away chunks of terrain below it. This saves a lot of fill rate and vertex processing.
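The horizon trick can be sketched in a simplified 1D form, assuming chunks are already sorted front to back along the view ray and the terrain is solid from the ground up (a real implementation would track a horizon per screen-space column):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-chunk data for the sketch: distance from the camera and the
// chunk's highest point, both measured relative to the camera.
struct Chunk {
    float distance;   // chunks are sorted near to far
    float maxHeight;  // highest point of the chunk above camera height
};

// Track the steepest elevation slope (height/distance) seen so far; a
// farther chunk whose highest point stays below that horizon cannot be
// visible and is thrown away before any vertex work.
std::vector<bool> HorizonCull(const std::vector<Chunk>& frontToBack) {
    std::vector<bool> visible(frontToBack.size(), false);
    float horizon = -INFINITY;
    for (size_t i = 0; i < frontToBack.size(); ++i) {
        float slope = frontToBack[i].maxHeight / frontToBack[i].distance;
        if (slope > horizon) {  // pokes above the current horizon
            visible[i] = true;
            horizon = slope;    // solid terrain raises the horizon
        }
    }
    return visible;
}
```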
  5. DrawCall Batching

    In my engine, I use the object pointer, sub-mesh index, and then a special 'instanceId', which is usually zero. But in the case of skeletal animation, the instanceId becomes the id of the animation playing in my instanced animation manager. This makes the drawing object unique during sorting. When the group goes to draw, the bone constants are set once (I use a callback to the mesh instance to get these). As with any other object, a transform for each object is stuffed into a constant buffer, and the draw occurs.

    I batch my instances every frame, my thought being that the camera is always moving in a game - or if it's not, objects are coming into and out of the frame all the time - so the best-fit instance list almost always changes. Last time I profiled, figuring out batches took a very small amount of time compared to culling, sorting draw items, and generating draw commands.
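A sketch of that batching key, with hypothetical field names (items that compare equal on the key collapse into one instanced draw):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Batching key: mesh identity + sub-mesh index + an instanceId that is
// zero for static meshes and the playing-animation id for skinned ones.
struct DrawKey {
    const void* mesh;
    uint32_t subMesh;
    uint32_t instanceId;

    bool operator==(const DrawKey& o) const {
        return mesh == o.mesh && subMesh == o.subMesh &&
               instanceId == o.instanceId;
    }
    bool operator<(const DrawKey& o) const {
        return std::tie(mesh, subMesh, instanceId) <
               std::tie(o.mesh, o.subMesh, o.instanceId);
    }
};

// After sorting, each run of equal keys becomes one batch: bone constants
// set once, per-object transforms stuffed into a constant buffer.
size_t CountBatches(std::vector<DrawKey> items) {
    std::sort(items.begin(), items.end());
    size_t batches = 0;
    for (size_t i = 0; i < items.size(); ++i)
        if (i == 0 || !(items[i] == items[i - 1])) ++batches;
    return batches;
}
```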
  6. You'll probably also want to think about overall design goals. For example, do you want the lowest latency from input to display on screen, or maximum throughput but with a several-frame delay from input to display?

    In the low-latency case, you do game updates with N threads, then physics collision with N threads, then rendering with N threads, one after the other. Alternatively, you can run game updates, physics, and rendering all at the same time, but then you need to store state per thread so you don't have to synchronize except for the state at the end of the frame.

    I'd recommend using knowledge of what the game engine will do to build a task system specific to it, rather than something overly generic.

    As for OpenGL, you have to make all your draw calls serially from a single thread (you can use multiple, but you'd have to synchronize and switch contexts). You might look into Vulkan/DX12 (or DX11) so that you can parallelize building command buffers.
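The stored-state idea can be sketched as a double-buffered frame state, where the update stage writes one copy while the renderer reads the other, and the only synchronization point is the swap at the frame boundary (names are illustrative):

```cpp
#include <cassert>
#include <vector>

// Whatever snapshot the renderer consumes from the game update.
struct FrameState {
    std::vector<float> transforms;
};

// Two copies of the frame state: the update stage writes one while the
// renderer reads the other; swapping indices at end of frame is the
// single sync point between the stages.
class DoubleBufferedState {
public:
    FrameState& Write() { return buffers_[writeIndex_]; }
    const FrameState& Read() const { return buffers_[1 - writeIndex_]; }
    void SwapAtFrameEnd() { writeIndex_ = 1 - writeIndex_; }

private:
    FrameState buffers_[2];
    int writeIndex_ = 0;
};
```

Note the cost of this scheme is the several-frame delay mentioned above: the renderer is always one frame behind the game update.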
  7. Sounds like you are projecting triangles that are behind the camera. Are you clipping them at the near plane? The slowdown could be because the reverse projection is generating a lot of triangles that become full-screen, and/or invalid floating point numbers due to divide by zero.
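Near-plane clipping of one polygon can be sketched like this (the camera-space convention with +z forward is an assumption), so no vertex with z at or behind the near plane ever reaches the perspective divide:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Clip a convex polygon against the plane z = nearZ in camera space
// (one step of Sutherland-Hodgman). Vertices in front are kept; each
// edge crossing the plane emits an interpolated intersection point.
std::vector<Vec3> ClipNear(const std::vector<Vec3>& poly, float nearZ) {
    std::vector<Vec3> out;
    for (size_t i = 0; i < poly.size(); ++i) {
        const Vec3& a = poly[i];
        const Vec3& b = poly[(i + 1) % poly.size()];
        bool aIn = a.z >= nearZ, bIn = b.z >= nearZ;
        if (aIn) out.push_back(a);
        if (aIn != bIn) {  // edge crosses the plane: emit intersection
            float t = (nearZ - a.z) / (b.z - a.z);
            out.push_back({a.x + t * (b.x - a.x),
                           a.y + t * (b.y - a.y), nearZ});
        }
    }
    return out;
}
```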