

Matias Goldberg

Member Since 02 Jul 2006

Posts I've Made

In Topic: glsl represent 1 big texture as 4 smaller ones (tearing)

22 June 2016 - 05:14 PM

You're gonna have trouble with bilinear filtering at the seams (it gets worse with trilinear). Near a seam the GPU should be interpolating between texels that belong to two different textures, but obviously this won't happen because each sampler only sees its own texture, so you need to do it yourself.

 

You may have to sample all four textures and interpolate the results yourself:

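// Assuming layout of textures (four 512x512 tiles of one 1024x1024 image):
// |0|1|
// |2|3|
// c0..c3 are the texels fetched from each tile on either side of the seam.
// Bilinear weights in full-image texel space (texel centres sit at +0.5):
vec2 w = fract( uv * 1024.0 - 0.5 );
result = mix(
    mix( c0, c1, w.x ),
    mix( c2, c3, w.x ),
    w.y );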

If you're only near the vertical seam (between left and right), you only need c0 & c1 or c2 & c3; if you're only near the horizontal seam (between top and bottom), you only need c0 & c2 or c1 & c3. But if you're close to the point where both seams cross, you're going to need to sample and mix all 4 textures.
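
To check the seam logic outside a shader, here is a minimal C++ sketch (a CPU model, not shader code), assuming a single-channel image split into four equal tiles laid out as above. TiledImage, fetch, sampleBilinear and splitIntoTiles are hypothetical names. Each of the four neighbouring texels is fetched from whichever tile owns it, so away from the seams all four come from one tile, along a seam they come from two, and at the cross point they come from all four.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Hypothetical CPU model: a single-channel n x n image stored as four
// (n/2) x (n/2) tiles laid out as |0|1| over |2|3|.
struct TiledImage {
    int n;                                   // full image size (e.g. 1024)
    std::array<std::vector<float>, 4> tiles; // each tile is row-major, (n/2)*(n/2) texels

    // Fetch texel (x, y) of the *full* image, clamped to the image border,
    // from whichever tile owns it. A per-tile sampler cannot do this at a seam.
    float fetch(int x, int y) const {
        x = std::clamp(x, 0, n - 1);
        y = std::clamp(y, 0, n - 1);
        const int half = n / 2;
        const int tile = (y >= half ? 2 : 0) + (x >= half ? 1 : 0);
        return tiles[tile][(y % half) * half + (x % half)];
    }

    // Bilinear sample at normalized uv, mirroring the GLSL:
    // w = fract(uv * n - 0.5), then mix the four neighbouring texels.
    float sampleBilinear(float u, float v) const {
        const float px = u * n - 0.5f;
        const float py = v * n - 0.5f;
        const int x0 = static_cast<int>(std::floor(px));
        const int y0 = static_cast<int>(std::floor(py));
        const float wx = px - std::floor(px);
        const float wy = py - std::floor(py);
        const float c0 = fetch(x0, y0),     c1 = fetch(x0 + 1, y0);
        const float c2 = fetch(x0, y0 + 1), c3 = fetch(x0 + 1, y0 + 1);
        const float top    = c0 + (c1 - c0) * wx;  // mix(c0, c1, wx)
        const float bottom = c2 + (c3 - c2) * wx;  // mix(c2, c3, wx)
        return top + (bottom - top) * wy;          // mix(top, bottom, wy)
    }
};

// Split a row-major n x n image into the four tiles.
TiledImage splitIntoTiles(const std::vector<float>& full, int n) {
    TiledImage img{n, {}};
    const int half = n / 2;
    for (auto& t : img.tiles) t.resize(half * half);
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            const int tile = (y >= half ? 2 : 0) + (x >= half ? 1 : 0);
            img.tiles[tile][(y % half) * half + (x % half)] = full[y * n + x];
        }
    return img;
}
```

Sampling exactly at the cross point (uv = 0.5, 0.5) blends one texel from each of the four tiles, which is what a single unsplit texture would return there.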

 

Also, the mipmaps need to be generated offline from the original 1024x1024 image rather than on the GPU, since the GPU would generate them from each 512x512 block individually.

 

I can't immediately think of a way to fix the trilinear filtering problem, though.


In Topic: Single producer - multiple consumer fast queue?

21 June 2016 - 10:56 AM

How much memory do you need to copy? You would literally have to be copying several GB of data per second for the copy itself to show up as a problem. Unless that's the case, it sounds to me like you're hitting false sharing (different cores writing to different data that happens to sit on the same cache line) while doing the copy, which would slow down all cores considerably.
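
A common fix for false sharing is to give each thread's frequently written data its own cache line. Below is a minimal C++17 sketch, assuming 64-byte cache lines (C++17 also defines std::hardware_destructive_interference_size, but not every compiler provides it). PaddedCounter and runConsumers are hypothetical names; the counters stand in for whatever per-consumer state a queue keeps, such as read indices.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86 and many ARM cores).
constexpr std::size_t kCacheLine = 64;

// Each consumer's counter occupies its own cache line, so a write by one
// consumer does not invalidate the line another consumer is writing.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Hypothetical helper: each of `numConsumers` threads processes `itemsEach`
// items and records progress only in its own padded slot.
std::vector<std::uint64_t> runConsumers(int numConsumers, std::uint64_t itemsEach) {
    std::vector<PaddedCounter> counters(numConsumers);  // not resized while threads run
    std::vector<std::thread> threads;
    for (int i = 0; i < numConsumers; ++i) {
        threads.emplace_back([&counters, i, itemsEach] {
            for (std::uint64_t k = 0; k < itemsEach; ++k)
                counters[i].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    std::vector<std::uint64_t> result;
    for (const auto& c : counters) result.push_back(c.value.load());
    return result;
}
```

Without the alignas, the counters would be packed into one or two cache lines and every fetch_add would bounce that line between cores, even though no counter is actually shared between threads.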

In Topic: How to get patch id in domain shader.

19 June 2016 - 11:43 AM

 

Also, drawing each patch in its own DrawCall sounds incredibly inefficient. You need to provide at least 256 vertices per draw call to fully utilize the vertex shader.

I thought it was 64 vertices to fully utilize the vertex shader, and 256 to avoid becoming command-processor limited.

Edit: for AMD.

 

AMD's wavefront size is 64, that's true, but there are some inefficiencies and overheads, such as needing 3 vertices to make a triangle (e.g. 64 triangles x 3 = 192 vertices, assuming no triangle shares a vertex). Real-world testing shows that, on average, you get near-optimum throughput at >= 256 vertices per draw.
Edit: See http://www.g-truc.net/post-0666.html
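
The arithmetic in the reply can be written down as a small C++ sketch, assuming 64-lane wavefronts, one vertex-shader invocation per vertex, and a non-indexed triangle list (indexed draws can reuse vertices and need fewer invocations). All function names are hypothetical, and the 256 threshold is the empirical figure from the reply, not something derived here.

```cpp
#include <cstdint>

// Assumption: AMD GCN-style wavefronts of 64 lanes, one vertex per lane.
constexpr std::uint32_t kWavefrontSize = 64;

// Vertices needed for a non-indexed triangle list (no vertex sharing).
constexpr std::uint32_t verticesForTriangles(std::uint32_t triangles) {
    return triangles * 3;
}

// Number of wavefronts the vertex shader needs for `vertices` invocations.
constexpr std::uint32_t wavefrontsFor(std::uint32_t vertices) {
    return (vertices + kWavefrontSize - 1) / kWavefrontSize;
}

// Fraction of lanes doing useful work across those wavefronts.
constexpr double laneOccupancy(std::uint32_t vertices) {
    return vertices == 0 ? 0.0
         : static_cast<double>(vertices) / (wavefrontsFor(vertices) * kWavefrontSize);
}

// Empirical guideline quoted in the reply: >= 256 vertices per draw.
constexpr bool meetsDrawSizeGuideline(std::uint32_t vertices) {
    return vertices >= 256;
}

static_assert(verticesForTriangles(64) == 192, "64 triangles x 3");
static_assert(wavefrontsFor(192) == 3, "192 vertices fill 3 wavefronts");
```

Lane occupancy alone would say 192 vertices is "full"; the point of the reply is that per-draw overheads still make such small draws slower than the guideline suggests.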
 

@Matias is it still true if I have a pass-through vertex shader?

Yep.


In Topic: How to get patch id in domain shader.

18 June 2016 - 05:01 PM

Also, drawing each patch in its own DrawCall sounds incredibly inefficient. You need to provide at least 256 vertices per draw call to fully utilize the vertex shader.

In Topic: Batching draws - keeping buffers up to date

15 June 2016 - 10:17 AM

While Tangletail's link is an excellent read, I'm afraid your problem is that you're using WebGL and JavaScript.
There are a lot of optimizations... that are not possible in WebGL, and what you're doing merely shifted the work from DrawCall overhead to filling buffers on the CPU using JavaScript.
Even if you got C or assembly speed for filling those buffers, you're still going to hit RAM or bus bandwidth problems that are avoidable in modern APIs.

The best thing I can recommend is to set up geometry-instanced buffers (aka software or manual instancing): store the mesh repeated many times in one buffer, use vertexCount (or indexCount) to control how many instances you render (i.e. indexCount * numInstances), and pass the individual transform data in a uniform array.
Another solution is to use the ANGLE_instanced_arrays extension when available.
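
The buffer setup for manual instancing can be sketched in C++ (the same layout would be built in JavaScript before uploading it with WebGL). This is a minimal sketch under stated assumptions: InstancedVertex, buildInstancedMesh and indexCountFor are hypothetical names, each copy of the mesh carries its copy number as an extra attribute that the vertex shader would use to index the uniform array of transforms, and indices are 16-bit because WebGL 1 only guarantees UNSIGNED_SHORT indices without OES_element_index_uint.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: a position plus the copy's instance index.
struct InstancedVertex {
    float x, y, z;
    float instanceId;  // float because WebGL 1 vertex attributes cannot be integers
};

struct InstancedMesh {
    std::vector<InstancedVertex> vertices;
    std::vector<std::uint16_t> indices;
    std::uint32_t indicesPerInstance;
};

// Replicate one mesh `maxInstances` times into a single buffer. Copy i gets
// instanceId = i, and its indices are offset by i * vertexCount so each copy
// references its own vertices. Assumes maxInstances * vertexCount <= 65536.
InstancedMesh buildInstancedMesh(const std::vector<float>& positions,  // xyz triples
                                 const std::vector<std::uint16_t>& indices,
                                 std::uint32_t maxInstances) {
    InstancedMesh out;
    const std::uint32_t vertexCount = static_cast<std::uint32_t>(positions.size() / 3);
    out.indicesPerInstance = static_cast<std::uint32_t>(indices.size());
    for (std::uint32_t i = 0; i < maxInstances; ++i) {
        for (std::uint32_t v = 0; v < vertexCount; ++v)
            out.vertices.push_back({positions[3 * v], positions[3 * v + 1],
                                    positions[3 * v + 2], static_cast<float>(i)});
        for (std::uint16_t idx : indices)
            out.indices.push_back(static_cast<std::uint16_t>(idx + i * vertexCount));
    }
    return out;
}

// Index count to pass to the draw call to render the first `numInstances`
// copies (numInstances must not exceed maxInstances).
std::uint32_t indexCountFor(const InstancedMesh& mesh, std::uint32_t numInstances) {
    return mesh.indicesPerInstance * numInstances;
}
```

The buffer is built once for the maximum instance count; each frame only the uniform array of transforms and the index count change, which avoids refilling vertex buffers from JavaScript.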
