Fragment shader variables count (iPhone4)

4 comments, last by L. Spiro 9 years, 11 months ago

I have a large number of variables in my fragment shader: 30 uniforms (mostly vec4) and about 20 local variables (vec3, float, vec4). It runs just fine on an iPhone 5S, but I have a serious problem on the iPhone 4: GPU time is 1 s per frame, and 98% of that time is shader run time.

According to Apple's documentation:

OpenGL ES limits the number of each variable type you can use in a vertex or fragment shader. The OpenGL ES specification doesn’t require implementations to provide a software fallback when these limits are exceeded; instead, the shader simply fails to compile or link. When developing your app you must ensure that no errors occur during shader compilation, as shown in Listing 10-1.

I don't quite understand this. Do they provide a software fallback or not? I get no errors when compiling or linking the shader, yet performance is poor. I have commented almost everything out, leaving just 2 texture lookups and the directional light computation, and changed the other functions to return just vec4(0,0,0,0).


I don't quite understand this. Do they provide a software fallback or not? I get no errors when compiling or linking the shader, yet performance is poor. I have commented almost everything out, leaving just 2 texture lookups and the directional light computation, and changed the other functions to return just vec4(0,0,0,0).

There is no software fallback on any iOS device, and your case essentially proves this isn’t the issue anyway.
The compiler will strip unused uniforms entirely; only uniforms that contribute to the output remain in any shaders. Even if you do access them, if they still don’t contribute to the output they are eliminated (as are local variables).

If you have removed enough to return hard-coded black mixed with a few texture reads, you have undoubtedly eliminated enough uniforms and locals that you are nowhere near any device limits.
Your bottleneck should be elsewhere.


You should post the shaders before and after you reduce their complexity, though, to be sure this is the case.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I have about 30 shaders with high complexity (30+ uniforms each). My render loop currently uses only one shader, which I have reduced to the bare minimum (you are right, unused uniforms are stripped away during the build).

My current shader looks like this:



vec4 CalcAmbientLight(Material mat, float fIntensity)
{
    return mat.vAmbient * en_vAmbLightColor * fIntensity;
}

uniform sampler2D normal_buffer;
uniform sampler2D bg_buffer;

uniform vec2 canvasSize;

uniform SpotLight en_spotLight[6];

varying vec2 vTexCoord;

void main()
{
    vec4 normalMap = texture2D(normal_buffer, vTexCoord); //normal_buffer is RGBA (GL_RGBA)
    vec3 bgColor = texture2D(bg_buffer, vTexCoord).rgb; //bg_buffer is RGB (GL_RGB)

    vec3 vNormal = normalize((2.0 * normalMap.rbg) - 1.0);

    vec3 posWS = vec3(vTexCoord.x * canvasSize.x, normalMap.a, vTexCoord.y * canvasSize.y);

    //----------------------------------------------------------------------------------
    Material mat;
    mat.vAmbient = vec4(bgColor, 1.0);
    mat.vDiffuse = vec4(bgColor, 1.0);
    mat.vSpecular = vec4(1.0);
    mat.fShiness = 30.0;

    vec4 vPhong = CalcAmbientLight(mat, 0.2);

    vPhong.a = 1.0;
    gl_FragColor = vPhong;
}

The vertex shader is a simple pass-through. I am rendering a fullscreen quad at the resolution of the iPhone 4 screen.

With this piece of "nothing" code, I get 45 fps (I know it is debug mode, but that is still way too low compared to 60 fps). The same code runs at 60 fps on an iPhone 5S. CPU time is 4.4 ms per frame; GPU time is 20.1 ms.

Here is also my sequence of calls from the Xcode frame trace:


//use render to texture for the whole scene
#0 glBindFramebuffer(GL_FRAMEBUFFER, 7)
#1 glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, 6, 0)
#2 GL_FRAMEBUFFER_COMPLETE <- glCheckFramebufferStatus(GL_FRAMEBUFFER)
#3 glBindRenderbuffer(GL_RENDERBUFFER, 8)
#4 glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, 8)
#5 GL_FRAMEBUFFER_COMPLETE <- glCheckFramebufferStatus(GL_FRAMEBUFFER)
#6 glUseProgram(17)
#7 glBindTexture(GL_TEXTURE_2D, 3)
#8 glBindVertexArray(1)
#9 glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 44)
#10 glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, nullptr)
#11 glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0)
#12 glBindVertexArray(0)
//----------------------------- End of main rendering --------
//now render texture to screen
#13 glBindFramebuffer(GL_FRAMEBUFFER, 1)
#14 glBindRenderbuffer(GL_RENDERBUFFER, 2)
#15 glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, 2)
#16 GL_FRAMEBUFFER_COMPLETE <- glCheckFramebufferStatus(GL_FRAMEBUFFER)
#17 glBindRenderbuffer(GL_RENDERBUFFER, 1)
#18 glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, 1)
#19 GL_FRAMEBUFFER_COMPLETE <- glCheckFramebufferStatus(GL_FRAMEBUFFER)
#20 glUseProgram(25)
#21 glBindTexture(GL_TEXTURE_2D, 6)
#22 glBindVertexArray(2)
#23 glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 4)
#24 glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, nullptr)
#25 glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0)
#26 glBindVertexArray(0)
#27 glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, {GL_DEPTH_ATTACHMENT})
#28 glBindRenderbuffer(GL_RENDERBUFFER, 2)
#29 ["Context 1" presentRenderbuffer:GL_RENDERBUFFER]


45 fps when rendering at full resolution on an iPhone4 might well be normal, even with a shader that simple. An iPhone4 GPU is only slightly superior to a 3GS GPU, but it has 4 times the number of pixels to fill, so it really struggles with fill rate.
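To put the fill-rate point in numbers, here is a minimal C sketch of the arithmetic. It assumes the standard screen resolutions (480x320 for the 3GS, 960x640 for the iPhone 4) and counts two fullscreen passes per frame, as in the trace above (one render-to-texture pass and one pass to the screen). The function names are made up for illustration.

```c
#include <stdio.h>

/* Pixels the GPU has to shade for one fullscreen pass. */
long pixels_per_pass(int width, int height)
{
    return (long)width * (long)height;
}

/* Pixels shaded per frame when every pass covers the whole screen. */
long pixels_per_frame(int width, int height, int fullscreen_passes)
{
    return pixels_per_pass(width, height) * fullscreen_passes;
}

/* Prints the comparison between the two devices for a given pass count. */
void print_fill_comparison(int passes)
{
    long p3gs = pixels_per_frame(480, 320, passes); /* iPhone 3GS */
    long p4   = pixels_per_frame(960, 640, passes); /* iPhone 4   */
    printf("3GS: %ld px/frame, iPhone 4: %ld px/frame (%ldx)\n",
           p3gs, p4, p4 / p3gs);
}
```

Calling print_fill_comparison(2) shows the iPhone 4 shading four times as many pixels per frame as the 3GS for the same two-pass setup.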

Remember that vsync is always on, so your reported 45fps might in reality be 59fps getting rounded down due to vsync.
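A rough C sketch of how vsync can turn "almost 60 fps" into a reported 45 fps. It assumes a simple model, not taken from any Apple documentation: a 60 Hz display where a frame that misses a refresh simply waits for the next one. The function names are hypothetical.

```c
#define REFRESH_HZ 60.0

/* Number of refresh intervals a frame occupies under the simple
   "wait for the next refresh" model assumed here. */
int vsync_intervals(double frame_ms)
{
    double interval_ms = 1000.0 / REFRESH_HZ;
    double ratio = frame_ms / interval_ms;
    int n = (int)ratio;          /* truncate toward zero */
    if ((double)n < ratio) {
        n += 1;                  /* round up: the frame missed a refresh */
    }
    return n < 1 ? 1 : n;
}

/* Average frame rate over a sequence of per-frame render times. */
double average_fps(const double *frame_ms, int count)
{
    int total_intervals = 0;
    int i;
    if (count <= 0) {
        return 0.0;
    }
    for (i = 0; i < count; ++i) {
        total_intervals += vsync_intervals(frame_ms[i]);
    }
    return (double)count * REFRESH_HZ / (double)total_intervals;
}
```

Under this model, two frames at 16.0 ms and one at 17.0 ms occupy four refresh intervals in total, which averages out to 45 fps even though every frame was close to the 16.7 ms budget.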

Also, adding logical buffer discard and clear commands can speed things up. At the start of your frame, the first thing OpenGL does is copy the previous frame's framebuffer contents into the new framebuffer, because the driver has no way of telling that you don't want them.

edit: Oops, just spotted the discard command. I think a glClear would probably still help, though.

I don’t believe they have a software fallback, and your case essentially proves this isn’t the issue anyway (at least at face value for now).

There definitely is a software fallback. I once hit it when I tried to support too many lights in a vertex shader and performance dropped to >400 ms per frame. Then I removed one light and frame time dropped to 10 ms. The profiler clearly indicated that most of the time was spent in the software shader pipeline.

With this piece of "nothing" code, I get 45 fps (I know it is debug mode, but that is still way too low compared to 60 fps).

That is correct for an iPhone 4 whose OpenGL ES view has a contentScaleFactor of 2.0.
At work we allow game teams to set a preferred contentScaleFactor on supported devices, except for iPhone 4, which we force to 1.0.
In other words, your numbers are perfectly reasonable and you simply need to disable retina mode.
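As a back-of-the-envelope check on what disabling retina mode could buy, here is a small C sketch. It assumes the frame is purely fill-rate bound, so GPU time scales linearly with pixel count, which is an optimistic simplification. The 20.1 ms figure is the GPU time reported earlier in the thread; the function name is hypothetical.

```c
/* Estimated GPU time after changing contentScaleFactor, assuming the
   cost is proportional to the number of pixels shaded (fill-bound).
   Pixel count grows with the square of the scale factor. */
double scaled_gpu_ms(double measured_ms, double from_scale, double to_scale)
{
    double ratio = to_scale / from_scale;
    return measured_ms * ratio * ratio;
}
```

With these assumptions, scaled_gpu_ms(20.1, 2.0, 1.0) comes out at about 5 ms, comfortably inside the 16.7 ms budget for 60 fps. Real gains will be smaller if anything other than fill rate contributes to the frame time.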


There definitely is a software fallback. I once hit it when I tried to support too many lights in a vertex shader and performance dropped to >400 ms per frame. Then I removed one light and frame time dropped to 10 ms. The profiler clearly indicated that most of the time was spent in the software shader pipeline.

There is no software fallback under any circumstances for any iOS device released to date (nor will there likely ever be).
Your slowdown could be because of your lighting equations or because the complexity of your shader increased significantly with more lights and was harder to optimize, etc., but it was not because of software shaders.


L. Spiro
