# OpenGL ES Fixed Function Pipeline Faster For Sprites?


I'm getting some unexpected results with my new sprite renderer, which uses OpenGL ES 2.0. It performs much worse than my old sprite renderer from 5 years ago, which uses OpenGL ES 1.1 (no shaders). All I'm doing is displaying a 16x16 grid of quads and moving and zooming it around a little bit. You can see the difference in the video below:

Clearly, the fixed pipeline runs smoothly, but my supposedly fast one-draw-call shader program chugs (when I tried one draw call-per-quad it was naturally even slower). This is not what I expected.

• How can I speed up my new sprite renderer?
• Is the fixed function pipeline naturally just more adapted to vertex data that changes more often? (like a new VBO on every frame)
• I could just re-write the new renderer in OpenGL ES 1.1 again, but then I will lose compatibility with desktop OpenGL. This is a bad idea, right?
• Can I emulate the fixed-function pipeline with shaders? Is there code out there that does this? What tricks did they use in it to get sprites to render so fast?

Old Fixed-Function Code:

            for (int z = 0; z <= mTileEdit.mCurLevel; z++) {
for (int y = 0; y < tm.mSizeY; y++) {
for (int x = 0; x < tm.mSizeX; x++) {

int t = tm.get(x, y, z);

if (t != 0 && t > 0 && t < 256) {

// Set alpha
float alpha = 1.0f;
if (Lozoware.getMP().get("name").equals("pixeledit") || Lozoware.getMP().get("name").equals("edit3d")) {
alpha = 1.0f - ((float)z / (float)tm.mSizeZ);
}

// Set color
gl.glColor4f(tm.mPalette.mRed[t],
tm.mPalette.mGreen[t],
tm.mPalette.mBlue[t], alpha);

// Vertex buffer
bb = ByteBuffer.allocateDirect((6 * 3) * 3 * 4);
bb.order(ByteOrder.nativeOrder());
FloatBuffer buf = bb.asFloatBuffer();

float bottomLeftX = x * mGLTileSizeX;
float bottomLeftY = y * mGLTileSizeY;
float topLeftX = x * mGLTileSizeX;
float topLeftY = y * mGLTileSizeY + mGLTileSizeY;
float bottomRightX = x * mGLTileSizeX + mGLTileSizeX;
float bottomRightY = y * mGLTileSizeY;
float topRightX = x * mGLTileSizeX + mGLTileSizeX;
float topRightY = y * mGLTileSizeY + mGLTileSizeY;

buf.position(0);

buf.put(topLeftX);
buf.put(topLeftY);
buf.put(0);

buf.put(bottomRightX);
buf.put(bottomRightY);
buf.put(0);

buf.put(bottomLeftX);
buf.put(bottomLeftY);
buf.put(0);

buf.put(topLeftX);
buf.put(topLeftY);
buf.put(0);

buf.put(topRightX);
buf.put(topRightY);
buf.put(0);

buf.put(bottomRightX);
buf.put(bottomRightY);
buf.put(0);

buf.position(0);

// Draw
gl.glEnableClientState(GL10.GL_VERTEX_ARRAY);

gl.glVertexPointer(3, GL10.GL_FLOAT, 0, buf);

gl.glDrawArrays(GL10.GL_TRIANGLES, 0, 6 * 3);

gl.glDisableClientState(GL10.GL_VERTEX_ARRAY);
}
}
}
}

gl.glFlush();

New OpenGL ES 2.0 Code:

	int numVerts = 0;

// Alloc enough data for all sprites
for (const auto & pair: objects) {
Object * obj = pair.second;

if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {
numVerts += 6;
}
}

int floatsPerVert = 26;

float * data = new float[numVerts * floatsPerVert];

int cursor = 0;

int q = 0;

// Fill data for all sprites
for (const auto & pair: objects) {
Object * obj = pair.second;

if (obj != nullptr && obj->visible && obj->type == OBJTYPE_SPRITE) {

if (texAtlas.getNeedsRefresh())
texAtlas.refresh();

// Set modelview matrix
glm::mat4 mvMatrix;
glm::mat4 scaleToNDC;
glm::mat4 cameraRotate;
glm::mat4 cameraTranslate;
glm::mat4 rotate;

#ifdef PLATFORM_OPENVR
scaleToNDC = glm::scale(glm::mat4(), glm::vec3(VRSCALE, VRSCALE, VRSCALE));
#else
scaleToNDC = glm::scale(glm::mat4(), glm::vec3(NDC_SCALE, NDC_SCALE, NDC_SCALE));
#endif

if (obj->alwaysFacePlayer)
rotate = glm::rotate(glm::mat4(), glm::radians(-camera->yaw), glm::vec3(0, 1, 0)) // Model yaw
*
glm::rotate(glm::mat4(), glm::radians(camera->pitch), glm::vec3(1, 0, 0)); // Model pitch
else
rotate = glm::rotate(glm::mat4(), glm::radians(-obj->yaw), glm::vec3(0, 1, 0)) // Model yaw
*
glm::rotate(glm::mat4(), glm::radians(-obj->pitch), glm::vec3(1, 0, 0)); // Model pitch

cameraRotate = glm::rotate(glm::mat4(), glm::radians(camera->roll), glm::vec3(0, 0, 1)) // Camera roll
*
glm::rotate(glm::mat4(), -glm::radians(camera->pitch), glm::vec3(1, 0, 0)) // Camera pitch
*
glm::rotate(glm::mat4(), glm::radians(camera->yaw), glm::vec3(0, 1, 0)); // Camera yaw

cameraTranslate = glm::translate(glm::mat4(), glm::vec3(-camera->position.x, -camera->position.y, -camera->position.z)); // Camera translate

#ifdef PLATFORM_OPENVR
mvMatrix =
glm::make_mat4((const GLfloat*) g_poseEyeMatrix.get()) *
scaleToNDC *
cameraRotate *
cameraTranslate *
glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
*
rotate *
glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
#else
mvMatrix =
scaleToNDC *
cameraRotate *
cameraTranslate *
glm::translate(glm::mat4(), glm::vec3(obj->position.x, obj->position.y, obj->position.z)) // World translate
*
rotate *
glm::scale(glm::mat4(), obj->scale / glm::vec3(2.0, 2.0, 2.0)); // Scale
#endif
#endif

//   ______
// |\\5   4|
// |0\\    |
// |  \\   |
// |   \\  |
// |    \\3|
// |1__2_\\|

// Triangle 1

// Vertex 0
data[cursor + 0] = -1.0f;
data[cursor + 1] = 1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

UV input;
input.u = 0.0f;
input.v = 1.0f;
UV output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

// Vertex 1
data[cursor + 0] = -1.0f;
data[cursor + 1] = -1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

input.u = 0.0f;
input.v = 0.0f;
output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

// Vertex 2
data[cursor + 0] = 1.0f;
data[cursor + 1] = -1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

input.u = 1.0f;
input.v = 0.0f;
output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

// Triangle 2

// Vertex 3
data[cursor + 0] = 1.0f;
data[cursor + 1] = -1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

input.u = 1.0f;
input.v = 0.0f;
output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

// Vertex 4
data[cursor + 0] = 1.0f;
data[cursor + 1] = 1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

input.u = 1.0f;
input.v = 1.0f;
output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

// Vertex 5
data[cursor + 0] = -1.0f;
data[cursor + 1] = 1.0f;
data[cursor + 2] = 0.0f;
data[cursor + 3] = 1.0f;

input.u = 0.0f;
input.v = 1.0f;
output = texAtlas.getUV(obj->textureName, input);

data[cursor + 4] = output.u;
data[cursor + 5] = output.v;

data[cursor + 6] = mvMatrix[0][0];
data[cursor + 7] = mvMatrix[0][1];
data[cursor + 8] = mvMatrix[0][2];
data[cursor + 9] = mvMatrix[0][3];

data[cursor + 10] = mvMatrix[1][0];
data[cursor + 11] = mvMatrix[1][1];
data[cursor + 12] = mvMatrix[1][2];
data[cursor + 13] = mvMatrix[1][3];

data[cursor + 14] = mvMatrix[2][0];
data[cursor + 15] = mvMatrix[2][1];
data[cursor + 16] = mvMatrix[2][2];
data[cursor + 17] = mvMatrix[2][3];

data[cursor + 18] = mvMatrix[3][0];
data[cursor + 19] = mvMatrix[3][1];
data[cursor + 20] = mvMatrix[3][2];
data[cursor + 21] = mvMatrix[3][3];

data[cursor + 22] = obj->color.r;
data[cursor + 23] = obj->color.g;
data[cursor + 24] = obj->color.b;
data[cursor + 25] = obj->color.a;

cursor += floatsPerVert;

q++;
}
}

#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
// Generate VAO
glGenVertexArrays(1, (GLuint*) &vao);
checkGLError("glGenVertexArrays");
glBindVertexArray(vao);
checkGLError("glBindVertexArray");
#endif

// Generate VBO
glGenBuffers(1, (GLuint*) &vbo);
checkGLError("glGenBuffers");
glBindBuffer(GL_ARRAY_BUFFER, vbo);
checkGLError("glBindBuffer");

glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 6 * floatsPerVert * q, data, GL_STATIC_DRAW);
checkGLError("glBufferData");

// Delete data (allocated with new[], so it must be released with delete[])
delete[] data;

// Get aspect
float width = PLAT_GetWindowWidth();
float height = PLAT_GetWindowHeight();
#ifdef PLATFORM_OPENVR
float aspect = 1.0;
#else
float aspect = width / height;
#endif

// DRAW
glEnable(GL_CULL_FACE);
checkGLError("glEnable");
glFrontFace(GL_CCW);
checkGLError("glFrontFace");

glCullFace(GL_BACK);
checkGLError("glCullFace");

glEnable(GL_BLEND);
checkGLError("ShapeRenderer glEnable");
#ifndef PLATFORM_ANDROID
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
checkGLError("ShapeRenderer glBlendFunc");
#endif

// Add program to OpenGL environment
int curProgram = -1;
curProgram = programMain;

glUseProgram(curProgram);
checkGLError("SpriteRenderer glUseProgram");

#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
// Bind the VAO
glBindVertexArray(vao);
checkGLError("glBindVertexArray");
#endif

// Bind the VBO
glBindBuffer(GL_ARRAY_BUFFER, vbo);
checkGLError("glBindBuffer");

// Set the projection matrix
glm::mat4 projMatrix;

#if defined PLATFORM_OPENVR
projMatrix = glm::make_mat4((const GLfloat*) g_projectionMatrix.get());
#else
projMatrix = glm::perspective(VIEW_FOV, aspect, 0.001f, 1000.0f);
#endif

setMatrix(curProgram, "projectionMatrix", projMatrix);

setUniform4f(curProgram, "globalColor", globalColor.x, globalColor.y, globalColor.z, globalColor.w);

int t = texAtlas.getGlTexId();

glActiveTexture(GL_TEXTURE0);
checkGLError("glActiveTexture");

glBindTexture(GL_TEXTURE_2D, t);

setUniform2f(curProgram, "vTexSpan", 1.0, 1.0);
setUniform1f(curProgram, "useTexture", 1.0);

// Set attributes
setVertexAttrib(curProgram, "vPosition", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 0);
setVertexAttrib(curProgram, "vTexCoords", 2, GL_FLOAT, false, floatsPerVert * sizeof(float), 4);

setVertexAttrib(curProgram, "mvMatrixPt1", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 6);
setVertexAttrib(curProgram, "mvMatrixPt2", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 10);
setVertexAttrib(curProgram, "mvMatrixPt3", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 14);
setVertexAttrib(curProgram, "mvMatrixPt4", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 18);

setVertexAttrib(curProgram, "vColor", 4, GL_FLOAT, false, floatsPerVert * sizeof(float), 22);

// Draw
glDrawArrays(GL_TRIANGLES, 0, q * 6);
checkGLError("glDrawArrays");

#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
// Reset
glBindVertexArray(0);
glBindTexture(GL_TEXTURE_2D, 0);
glUseProgram(0);
#endif

// Delete VAO and VBO
glDeleteBuffers(1, (GLuint*) &vbo);
#if defined PLATFORM_WINDOWS || defined PLATFORM_OSX
glDeleteVertexArrays(1, (GLuint*) &vao);
#endif

	//
//

"attribute vec4 vPosition;"\
"varying lowp vec4 posOut; "\
"attribute vec2 vTexCoords;"\
"varying lowp vec2 vTexCoordsOut; "\
"uniform vec2 vTexSpan;"\
"attribute vec4 vNormal;"\
"varying vec4 vNormalOut;"\
"attribute vec4 vVertexLight; "\
"varying vec4 vVertexLightOut; "\
"uniform mat4 projectionMatrix; "\
"varying lowp float distToCamera; "\

"attribute vec4 mvMatrixPt1; "\
"attribute vec4 mvMatrixPt2; "\
"attribute vec4 mvMatrixPt3; "\
"attribute vec4 mvMatrixPt4; "\

"attribute vec4 vColor; "\
"varying vec4 vColorOut;"\

"attribute mat4 oldmvMatrix; "\

"void main() {"\

"  mat4 mvMatrix; "\

"  mvMatrix[0] = mvMatrixPt1; "\
"  mvMatrix[1] = mvMatrixPt2; "\
"  mvMatrix[2] = mvMatrixPt3; "\
"  mvMatrix[3] = mvMatrixPt4; "\

"  gl_Position = projectionMatrix * mvMatrix * vPosition; "
"  vTexCoordsOut = vTexCoords * vTexSpan; "\
"  posOut = gl_Position; "\

"  vec4 posBeforeProj = mvMatrix * vPosition;"\
"  distToCamera = -posBeforeProj.z; "\

"  vColorOut = vColor; "\
"}\n";

//
//

"uniform sampler2D uTexture; "\
"uniform lowp vec4 vColor; "\
"uniform lowp vec4 globalColor; "\
"varying lowp vec2 vTexCoordsOut; "\
"varying lowp vec4 posOut; "\
"uniform lowp float useTexture; "\

"varying lowp float distToCamera; "\
"varying lowp vec4 vColorOut; "\

"void main() {"\

"   lowp vec4 f = texture2D(uTexture, vTexCoordsOut.st); "\
"   if (f.a == 0.0) "\

"	lowp float visibility = 1.0; "\
"   lowp float alpha = 1.0; "\

"   if (distToCamera >= fadeNear) "\
"		alpha = 1.0 - (distToCamera - fadeNear) * 3.0; "\

"   if (useTexture == 1.0)"\
"   {"\
"      gl_FragColor = texture2D(uTexture, vTexCoordsOut.st) * vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
"   }"\
"   else"\
"   {"\
"      gl_FragColor = vColorOut * vec4(visibility, visibility, visibility, alpha) * globalColor; "\
"   }"\
"}\n";

The rest of the new code is here:

Edited by VoxycDev

##### Share on other sites

Did you profile? CPU or GPU bottleneck?

The new code seems to do some extra things as well.

• mvMatrix is being made for each sprite and then stored in every vertex? That is a lot of data. Normally for rendering tiles, I'll just do the X/Y addition in the CPU code (directly, no matrix); then, if the game has, say, camera rotation/zooming and I need a matrix, I'll just have a single one for all world sprites in the entire frame (as a uniform).

I also found doing the translation directly helps avoid FP rounding errors that can cause visible seams between adjacent tiles/sprites.

• Not sure what the cost of glGenBuffers, glGenVertexArrays, etc. is. The code I have here appears to re-use the same one, replacing the contents with glBufferData. I also believe STATIC is slower to "upload" than DYNAMIC or STREAM.

• What is texAtlas.add(obj->textureName);? You're not rebuilding a texture dynamically, are you? Even if it is not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string; if it's doing string map lookups for every sprite, that is not ideal.

• Also not sure on the cost of things like setVertexAttrib. You should be able to do this once, and it is saved with the GL_ARRAY_BUFFER (possibly all in one go, e.g. glVertexAttribPointer)

• Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.

• The "useTexture" path calls texture2D twice, I am not sure this will be optimised out.

• The mvMatrix * vPosition multiplication is done twice, again not sure it will optimise that.

• What is vTexSpan for? Seems like extra work. Likewise for the unused normals.
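
To make the first point concrete, here is a minimal sketch of building tile vertices with plain X/Y arithmetic on the CPU, so that the only matrix left is a single camera/projection uniform. The `SpriteVertex` layout and the `appendTileQuad` / `buildGrid` names are hypothetical and not taken from the posted code, and per-sprite color is left out for brevity. With 4 floats per vertex instead of 26, a 256-sprite batch shrinks from 256 * 6 * 104 = 159,744 bytes to 256 * 6 * 16 = 24,576 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compact vertex: final position plus texture coordinates.
struct SpriteVertex {
    float x, y;  // position, already computed on the CPU
    float u, v;  // texture coordinates
};
static_assert(sizeof(SpriteVertex) == 4 * sizeof(float), "tightly packed");

// Appends the six vertices (two counter-clockwise triangles) of one tile.
// Edges are computed as index * size, so neighbouring tiles get bit-identical
// coordinates on their shared edge, which avoids rounding seams.
inline void appendTileQuad(std::vector<SpriteVertex>& out,
                           int tileX, int tileY, float tileW, float tileH) {
    const float x0 = tileX * tileW;
    const float y0 = tileY * tileH;
    const float x1 = (tileX + 1) * tileW;
    const float y1 = (tileY + 1) * tileH;
    out.push_back({x0, y1, 0.0f, 1.0f});  // top left
    out.push_back({x0, y0, 0.0f, 0.0f});  // bottom left
    out.push_back({x1, y0, 1.0f, 0.0f});  // bottom right
    out.push_back({x1, y0, 1.0f, 0.0f});  // bottom right
    out.push_back({x1, y1, 1.0f, 1.0f});  // top right
    out.push_back({x0, y1, 0.0f, 1.0f});  // top left
}

// Refills `out` with a gridW x gridH grid of tiles. clear() keeps the
// vector's capacity, so there is no reallocation after the first frame.
inline void buildGrid(std::vector<SpriteVertex>& out,
                      int gridW, int gridH, float tileW, float tileH) {
    assert(gridW >= 0 && gridH >= 0);
    out.clear();
    for (int y = 0; y < gridH; ++y)
        for (int x = 0; x < gridW; ++x)
            appendTileQuad(out, x, y, tileW, tileH);
}
```

The resulting array can be uploaded as one buffer and drawn with one call; stretchable corners would simply write different positions per vertex instead of the regular grid values.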

##### Share on other sites

That code you posted for OpenGL ES2, is it meant to be pseudocode? There are no functions. It is not clear what you are doing as a once off process and what you are doing per frame. The general idea in graphics programming (and game programming in general) is usually to move as much code into a once off process (on starting game or level etc) and do as little as possible per frame.

As such, to move a viewpoint, you typically don't change the vertex data, you might change, e.g. a matrix representing the view / camera transform and pass it as a uniform. This is very cheap to do for the GPU.

If you do need to change vertex data dynamically each frame, you should explicitly tell the API that it is dynamic (rather than static and unchanging) on creation. You have to be very careful using dynamic vertex buffers so as not to drastically affect performance by stalling the pipeline. In some cases this means creating e.g. 3 copies of a dynamic VB, and using them in turn on each frame. In some cases the API helps do this for you; it is a good idea to try both and compare if you are not sure.

You also absolutely do not want to be making any dynamic allocations / deallocations either on the GPU or CPU each frame.
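
As a rough illustration of the "3 copies used in turn" idea, the sketch below only handles the rotation logic. `BufferRing` is a hypothetical helper, and the handles are assumed to be buffer names created once at startup with `glGenBuffers` and sized with `glBufferData(..., GL_DYNAMIC_DRAW)` or `GL_STREAM_DRAW`; the GL calls themselves are left to the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Rotates through N pre-created buffer handles, one per frame, so the CPU
// never overwrites a buffer the GPU may still be reading from the previous
// frame. No allocation happens after construction.
template <std::size_t N>
class BufferRing {
    static_assert(N > 0, "need at least one buffer");

public:
    explicit BufferRing(const std::array<unsigned, N>& handles)
        : handles_(handles) {
        for (unsigned h : handles_) {
            assert(h != 0);  // glGenBuffers never returns the name 0
            (void)h;
        }
    }

    // Returns the handle to fill and draw from this frame, then advances.
    unsigned acquire() {
        const unsigned h = handles_[next_];
        next_ = (next_ + 1) % N;
        return h;
    }

private:
    std::array<unsigned, N> handles_;
    std::size_t next_ = 0;
};
```

Per frame, the caller would bind the handle returned by `acquire()`, update its contents, and draw from it. Whether this beats a single buffer depends on the driver, so it is worth measuring both.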

##### Share on other sites

You recreate the array each frame, consider making a big array of sprites and not to use all of them, or if you have a constant num of sprites then you use glBufferSubData and you dont do glGenBuffers per frame too its only needed when you change the size of vertex buffer, anyway you gave no idea what ur doing
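
A minimal sketch of that "big array, use only part of it" approach on the CPU side follows. `VertexPool` is a hypothetical name; the assumption is that a GL buffer of `capacityBytes()` is created once, and each frame only the first `usedBytes()` are uploaded with `glBufferSubData`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fixed-capacity staging area for vertex data. Storage is allocated once;
// each frame resets the write position instead of calling new/delete.
class VertexPool {
public:
    explicit VertexPool(std::size_t maxFloats) : storage_(maxFloats), used_(0) {}

    // Starts a new frame: forget old contents, keep the memory.
    void beginFrame() { used_ = 0; }

    // Copies `count` floats into the pool. Returns false (and copies
    // nothing) if the pool does not have enough room left.
    bool append(const float* src, std::size_t count) {
        assert(src != nullptr || count == 0);
        if (count > storage_.size() - used_) return false;
        std::copy(src, src + count, storage_.data() + used_);
        used_ += count;
        return true;
    }

    const float* data() const { return storage_.data(); }
    std::size_t usedBytes() const { return used_ * sizeof(float); }
    std::size_t capacityBytes() const { return storage_.size() * sizeof(float); }

private:
    std::vector<float> storage_;
    std::size_t used_;
};
```

With this in place, the per-frame path has no `new`, `delete`, `glGenBuffers` or `glDeleteBuffers`; the draw call simply uses the number of vertices written this frame.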

##### Share on other sites
13 hours ago, SyncViews said:

Did you profile? CPU or GPU bottleneck?

No. I will. Thank you.

13 hours ago, SyncViews said:

The new code seems to do some extra things as well.

• mvMatrix is being made for each sprite and then stored in every vertex? That is a lot of data. Normally for rendering tiles, I'll just do the X/Y addition in the CPU code (directly, no matrix); then, if the game has, say, camera rotation/zooming and I need a matrix, I'll just have a single one for all world sprites in the entire frame (as a uniform).

Yes, since I was looking for a way to draw all sprites with one call, I decided to make mvMatrix an attribute. Should I try sending it as an array of uniforms? Maybe I can send an index of the sprite as an attribute and then get mvMatrix out of a uniform array based on that? Then, I think I will be constrained by maximum size of a uniform array. These guys here talk about values around 512 maximum floats (32 matrices max). Granted, this is a conversation from 2008 so the limits must have risen since then. I'm looking to draw around 256-1024 sprites (32x32 grid would be nice), and it should be as butter-smooth as OpenGL 1.1 was. It would suffice if I could have 256 matrices in a uniform array if that speeds things up. Can I?

Thing is, even though right now it's just a grid, the sprites are supposed to be stretchable/bendable, like they were in my old fixed pipeline code, so yes, each corner of each poly does have to have a completely unique position on every frame. What I'm building is an editor for a flexible mesh of voxels, where you can stretch each corner and morph it into interesting architecture or landscapes. This worked perfectly in my old engine but it was Java and fixed-function.

13 hours ago, SyncViews said:
• Not sure what the cost of glGenBuffers, glGenVertexArrays, etc. is. The code I have here appears to re-use the same one, replacing the contents with glBufferData. I also believe STATIC is slower to "upload" than DYNAMIC or STREAM.

Got it. Will try to re-use the same VBO and will try DYNAMIC and STREAM. Other people have mentioned this as well below. Thank you.

13 hours ago, SyncViews said:
• What is texAtlas.add(obj->textureName);? You're not rebuilding a texture dynamically, are you? Even if it is not every frame, you need to be careful not to cause slow frames / stutter. It also looks like a string; if it's doing string map lookups for every sprite, that is not ideal.

It makes sure the texture is in the texture atlas. It's rebuilt as-needed (only when a brand new texture is added). You're right, I probably should get rid of string map lookup here. But in this particular case there is only one texture so array size is 1, so it's not the bottleneck.

13 hours ago, SyncViews said:
• Also not sure on the cost of things like setVertexAttrib. You should be able to do this once, and it is saved with the GL_ARRAY_BUFFER (possibly all in one go, e.g. glVertexAttribPointer)

setVertexAttrib just calls all the gl functions needed to set up an attribute. Good point, though. I should try to do this once if I can. This is not the only program/renderer that runs in the engine though, so I assumed I have to re-set-up all the attributes on every frame for every program. Is that not the case?

13 hours ago, SyncViews said:
• Any sort of dynamic branch in a shader is usually bad if adjacent/nearby data will branch differently. GPU cores are not like CPU ones and can't all independently do their own thing. I didn't look closely at your data, but something to be aware of.

I'm not super worried about the gaps between the sprites. This is only for an editor, not for rendering in the game. As long as it's smooth and I can quickly build vast landscapes and cities out of voxels, that's all I care about.

13 hours ago, SyncViews said:

• The "useTexture" path calls texture2D twice, I am not sure this will be optimised out.

• The mvMatrix * vPosition multiplication is done twice, again not sure it will optimise that.

Thank you, will try to see if I can only calculate these once.

13 hours ago, SyncViews said:
• What is vTexSpan for? Seems like extra work. Likewise for the unused normals.

It's texture span. It's basically how many voxels a texture spans before it repeats. I still have it in the shader for some legacy reason I think. I can't remember what I was going to do with this value, but I think it was important for something once before.

##### Share on other sites
7 hours ago, lawnjelly said:

That code you posted for OpenGL ES2, is it meant to be pseudocode? There are no functions. It is not clear what you are doing as a once off process and what you are doing per frame. The general idea in graphics programming (and game programming in general) is usually to move as much code into a once off process (on starting game or level etc) and do as little as possible per frame.

It's C++ that runs on every frame. Will clarify these things in the future.

7 hours ago, lawnjelly said:

As such, to move a viewpoint, you typically don't change the vertex data, you might change, e.g. a matrix representing the view / camera transform and pass it as a uniform. This is very cheap to do for the GPU.

If you do need to change vertex data dynamically each frame, you should explicitly tell the API that it is dynamic (rather than static and unchanging) on creation. You have to be very careful using dynamic vertex buffers so as not to drastically affect performance by stalling the pipeline. In some cases this means creating e.g. 3 copies of a dynamic VB, and using them in turn on each frame. In some cases the API helps do this for you; it is a good idea to try both and compare if you are not sure.

You also absolutely do not want to be making any dynamic allocations / deallocations either on the GPU or CPU each frame.

Yes, in this case the goal is to change vertex data on every frame. Will definitely try using one VBO without recreating it, set it as dynamic, then do updates to it. Thank you.

7 hours ago, _WeirdCat_ said:

You recreate the array each frame, consider making a big array of sprites and not to use all of them, or if you have a constant num of sprites then you use glBufferSubData and you dont do glGenBuffers per frame too its only needed when you change the size of vertex buffer, anyway you gave no idea what ur doing

Thank you for the brilliant suggestion. Will do exactly that. Thanks for pointing out that I have no idea what I'm doing. I guess this is why I'm here, so I can learn from you and one day, maybe, have an idea of what I'm doing.

##### Share on other sites

@SyncViews, just an idea. What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.

##### Share on other sites
12 minutes ago, VoxycDev said:

What if I send mvMatrix as a uniform array, and even though I can only send 32 or 64 matrices at once, I can then break it up into, let's say, 4 draw calls, to do 128 or 256 sprites? Maybe worth a try.

What are those sprites actually defined by? Do they rotate in world space, scale, and translate? In that case you still need only a 3x4 matrix, not a 4x4 matrix, which saves you an entire 4f vector per sprite in the uniform array; the general low-end limit for a uniform array is 256 4f vectors. If your sprites do not rotate (which they should not; if they do, call them general quads rather than sprites), you can use a single 4f vector, with the position in the first three components and a uniform scale factor for all 3 axes in the fourth.
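
The effect of the per-sprite footprint on the number of draw calls can be estimated with simple arithmetic. The sketch below is only that: the helper names are hypothetical, the 256-vector budget is the figure quoted above rather than a guaranteed value, and the 5 reserved vectors are an assumption (4 for projectionMatrix, 1 for globalColor). The real limit is implementation-dependent and can be queried with `glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, ...)`.

```cpp
#include <cassert>

// Number of sprites whose per-sprite data fits in the vertex shader's
// uniform storage, measured in vec4 slots.
inline int spritesPerBatch(int maxUniformVectors, int reservedVectors,
                           int vectorsPerSprite) {
    assert(vectorsPerSprite > 0);
    assert(maxUniformVectors > reservedVectors);
    return (maxUniformVectors - reservedVectors) / vectorsPerSprite;
}

// Number of draw calls needed to cover spriteCount sprites (rounds up).
inline int batchCount(int spriteCount, int perBatch) {
    assert(spriteCount >= 0 && perBatch > 0);
    return (spriteCount + perBatch - 1) / perBatch;
}
```

Under those assumptions, a full 4x4 matrix (4 vectors) allows 62 sprites per call, a 3x4 matrix (3 vectors) allows 83, and a single position-plus-scale vector allows 251. For 1024 sprites that works out to 17, 13 and 5 draw calls respectively.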

##### Share on other sites
1 minute ago, JohnnyCode said:

What are those sprites actually defined by? Do they rotate in world space, scale, and translate? In that case you still need only a 3x4 matrix, not a 4x4 matrix, which saves you an entire 4f vector per sprite in the uniform array; the general low-end limit for a uniform array is 256 4f vectors. If your sprites do not rotate (which they should not; if they do, call them general quads rather than sprites), you can use a single 4f vector, with the position in the first three components and a uniform scale factor for all 3 axes in the fourth.

Well, ideally I want a multi-purpose blazing-fast particle system. But you're absolutely right! I should try to pass only as much data as is absolutely required for the task.

##### Share on other sites
48 minutes ago, VoxycDev said:
8 hours ago, _WeirdCat_ said:

You recreate the array each frame, consider making a big array of sprites and not to use all of them, or if you have a constant num of sprites then you use glBufferSubData and you dont do glGenBuffers per frame too its only needed when you change the size of vertex buffer, anyway you gave no idea what ur doing

Thank you for the brilliant suggestion. Will do exactly that. Thanks for pointing out that I have no idea what I'm doing. I guess this is why I'm here, so I can learn from you and one day, maybe, have an idea of what I'm doing.

I think you misread weirdcat; I think he was implying that how you do it will depend on what exactly you want to achieve: 'gave no idea' rather than 'have no idea'!

I don't think you are that far from something decent.. it is more a case of jigging things around to make it more efficient for the hardware, reducing the amount of unnecessary expensive calls and repeat work. To go further on my suggestion about separating your once off work from your per frame work, you could have something like this:

void Game_Start() // one off stuff on game creation, create shaders, textures maybe, vertex buffers etc?
void Game_End() // free resources etc used by the whole game

void Level_Start() // one off stuff dependent on a game level .. might have some resources
void Level_End() // free level resources

void Frame_Update() // stuff you want to do on your frame, updating if necessary and drawing using the resources you have already created

That kind of scheme is fine to get started. Be aware, though, that on some platforms you can 'lose the device' for the 3D, and sometimes more (say, if the user starts playing another game in between, or alt-tabs; from memory, this may be the case on Android). In that case you need to recreate your GPU resources, so it makes sense to reuse the same bit of code you would use on game / level start for resource creation. It can often be a good idea to use pools as weirdcat suggested: allocate more than you need at the start, then use whatever you need on each frame (this is true for main memory as well as GPU resources).

The other thing is that you appear to be recreating and compiling the shader on every frame, which will probably kill performance. Again, move this to one-off code and reuse the shader. After all this is done you can reassess whether there are any bottlenecks.
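
One way to guarantee the shader is built only once is to wrap it in a small lazy holder, sketched below. `LazyProgram` is a hypothetical helper, and the compile callback stands in for whatever code currently calls `glCreateShader`, `glCompileShader` and `glLinkProgram`; it is assumed to return the program name, or 0 on failure.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Builds a shader program the first time it is requested and then hands
// back the cached handle on every later frame.
class LazyProgram {
public:
    using CompileFn = std::function<unsigned()>;

    explicit LazyProgram(CompileFn compile) : compile_(std::move(compile)) {
        assert(static_cast<bool>(compile_));
    }

    // Returns the program handle, compiling only if none is cached.
    // Note: if compilation fails (returns 0), the next call retries.
    unsigned get() {
        if (program_ == 0) program_ = compile_();
        return program_;
    }

    // Call after the GL context is lost so the next get() rebuilds it.
    void invalidate() { program_ = 0; }

private:
    CompileFn compile_;
    unsigned program_ = 0;  // 0 is never a valid program name
};
```

The per-frame code then calls `glUseProgram(program.get())`, and the same `invalidate()` hook fits the lost-device case mentioned above.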
