Member Since 19 May 2002
Offline Last Active Mar 24 2013 04:51 AM

Posts I've Made

In Topic: OpenGL ES 2.0 on Android: how to render 500 cubes effectively

22 December 2012 - 07:52 AM

I know this thread is a month old, but I have invested serious thought and work into the project, and I'd like to post an update to show that I appreciate your answers, and to help other people with similar issues.

Do you have any profiling tools that allow you to measure CPU and GPU timings independently? The first step will be determining which processor is the bottleneck, so you can focus your optimisations usefully.

I believe you have a very good point here; I have mostly been "optimizing blindly", which is considered bad practice. I have tried or considered the following options:

1. Android SDK Tools: the profiler that ships with the SDK shows CPU time, but not GPU time. It also only profiles my own application, while I would like to see what's going on in the whole system. Google has recognized this and published a system-wide profiling tool called systrace, but it's only available for Jelly Bean. The same goes for dumpsys gfxinfo, which, combined with the Jelly Bean developer option "Profile GPU Rendering", outputs statistics on how much time is spent processing, drawing and swapping the UI. See Chet Haase and Romain Guy's excellent presentation about Android UI performance for more information about these tools. For me, they are not an option; I am stuck with Honeycomb for various reasons.
2. Log output: yes, I know it is stone age, but I thought it would be interesting to see how much time my application spends in my move() (CPU) and draw() (GPU) methods. The results are not very conclusive; I guess this has to do with multithreading and the way Android handles vsync.

3. NVIDIA Tools: there is a tool for Tegra 2 platforms called PerfHUD ES that looks very promising: detailed information about draw calls and lots of other GPU-related data. I am currently trying to get it running on Honeycomb. Any help appreciated.

Option 1: Software transform the vertices. That is, do the matrix multiply on the CPU, then you can send all the cubes in one go.
Option 2: 'Hardware skinning' style solution. Instead of doing one cube at a time, put 16* cubes into your VBO. The vertices for each cube have indices, which you use in your vertex shader to look up into an array of matrix uniforms.

Both of your options seem very reasonable approaches. I decided to implement option 2 first; this method is often called "dynamic indexing". It took me an hour to rearrange my application accordingly, and a whole day to find out how to get hold of the vertex index inside the shader and use it to address my transformation matrices. It's not straightforward in OpenGL ES 2.0, because for some baffling reason the crucial gl_VertexID variable was left out of the spec. Sorry, but this tiny detail really drove me mad. Anyway, the solution is quite simple once you know how: Anton Holmquist has a short, to-the-point blog post about it, which I wish I'd found earlier.
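For reference, the workaround boils down to passing the cube index as a vertex attribute, since GLSL ES 1.00 has no gl_VertexID. A sketch of the vertex shader (attribute and uniform names are illustrative, not taken from the blog post):

```glsl
// Every vertex carries the index of the cube it belongs to, because
// OpenGL ES 2.0 shaders have no built-in gl_VertexID.
attribute vec4 a_position;
attribute float a_cubeIndex;     // 0..15, constant across one cube's vertices
uniform mat4 u_mvpMatrix[16];    // one MVP matrix per batched cube

void main() {
    gl_Position = u_mvpMatrix[int(a_cubeIndex)] * a_position;
}
```

The index attribute only needs to be filled once at startup, since it never changes; only the uniform matrix array is updated per frame.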


The one big drawback of this method is, as you pointed out, the limited uniform space. For those who have no clue what that is (like me, until recently): it is the space available for declaring uniform variables in the shader. I read somewhere that this space relates to the number of registers the GPU has - correct me if I'm wrong here. For anyone interested, calling glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, &n) will tell you how much uniform space your system has. The OpenGL ES 2.0 specification requires at least 128. The number is expressed in vec4 vectors, and since a mat4 occupies 4 vectors, you can declare at most 32 matrix uniforms or, in my case, two arrays of 16 matrices. In other words, I can batch a maximum of 16 cubes now.

Dynamic indexing has certainly improved performance, but I am not entirely happy yet. I will implement option 1 above next, hoping to improve performance by shifting work from the GPU to the CPU and, as a following step, parallelize the move() and draw() operations.
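For anyone following along, option 1 (software transform) can be sketched roughly like this: multiply each cube's vertices by its matrix on the CPU into one big buffer, upload it once, and issue a single draw call. All types and names below are illustrative, not from the actual project:

```cpp
#include <array>
#include <vector>

// Minimal sketch of the software-transform approach: apply each cube's
// matrix on the CPU so all cubes share one VBO and one draw call.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>; // column-major, as in OpenGL

Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col]; // element (row, col) at m[col*4+row]
    return r;
}

// Transform every cube's vertices into one output buffer, which would then
// be uploaded (e.g. via glBufferSubData) and drawn with a single call.
std::vector<Vec4> batchTransform(const std::vector<Mat4>& cubeMatrices,
                                 const std::vector<Vec4>& cubeVertices) {
    std::vector<Vec4> out;
    out.reserve(cubeMatrices.size() * cubeVertices.size());
    for (const Mat4& m : cubeMatrices)
        for (const Vec4& v : cubeVertices)
            out.push_back(transform(m, v));
    return out;
}
```

This trades GPU vertex-shader work for CPU work, which is exactly the shift I am hoping pays off once move() and draw() run in parallel.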

In Topic: OpenGL ES 2.0 on Android: how to render 500 cubes effectively

24 November 2012 - 06:24 AM

That said, a solution will probably involve calling glDraw less often than once per cube.

True, but how? Since every cube needs its own scaling, rotation and translation, how can I combine glDrawArrays() calls?

In Topic: Effective way to detect whether my 3D cubes are outside the 2D screen boundar...

09 September 2012 - 05:28 AM

just keep your point method, and multiply the result by 0.9 or something, so that you effectively increase the virtual size of your screen and the cubes will pop away after they pass the border region.

No, I believe that would not work properly for all cases.

Well if you're using only cubes, you can just check based on the center plus half the width of a face, making it a pseudo bounding sphere check.

I think it would have to be diagonal / 2, but in principle, that is what I had in mind. But how much is diagonal / 2 in normalized device coordinates?
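To make the "diagonal / 2" part concrete: for a cube with edge length s, the bounding-sphere radius is half the space diagonal, s·√3/2. A tiny sketch (the NDC caveat is the open question, since the radius still has to go through the projection and perspective divide):

```cpp
#include <cmath>

// Bounding-sphere radius of a cube: half its space diagonal.
// For edge length s the diagonal is s * sqrt(3), so r = s * sqrt(3) / 2.
// Note this radius is in world units; converting it to normalized device
// coordinates requires dividing by the vertex's clip-space w, so the
// effective screen-space radius shrinks with distance from the camera.
double boundingRadius(double edge) {
    return edge * std::sqrt(3.0) / 2.0;
}
```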

What's wrong with the following method?

Actually, I think you may be right. I thought the distance calculation was a bit expensive, but I had a closer look and it's just a few multiplications and additions.

In Topic: Fade to Black in D3D without enabling lighting?

24 May 2005 - 10:33 PM

Thanks. I just found a really simple solution though:

m_pdDevice->SetTextureStageState(0, D3DTSS_COLORARG2, D3DTA_TFACTOR);

// set "light" color (stage 0 defaults to COLOROP = MODULATE and
// COLORARG1 = TEXTURE, so the texture is modulated by this factor)
m_pdDevice->SetRenderState(D3DRS_TEXTUREFACTOR, nColor);
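To drive the fade, nColor just needs to be a grey level recomputed each frame. A small helper showing the packing (equivalent to D3DCOLOR_ARGB(255, g, g, g), written out so it compiles without the SDK headers; the name is my own):

```cpp
#include <cstdint>

// Pack a fade factor t (1.0 = full brightness, 0.0 = black) into an
// ARGB color suitable for D3DRS_TEXTUREFACTOR. Equivalent to
// D3DCOLOR_ARGB(255, g, g, g) with g = round(t * 255).
uint32_t fadeColor(float t) {
    uint32_t g = static_cast<uint32_t>(t * 255.0f + 0.5f);
    return (0xFFu << 24) | (g << 16) | (g << 8) | g;
}
```

Decreasing t a little each frame then fades the whole scene to black without touching the lighting state.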

In Topic: Frame-based jump'n'run - not smooth, ugly tearing

14 May 2005 - 11:44 AM

First of all, thank you for your interest. To show the problem more clearly, I've compiled a demo that sets D3DPRESENT_INTERVAL_ONE, and thus limits the framerate to the system's VSYNC rate (e.g. 60Hz on my system). At least on my laptop, the sprites even seem to jump back and forth. I'd be interested to hear how it runs on other systems. You can download it here... Try moving the character around with the arrow keys; you'll see what I mean.

I'm using DirectX 9.0c SDK, C++, and I'm using textured quads to render the 2D content. I am using one dynamic vertex buffer, and I lock the buffer each frame. The code snippet below shows the relevant bits.

// init phase:

// setup fullscreen parameters
m_pd3dppf = new D3DPRESENT_PARAMETERS;
ZeroMemory(m_pd3dppf, sizeof(D3DPRESENT_PARAMETERS));
m_pd3dppf->Windowed = false;
m_pd3dppf->SwapEffect = D3DSWAPEFFECT_DISCARD;
m_pd3dppf->PresentationInterval = D3DPRESENT_INTERVAL_ONE;
m_pd3dppf->BackBufferWidth = nWidth;
m_pd3dppf->BackBufferHeight = nHeight;
m_pd3dppf->BackBufferCount = 1;
m_pd3dppf->BackBufferFormat = d3ddm.Format;

// create Direct3D device
hr = m_pd3d->CreateDevice(


// create the vertex buffer
hr = m_pd3dd->CreateVertexBuffer(

nPanelCnt * CPanel::VERTEX_COUNT * CPanel::VERTEX_SIZE,

// each frame:

// check whether the dynamic VB is full
DWORD nFlags;

if (m_nVBOffset + nPanelCnt > m_nVBSize)
{
// buffer full: wrap around and lock with DISCARD to get a fresh VB
m_nVBOffset = 0;
nFlags = D3DLOCK_DISCARD;
}
else
{
// lock only the unused part of the VB
nFlags = D3DLOCK_NOOVERWRITE;
}

// lock VB
hr = m_pVB->Lock(

m_nVBOffset * CPanel::VERTEX_COUNT * CPanel::VERTEX_SIZE,
nPanelCnt * CPanel::VERTEX_COUNT * CPanel::VERTEX_SIZE,
(VOID **)ppVer, nFlags);
// draw VB
hr = m_pd3dd->DrawPrimitive(

m_nVBOffset * CPanel::VERTEX_COUNT,

// unlock VB
hr = m_pVB->Unlock();

// advance VB offset
m_nVBOffset += nPanelCnt;

[Edited by - space_cadet on May 15, 2005 6:44:52 AM]