Odd rendering issue; need help

Started by
16 comments, last by noodleBowl 10 years, 6 months ago

I am having a really strange issue with my render function. I have written a sprite batcher that renders fine until you go over the array size for the vertex buffer.

I have it set up so that if you draw more primitives than the array can handle, it calls the endBatch function, which sets up everything for rendering, renders everything in the vertex buffer, and then resets the counts on the rendering items (number of shapes to draw, number of vertices, etc.).

My issue is that if the max vertex count is met exactly, everything renders just fine. But if I go over the max vertex count, all of my drawn objects flicker or flash extremely fast, as if only one sprite is being drawn and it is warping to all of the locations.

Can someone please help me figure out why this is happening? Currently the max on my vertex array is 4 (for debugging purposes) and I am attempting to draw 3 sprites. If you comment out 2 of the draw calls in the render function in the main.cpp file, the sprite renders fine (no flickering). Otherwise the issue above happens.

[+]----------------------------------[ Original Problem Solved! ]

Current Code -

main.cpp : http://pastebin.com/m7N7Y2e0

SpriteBatcher.h : http://pastebin.com/wGMJ8Wv0

SpriteBatcher.cpp : http://pastebin.com/SPdm1yfX

[+]----------------------------------[ Current topic making a better batcher! ]

You are kinda on the right track with the sprite batch, but there are some concepts that have been missed. Although I haven't done this in DX9, I have in DX10 and DX11 and conceptually I would imagine it is the same in DX9.

You don't want your sprite batch calling Present. It currently is doing this. This definitely will cause some flickering since you will only partially draw your sprites in each frame.

Should be more like:

  • Clear back buffer.
  • Draw sprites, as many as you want.
  • Present (from somewhere outside the spritebatch when you know nothing else will be rendered).

In your SpriteBatch you should create dynamic vertex and index buffers. The key here is to make them dynamic. You are currently creating new buffers each time endBatch is called which I would imagine is slow if you do it a lot. I didn't see them cleaned up either, but I didn't look through all of the code.

Your sprite batch needs a buffer to track your queued up quads and also needs to keep track of the last position you wrote to in the dynamic buffers. When your local buffer is full, you need to get a lock on the dynamic buffers, write your new vertex and index data to wherever you left off, release the lock, then DrawIndex (whatever the DX9 equivalent is) giving the offset into your dynamic buffer that you just wrote new data to.

When you lock the dynamic buffers you want to use lock flags to let the api know what your intentions are. If you are still filling up the dynamic buffer use D3DLOCK_NOOVERWRITE and you are promising you are not going to overwrite any data that has already been submitted and might currently be drawing. Technically, I think you CAN overwrite if you want, but you'll mess things up and probably see it. Pretty sure if you don't lock it with this flag you wait until the gpu is done drawing the contents before you get the lock (bad/slow).

If you are towards the end of your dynamic buffer, you want to specify the DISCARD flag. This returns you a pointer that you can start filling from the top again. If the gpu is still processing data you already submitted you get a new pointer here. Point is, you don't have to care whether the gpu is done with it or not, if you say DISCARD you get a pointer that is valid to write to from the top.

Process is something like this:

  • Take lock on buffers:
    • Full or Almost full, use DISCARD
    • Plenty of room left, use NOOVERWRITE
  • Add vertex/index data to buffers.
  • Release lock.
  • DrawIndexed

You probably want your own list of quads in your sprite batch rather than directly writing them to the dynamic buffers. This lets you sort later on down the road if you need to. Fill up internal list of quads via your draw calls, call EndBatch, sort your list by image (or whatever causes a state change), fill dynamic buffers (possibly multiple lock/fill/unlock/DrawIndex).

You will need to consider how you sort, or if you sort, quads if you batch them up like I mentioned. If your quads are drawn in an order-dependent manner (some have to be on top of others), then sorting by image alone isn't enough. You'll have to decide what forces your sprite batch to submit new quads.
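The lock-flag choice described above can be sketched without any Direct3D headers. This is only an illustration under assumptions: `LockMode`, `Reservation` and `BufferCursor` are hypothetical names, and the two enum values stand in for D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD. As far as I know, those flags are only valid on buffers created with D3DUSAGE_DYNAMIC, and dynamic buffers belong in D3DPOOL_DEFAULT rather than D3DPOOL_MANAGED.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD so the sketch builds without d3d9.h.
enum class LockMode { NoOverwrite, Discard };

// Where to write and how to lock for one batch (hypothetical type).
struct Reservation {
    std::size_t offset;  // first element of the buffer to write to
    LockMode mode;       // flag to pass when locking
};

// Hypothetical helper that remembers the last position written in one dynamic buffer.
class BufferCursor {
public:
    explicit BufferCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserve room for `count` elements. The caller would lock with the returned
    // mode, write starting at the returned offset, unlock, then draw that range.
    Reservation reserve(std::size_t count) {
        assert(count <= capacity_ && "split the batch before reserving");
        LockMode mode = LockMode::NoOverwrite;
        if (next_ + count > capacity_) {
            // Not enough room left: ask for a fresh buffer and start from the top.
            next_ = 0;
            mode = LockMode::Discard;
        }
        Reservation r{next_, mode};
        next_ += count;
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
};
```

With a capacity of 100, reserving 60, 30 and then 20 elements would give offsets 0 and 60 with the no-overwrite mode, and then offset 0 with the discard mode, because the third request no longer fits.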

I am not sure what the problem you are trying to solve really is. But a quick look through your code and I can see what causes the odd behaviour you are seeing. When you go over the buffer limit you stop and clear your screen then render your sprites to the screen. This will cause flickering because you cleared the screen and then render a few sprites and then clear the screen again and then render a few sprites more.


What I don't get is why you need to render only a few sprites at a time in the first place. Modern machines (even mobile ones) have more than enough memory and power to render all the sprites at once from a single vertex buffer, as long as you are not drawing many thousands of them. If you have some other reason to batch your rendering like this, then the only solution is to not clear the screen every time you render a few sprites.


Clear only when you need to, which is never if you have some kind of background that fills the entire screen and that you render at the beginning of each update. And by update I don't mean when you render a few sprites but when you render all of them.

The idea of my batcher is that the max number of sprites I would render at one time would be very high, e.g. 10000+. It is capped at 4 right now to find any issues like the odd flickering one I'm having.

I definitely agree with you that the clear screen function plays a part in the issue, but it is not the sole cause of the problem. If I move the clear screen function outside and place it into the render function in main.cpp, it only "fixes" one sprite.

Now, I think it has something to do with the present function being placed into the endBatch function of the sprite batcher. If I move the present function out into the render function of main.cpp, the flicker is gone.

But is this the correct thing to do? Also, does this mean I should move the beginScene and endScene out of the sprite batcher's render call? I have heard that having multiple beginScene / endScene calls can have performance impacts.

You are right that the present function call should also be outside the endBatch. When rendering stuff you first clear screen. Then give the gpu everything it needs to render the whole frame. Then tell it to present the frame. Repeat.


But the thing is, you are not going to get any performance boost by splitting the rendering of the sprites into batches. Doing so will in fact hurt your performance. The GPU is designed to render large amounts of data in one go. The only reasons anyone would want to render in batches are that the application is not meant to run in real time and they wish to do other work in the same thread between the batches, or that they are running out of video memory. That is unlikely to happen in most cases, and when you are running out of video memory you probably won't have enough processing power to render in real time anyway.


For example, you have 7 four-byte values for each vertex (28 bytes), and 4 vertices and 6 indices per sprite. With the 16-bit indices your code uses, that totals 4 × 28 + 6 × 2 = 124 bytes per sprite (136 bytes with 32-bit indices). If your GPU has 512 megabytes of memory, it would take more than 4 million sprites to fill it. And most new graphics cards nowadays have four times that much memory or more.
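The per-sprite arithmetic can be checked with a few lines of C++. This is a rough sketch: `SpriteVertex` is an assumed mirror of the vertex layout posted in this thread (x, y, z, rhw, color, u, v), and the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed mirror of the thread's vertex layout: x, y, z, rhw, color, u, v.
struct SpriteVertex {
    float x, y, z, rhw;
    std::uint32_t color;  // D3DCOLOR is a packed 32-bit value
    float u, v;
};
static_assert(sizeof(SpriteVertex) == 28, "seven 4-byte fields, no padding expected");

// Bytes of vertex plus index data for one quad (4 vertices, 6 indices).
constexpr std::size_t bytesPerSprite(std::size_t bytesPerIndex) {
    return 4 * sizeof(SpriteVertex) + 6 * bytesPerIndex;
}

// How many quads fit if the whole memory budget went to sprite data.
constexpr std::size_t spritesThatFit(std::size_t memoryBytes, std::size_t bytesPerIndex) {
    return memoryBytes / bytesPerSprite(bytesPerIndex);
}
```

Calling `bytesPerSprite(2)` covers the 16-bit index case and `bytesPerSprite(4)` the 32-bit one. Either way the budget is in the millions of sprites, so memory is unlikely to be the limiting factor here.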


If you have memory problems, such as trying to allocate too much static memory in the header file, then you should look into how to allocate memory dynamically, as the limit for dynamically allocated memory is several orders of magnitude higher. Especially if you are building a 64-bit application, dynamic memory is limited only by the amount of RAM you have.


Now, batching is a really useful trick for programmers, but when you are rendering to the screen you should try to minimize the number of times you send data or instructions to the GPU. This means that for maximum performance you should draw all your sprites in a single batch if at all possible.

Currently my main.cpp render method looks like


//Draw the things that need to be drawn
void render()
{

	//Clear the screen
	device->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 40, 100), 1.0f, 0);
	device->BeginScene();

	//Draw
	batcher.beginBatch();
	batcher.draw(50.0f, 50.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,255,255), tex);
	batcher.draw(250.0f, 50.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,0,255), tex);
	batcher.draw(200.0f, 200.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,0,255), tex);
	batcher.endBatch();

	device->EndScene();
	device->Present(NULL, NULL, NULL, NULL);
}

Where my render method in my batcher looks like


void SpriteBatcher::render()
{
	//Render everything that needs to be drawn
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();


	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);
	
	//Change to only call when we need to set a new texture
	batDevice->SetTexture(0, currentTexture);

	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

You're exactly right, I want the minimum number of render calls to keep top performance. I understand I want to send as much data to the GPU as I can at one time, but I am still unsure how it hurts my performance if the batch size is ridiculously high. Ideally we would want the number of render calls to always be 1, and I assume I would only want to render when:

1. We need to set a new texture to use, because texture swapping is expensive

2. We are at the max amount of vertexes the batcher "can handle" so we do not crash the program due to an array out of bounds issue

Now, if the performance issue comes from the max size of the array, then I am not sure how to fix this. My first choice would be to use a vector of Vertex structs. But I am uncertain how to fill a vector of this type and load its information into the vertex buffer. Even still I would assume we would want a limit or max out to make sure we do not overload the GPU's memory
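For the std::vector question, a minimal sketch might look like the following. It assumes a `Vertex` struct matching the layout used in this thread. `pushQuad` and `copyToLockedBuffer` are hypothetical helpers, and `dest` stands in for the pointer that Lock hands back. A vector stores its elements contiguously, so `data()` can be passed to memcpy exactly like the fixed array.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed to match the thread's vertex layout.
struct Vertex {
    float x, y, z, rhw;
    std::uint32_t color;
    float u, v;
};

// Append the four corners of one quad; the vector grows as needed.
void pushQuad(std::vector<Vertex>& vertices, float x, float y,
              float width, float height, std::uint32_t color) {
    vertices.push_back({x,         y,          1.0f, 1.0f, color, 0.0f, 0.0f});
    vertices.push_back({x + width, y,          1.0f, 1.0f, color, 1.0f, 0.0f});
    vertices.push_back({x + width, y + height, 1.0f, 1.0f, color, 1.0f, 1.0f});
    vertices.push_back({x,         y + height, 1.0f, 1.0f, color, 0.0f, 1.0f});
}

// Copy the queued vertices into `dest` (the locked buffer memory).
// Returns the number of bytes copied, or 0 if the data does not fit.
std::size_t copyToLockedBuffer(const std::vector<Vertex>& vertices,
                               void* dest, std::size_t destBytes) {
    const std::size_t bytes = vertices.size() * sizeof(Vertex);
    if (bytes == 0 || bytes > destBytes) {
        return 0;  // caller should split the data into smaller chunks
    }
    std::memcpy(dest, vertices.data(), bytes);
    return bytes;
}
```

The size check is the part that replaces the out-of-bounds worry: the CPU-side vector can hold any number of quads, and only the copy into the GPU buffer has to respect a fixed capacity.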


You don't want your sprite batch calling Present. It currently is doing this. This definitely will cause some flickering since you will only partially draw your sprites in each frame.

Should be more like:

  • Clear back buffer.
  • Draw sprites, as many as you want.
  • Present (from somewhere outside the spritebatch when you know nothing else will be rendered).

This is fixed

In your SpriteBatch you should create dynamic vertex and index buffers. The key here is to make them dynamic. You are currently creating new buffers each time endBatch is called which I would imagine is slow if you do it a lot. I didn't see them cleaned up either, but I didn't look through all of the code.

Your sprite batch needs a buffer to track your queued up quads and also needs to keep track of the last position you wrote to in the dynamic buffers. When your local buffer is full, you need to get a lock on the dynamic buffers, write your new vertex and index data to wherever you left off, release the lock, then DrawIndex (whatever the DX9 equivalent is) giving the offset into your dynamic buffer that you just wrote new data to.

When you lock the dynamic buffers you want to use lock flags to let the api know what your intentions are. If you are still filling up the dynamic buffer use D3DLOCK_NOOVERWRITE and you are promising you are not going to overwrite any data that has already been submitted and might currently be drawing. Technically, I think you CAN overwrite if you want, but you'll mess things up and probably see it. Pretty sure if you don't lock it with this flag you wait until the gpu is done drawing the contents before you get the lock (bad/slow).

If you are towards the end of your dynamic buffer, you want to specify the DISCARD flag. This returns you a pointer that you can start filling from the top again. If the gpu is still processing data you already submitted you get a new pointer here. Point is, you don't have to care whether the gpu is done with it or not, if you say DISCARD you get a pointer that is valid to write to from the top.

Process is something like this:

  • Take lock on buffers:
    • Full or Almost full, use DISCARD
    • Plenty of room left, use NOOVERWRITE
  • Add vertex/index data to buffers.
  • Release lock.
  • DrawIndexed

You probably want your own list of quads in your sprite batch rather than directly writing them to the dynamic buffers. This lets you sort later on down the road if you need to. Fill up internal list of quads via your draw calls, call EndBatch, sort your list by image (or whatever causes a state change), fill dynamic buffers (possibly multiple lock/fill/unlock/DrawIndex).

You will need to consider how you sort, or if you sort, quads if you batch them up like I mentioned. If your quads are drawn in an order-dependent manner (some have to be on top of others), then sorting by image alone isn't enough. You'll have to decide what forces your sprite batch to submit new quads.

This is where you kinda start going over my head. I have never done DirectX work, so I'm not sure how to do this.

You're right about the new buffers each frame; my endBatch call is where I make a new vertex / index buffer based on the number of quads I need to draw. Right now, I'm using vertCount and idxBuffCount to keep track of how many vertices / indices I need to render, which also tells me where I left off in each array. These values are only reset when I call endBatch. Also, the buffers created are the exact size I need, since I base them on vertCount and idxBuffCount. As for cleanup, the buffers are only released in the destructor of the SpriteBatcher.

Currently, when my vertex array max is met, I swap textures, or I call endBatch, everything is sent to the GPU. My render call is the only place I lock the vertex and index buffers; there I use memcpy to fill the buffers with the vertex / index data stored in my arrays. As for locking the buffers any other way and filling them without memcpy (e.g. placing the locks in the draw call of the sprite batcher and then filling them), I'm not sure how to do that.

My current endBatch call


void SpriteBatcher::endBatch()
{
	//Get everything ready for the render
	if(vertCount > 0)
	{
		batDevice->CreateVertexBuffer(vertCount * sizeof(vertex), D3DUSAGE_WRITEONLY, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
		batDevice->CreateIndexBuffer(idxBuffCount * sizeof(short), D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);
		render();
		resetCounts();
		renderCount++;
	}
	
	std::cout<<renderCount<<std::endl;
}

My current render call inside of SpriteBatcher


void SpriteBatcher::render()
{
	//Render everything that needs to be drawn

	#pragma region Vertex and Index buffers
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();

	#pragma endregion

	//Prepare to draw the scene

	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);
	
	//std::cout<<"Texture set: "<<currentTexture<<std::endl;
	batDevice->SetTexture(0, currentTexture);

	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

Full current Code -

main.cpp : http://pastebin.com/m7N7Y2e0

SpriteBatcher.h : http://pastebin.com/wGMJ8Wv0

SpriteBatcher.cpp : http://pastebin.com/SPdm1yfX

Instead of creating your vertex and index buffers in endBatch, you would create them once: Create them in the constructor (as dynamic buffers), free them in the destructor. I am not really sure if there is a "golden" size to pick, but the goal would be that you lock the buffers more often with NOOVERWRITE than DISCARD (discard being more expensive if the drivers need to allocate a new buffer for you).

You need another buffer of some sort to keep track of your Draw calls. You don't want to immediately send all your Draw calls to the GPU; you want to batch them up instead (SpriteBATCH). This buffer has nothing to do with the GPU, it is just a list you maintain in some way that is convenient for you. If you call Draw 200 times, this internal list would keep track of the data from those 200 calls. When you call your endBatch, that is when you want to worry about getting the data to the GPU.

When you call endBatch, if those 200 Draw calls didn't need to change any state, you might be able to get away with sending them to the GPU with a single draw call -- they would all have to use the same state and same image in that case (like a sprite font).

There are a couple of ways you could look at making the SpriteBatch. You could set up all the state external to the SpriteBatch, call BeginBatch, do all your draw calls, then call endBatch for each state change (you would be externally driving it this way), or you could make your SpriteBatch more complicated, such as giving your draw call different textures, and let your SpriteBatch worry about sorting out the details -- 5 different images to render, that's 5 state changes. Then when you EndBatch you can sort your list by texture, set up the gpu states, put all the vertices/indices in the dynamic buffer, call draw, then start processing the next image's quads, until you have sent all your Draw calls to the GPU.
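The "sort by texture, then draw each image's quads" idea could be sketched roughly like this. The names are hypothetical, and `textureId` is a plain integer standing in for the LPDIRECT3DTEXTURE9 pointer so the example builds without Direct3D.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One queued Draw call; textureId stands in for the texture pointer.
struct QueuedQuad {
    int textureId;
    float x, y, width, height;
};

// A consecutive range of quads that share a texture (one SetTexture + one draw).
struct TextureRun {
    int textureId;
    std::size_t firstQuad;
    std::size_t quadCount;
};

// Sort the queue by texture, then split it into runs of equal texture.
// stable_sort keeps the original submission order inside each texture.
std::vector<TextureRun> sortAndGroup(std::vector<QueuedQuad>& quads) {
    std::stable_sort(quads.begin(), quads.end(),
                     [](const QueuedQuad& a, const QueuedQuad& b) {
                         return a.textureId < b.textureId;
                     });
    std::vector<TextureRun> runs;
    for (std::size_t i = 0; i < quads.size(); ++i) {
        if (runs.empty() || runs.back().textureId != quads[i].textureId) {
            runs.push_back({quads[i].textureId, i, 0});
        }
        ++runs.back().quadCount;
    }
    return runs;
}
```

Each run would then become one texture change followed by one indexed draw. Note that sorting changes the draw order across textures, so this only suits sprites that are not order-dependent, or it needs a layer value added to the sort key.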

For the dynamic buffers, you copy the data in like you are currently doing, but the buffer is always there and you need to make sure you insert new data for drawing after the last data you inserted (NOOVERWRITE). In your lock calls, you didn't use any flags. For a dynamic buffer the flag needs to be the NOOVERWRITE or DISCARD flag.

Scenario:

  • You queue up 150 quads to draw via your SpriteBatch::Draw calls.
  • You call EndBatch.
  • Your dynamic buffer is big enough to hold 100 quads at a time.
  • You have not yet done any sprite batch processing so your insert point into the dynamic buffers is at the start, 0.

Process:

Need to get locks on your vertex and index buffers. When you take the lock, you need to figure out whether you want to lock it with the NOOVERWRITE or DISCARD flags. Since we are looking at an empty buffer (haven't inserted anything yet), you want to use NOOVERWRITE. You only want to lock with DISCARD when the buffer is full or almost full.

When you lock with NOOVERWRITE, you immediately get a pointer to the buffer, even if the GPU is currently pulling data from it to draw things. That's why you say NOOVERWRITE (don't overwrite anything you have previously put in there).

Fresh buffer, you are at the start of it, 100 quads will fit. Fill up the 100 quads worth of data.

Unlock the buffers.

DrawIndex on all the data in the dynamic buffer.

You still have 50 quads to draw.

Take another lock on the dynamic buffer, but this time use the DISCARD flag because you already filled it up. If the GPU is still drawing with the data you gave it, you will get a pointer to different memory.

Start back at the top (fresh buffer), add your 50 quads worth of data to it.

Unlock the buffers.

DrawIndex on the dynamic buffer.

Call Present, see all your quads via the 2 batches you sent.

For the next frame, I am not sure what the best practice is. In my implementation, I remember that I already drew 50 quads to the dynamic buffer and any quads I add next time I take the lock with NOOVERWRITE again and continue filling it up. However, since the quads have been drawn, you could probably make your life easier and use DISCARD always at the start of a new frame. I'm unsure on that one. You would want to be careful about doing that in BeginBatch, since you could technically call BeginBatch/EndBatch multiple times per frame if you wanted to. In that case calling NOOVERWRITE would be the better choice since it is in the same frame and probably still drawing your previous quads.
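The 150-quad walkthrough can be written as a small planning loop. This is a sketch with made-up names (`Batch`, `planFlush`), and the `LockMode` values again stand in for the NOOVERWRITE and DISCARD lock flags. It only computes which lock/fill/unlock/draw passes would happen and does not touch a device. Unlike a "discard when the next batch does not fit" policy, it fills the buffer right to the end before discarding, which is what the scenario describes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class LockMode { NoOverwrite, Discard };

// One lock/fill/unlock/draw pass (hypothetical type).
struct Batch {
    std::size_t firstQuad;  // index into the CPU-side queue
    std::size_t offset;     // quad offset inside the dynamic buffer
    std::size_t count;      // quads written and drawn in this pass
    LockMode mode;          // flag to lock with
};

// Plan the passes needed to send `queuedQuads` through a dynamic buffer that
// holds `capacity` quads. `cursor` is the insert position and is kept by the
// caller between calls, so the next batch or frame continues where this one ended.
std::vector<Batch> planFlush(std::size_t queuedQuads, std::size_t capacity,
                             std::size_t& cursor) {
    std::vector<Batch> batches;
    if (capacity == 0) {
        return batches;  // nothing can be drawn through an empty buffer
    }
    std::size_t done = 0;
    while (done < queuedQuads) {
        LockMode mode = LockMode::NoOverwrite;
        if (cursor == capacity) {
            // Buffer is full: discard and start again from the top.
            cursor = 0;
            mode = LockMode::Discard;
        }
        const std::size_t count = std::min(queuedQuads - done, capacity - cursor);
        batches.push_back({done, cursor, count, mode});
        cursor += count;
        done += count;
    }
    return batches;
}
```

For 150 queued quads and a capacity of 100 starting at cursor 0, this plans two passes: 100 quads at offset 0 with the no-overwrite mode, then 50 quads at offset 0 with the discard mode, leaving the cursor at 50. A later call with 30 quads would continue at offset 50 without discarding, which matches the "remember where I left off" approach.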

Instead of creating your vertex and index buffers in endBatch, you would create them once: Create them in the constructor (as dynamic buffers), free them in the destructor. I am not really sure if there is a "golden" size to pick, but the goal would be that you lock the buffers more often with NOOVERWRITE than DISCARD (discard being more expensive if the drivers need to allocate a new buffer for you).

So when I create the vertex and index buffers in the constructor, I want to create them with the max size even though I may or may not ever fill them up completely? I would also assume I would only want to memcpy the data I need then.

E.g:


//Arrays that hold my data for the vertex and index buffers. This will allow for 1000 quads 
vertex vertices[4000];
short indices[6000];

//In constructor
batDevice->CreateVertexBuffer(sizeof(vertices), D3DUSAGE_WRITEONLY, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
batDevice->CreateIndexBuffer(sizeof(indices), D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);

//In the render call

//Fill / prepare the vertex buffer based on the actual number of vertices needed
vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
memcpy(pVoid, vertices, vertCount * sizeof(vertex));
vBuffer->Unlock();

//Fill / prepare the index buffer based on the actual number of indices needed
iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
memcpy(pVoid, indices, idxBuffCount * sizeof(short));
iBuffer->Unlock();

You need another buffer of some sort to keep track of your Draw calls. You don't want to immediately send all your Draw calls to the GPU; you want to batch them up instead (SpriteBATCH). This buffer has nothing to do with the GPU, it is just a list you maintain in some way that is convenient for you. If you call Draw 200 times, this internal list would keep track of the data from those 200 calls. When you call your endBatch, that is when you want to worry about getting the data to the GPU.

When you call endBatch, if those 200 Draw calls didn't need to change any state, you might be able to get away with sending them to the GPU with a single draw call -- they would all have to use the same state and same image in that case (like a sprite font).

There are a couple of ways you could look at making the SpriteBatch. You could set up all the state external to the SpriteBatch, call BeginBatch, do all your draw calls, then call endBatch for each state change (you would be externally driving it this way), or you could make your SpriteBatch more complicated, such as giving your draw call different textures, and let your SpriteBatch worry about sorting out the details -- 5 different images to render, that's 5 state changes. Then when you EndBatch you can sort your list by texture, set up the gpu states, put all the vertices/indices in the dynamic buffer, call draw, then start processing the next image's quads, until you have sent all your Draw calls to the GPU.

I am not sure I follow you here. The only time I send data to the GPU is when endBatch is called. Currently endBatch is called only when I need to set a new texture or I actually hit the end of my batch (calling batcher.endBatch() in my main.cpp render).

Maybe you are talking about something like this? Where my SpriteBatcher::draw call only fills my arrays with data, instead of also checking for a texture swap. And my render call is the one that uses the draw tracking buffer to check if a texture swap is needed



void SpriteBatcher::draw(float x, float y, float width, float height, D3DCOLOR color, LPDIRECT3DTEXTURE9 texture)
{
    //set a texture for this quad
    drawData[i].texture = texture;
    i++;
    
    //Make a quad
    //V0
    vertices[vertCount].x = x;
    vertices[vertCount].y = y;
    vertices[vertCount].z = 1.0f;
    vertices[vertCount].rhw = 1.0f;
    vertices[vertCount].color = color;
    vertices[vertCount].u = 0.0f;
    vertices[vertCount].v = 0.0f;

    //V1
    vertices[vertCount + 1].x = x + width;
    vertices[vertCount + 1].y = y;
    vertices[vertCount + 1].z = 1.0f;
    vertices[vertCount + 1].rhw = 1.0f;
    vertices[vertCount + 1].color = color;
    vertices[vertCount + 1].u = 1.0f;
    vertices[vertCount + 1].v = 0.0f;

    //V2
    vertices[vertCount + 2].x = x + width;
    vertices[vertCount + 2].y = y + height;
    vertices[vertCount + 2].z = 1.0f;
    vertices[vertCount + 2].rhw = 1.0f;
    vertices[vertCount + 2].color = color;
    vertices[vertCount + 2].u = 1.0f;
    vertices[vertCount + 2].v = 1.0f;


    //V3
    vertices[vertCount + 3].x = x;
    vertices[vertCount + 3].y = y + height;
    vertices[vertCount + 3].z = 1.0f;
    vertices[vertCount + 3].rhw = 1.0f;
    vertices[vertCount + 3].color = color;
    vertices[vertCount + 3].u = 0.0f;
    vertices[vertCount + 3].v = 1.0f;

    //0,1,2, 3,0,2 (the second triangle is the same as 2,3,0)
    indices[idxBuffCount] = vertCount;
    indices[idxBuffCount + 1] = vertCount + 1;
    indices[idxBuffCount + 2] = vertCount + 2;
    indices[idxBuffCount + 3] = vertCount + 3;
    indices[idxBuffCount + 4] = vertCount;
    indices[idxBuffCount + 5] = vertCount +2;

    //inc the number of shapes to draw (inc by 2 cause of 2 triangles)
    //inc the vert index by 4
    numShapes += 2;
    vertCount += 4;
    idxBuffCount += 6;

}


void SpriteBatcher::render()
{
	//Render everything that needs to be drawn
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();

	//Prepare to draw the scene

	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);

	//Something along these lines to send everything in one call
	if(drawData[i].texture != drawData[i+1].texture)
		batDevice->SetTexture(0, drawData[i].texture);
	i++;

        //Send everything to the GPU
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

For the dynamic buffers, you copy the data in like you are currently doing, but the buffer is always there and you need to make sure you insert new data for drawing after the last data you inserted (NOOVERWRITE). In your lock calls, you didn't use any flags. For a dynamic buffer the flag needs to be the NOOVERWRITE or DISCARD flag.

Scenario:

  • You queue up 150 quads to draw via your SpriteBatch::Draw calls.
  • You call EndBatch.
  • Your dynamic buffer is big enough to hold 100 quads at a time.
  • You have not yet done any sprite batch processing so your insert point into the dynamic buffers is at the start, 0.

Process:

Need to get locks on your vertex and index buffers. When you take the lock, you need to figure out whether you want to lock it with the NOOVERWRITE or DISCARD flags. Since we are looking at an empty buffer (haven't inserted anything yet), you want to use NOOVERWRITE. You only want to lock with DISCARD when the buffer is full or almost full.

When you lock with NOOVERWRITE, you immediately get a pointer to the buffer, even if the GPU is currently pulling data from it to draw things. That's why you say NOOVERWRITE (don't overwrite anything you have previously put in there).

Fresh buffer, you are at the start of it, 100 quads will fit. Fill up the 100 quads worth of data.

Unlock the buffers.

DrawIndex on all the data in the dynamic buffer.

You still have 50 quads to draw.

Take another lock on the dynamic buffer, but this time use the DISCARD flag because you already filled it up. If the GPU is still drawing with the data you gave it, you will get a pointer to different memory.

Start back at the top (fresh buffer), add your 50 quads worth of data to it.

Unlock the buffers.

DrawIndex on the dynamic buffer.

Call Present, see all your quads via the 2 batches you sent.

For the next frame, I am not sure what the best practice is. In my implementation, I remember that I already drew 50 quads to the dynamic buffer and any quads I add next time I take the lock with NOOVERWRITE again and continue filling it up. However, since the quads have been drawn, you could probably make your life easier and use DISCARD always at the start of a new frame. I'm unsure on that one. You would want to be careful about doing that in BeginBatch, since you could technically call BeginBatch/EndBatch multiple times per frame if you wanted to. In that case calling NOOVERWRITE would be the better choice since it is in the same frame and probably still drawing your previous quads.

This is the main part where things go over my head and kind of fall apart. I'm assuming all of this would be in my SpriteBatcher::render function. I understand that we need to lock the buffers and then use memcpy to fill them with our data, etc. But what I don't understand is when / how we tell it that we have 50 more quads left to draw.

So when I create the vertex and index buffers in the constructor, I want to create them with the max size even though I may or may not ever fill them up completely? I would also assume I would only want to memcpy the data I need then.

Yes, the goal being to find some balance. Too small and you have to call DISCARD too often, too big and you are just wasting a bunch of memory.

Your example of creating them in the constructor still didn't create them as dynamic buffers though. Look at the docs for CreateVertexBuffer and at the usage flags:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers

http://msdn.microsoft.com/en-us/library/windows/desktop/bb174364(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172625(v=vs.85).aspx

Since you know you'll have data changing frequently (per frame), you want to do it in the most efficient way you can. Creating and destroying static buffers each frame is going to be spendy. Dynamic buffers are the solution to frequently changing data. They are designed to support frequent updates.

That's the biggest part to wrap your mind around. Most of the rest of what I was saying are just implementation flavors -- do whatever you need to make your life easier. Understand the dynamic buffers and their pattern of usage, then wrap whatever code you need around that in your SpriteBatch.

Maybe you are talking about something like this? Where my SpriteBatcher::draw call only fills my arrays with data, instead of also checking for a texture swap. And my render call is the one that uses the draw tracking buffer to check if a texture swap is needed

You are on the right track with your sample code. However, I would stay away from using a fixed-size array for queuing up the data. What you want to be able to do is take an unknown number of SpriteBatch::Draw calls, queue them up, then when you call SpriteBatch::EndBatch, fill up the dynamic buffer as many times as it takes to draw everything in your internal queue. If your internal size is fixed, what happens if Draw is called more times than you have storage for?

Have you looked at DirectXTK? http://directxtk.codeplex.com/

It actually has a SpriteBatch in it, which might save you a lot of time trying to create your own. The source is there for you to study, or adapt into your own creations. I haven't looked at the source, but I'm pretty sure it follows a similar pattern with dynamic buffers -- based on Shawn's writeup of DISCARD/NO_OVERWRITE dynamic buffers in XNA : http://blogs.msdn.com/b/shawnhar/archive/2010/07/07/setdataoptions-nooverwrite-versus-discard.aspx
