Odd rendering issue; need help


17 replies to this topic

#1 noodleBowl   Members   -  Reputation: 186


Posted 22 September 2013 - 01:06 AM

I am having a really strange issue with my render function. I recently wrote a sprite batcher that renders fine until you go over the array size for the vertex buffer.

 

I currently have it set up so that if you draw more primitives than the array can handle, it calls the endBatch function, which sets up everything for rendering, renders everything in the vertex buffer, and then resets the counts on the rendering items (number of shapes to draw, number of vertices, etc.).

 

My issue is that if the max vertex count is met exactly, everything renders just fine. But if I go over the max vertex count, all of my drawn objects flicker or flash extremely fast, as if only one sprite is being drawn and it is warping to all of the locations.

 

Can someone please help me figure out why this is happening? Currently the max on my vertex array is 4 (for debugging purposes) and I am attempting to draw 3 sprites. If you comment out 2 of the draw calls in the render function in the main.cpp file, the sprite renders fine (no flickering). Otherwise the issue above happens.

 

[+]----------------------------------[ Original Problem Solved! ]

 

Current Code -

main.cpp : http://pastebin.com/m7N7Y2e0

SpriteBatcher.h : http://pastebin.com/wGMJ8Wv0

SpriteBatcher.cpp : http://pastebin.com/SPdm1yfX

 



Edited by noodleBowl, 26 September 2013 - 09:00 PM.



#2 PunCrathod   Members   -  Reputation: 278


Posted 22 September 2013 - 04:47 AM

I am not sure what problem you are really trying to solve, but from a quick look through your code I can see what causes the odd behaviour you are seeing. When you go over the buffer limit you stop, clear your screen, and then render your sprites to the screen. This causes flickering because you clear the screen, render a few sprites, then clear the screen again and render a few more sprites.


What I don't get is why you need to render only a few sprites at a time in the first place. Modern machines (even mobile ones) have more than enough memory and power to render all the sprites at once from a single vertex buffer, as long as you are not drawing many thousands of them. If you have some other reason to batch your rendering like this, then the only solution is to not clear the screen every time you render a few sprites.


Clear only when you need to, which is never if you have some kind of background that fills the entire screen and that you render at the beginning of each update. And by update I don't mean when you render a few sprites, but when you render all of them.

#3 noodleBowl   Members   -  Reputation: 186


Posted 22 September 2013 - 08:21 AM

The idea behind my batcher is that the max number of sprites I would render at one time would be very high, e.g. 10000+. It is set to cap out at 4 vertices right now to find any issues like the odd flickering one I'm having.

 

I definitely agree with you that the clear screen function plays a part in the issue, but it is not the sole cause of the problem. If I move the clear screen function out of the batcher and into the render function in main.cpp, it only "fixes" one sprite.

 

Now, I think it has something to do with the present function being placed in the endBatch function of the sprite batcher. If I move the present function out into the render function of main.cpp, the flicker is gone.

 

But is this the correct thing to do? Also, does this mean I should move the beginScene and endScene out of the sprite batcher's render call? I have heard that having multiple beginScene / endScene pairs can have performance impacts.



#4 PunCrathod   Members   -  Reputation: 278


Posted 22 September 2013 - 09:32 AM

You are right that the present function call should also be outside the endBatch. When rendering stuff you first clear screen. Then give the gpu everything it needs to render the whole frame. Then tell it to present the frame. Repeat.


But the thing is, you are not going to get any performance boost by splitting the rendering of the sprites into several batches. Doing so will in fact hurt your performance. The GPU is designed to render large amounts of data in one go. The only reasons anyone would ever want to split rendering into batches are that the application is not meant to run in realtime and they wish to do other stuff in the same thread between the batches, or that you are running out of video memory. That is unlikely to happen in most cases, and when you are running out of video memory you probably won't have enough processing power to render in realtime anyway.


For example, you have 7 four-byte values for each vertex, and 4 vertices and 6 indices per sprite. That totals 136 bytes per sprite with 4-byte indices (124 bytes with the 16-bit indices your code uses). Now if your GPU has 512 megabytes of memory, it would take nearly 4 million sprites to fill that memory. And most new graphics cards nowadays have four times that much memory or more.


If you have memory problems, such as trying to allocate too much static memory in the header file, then you should look into how to dynamically allocate memory, as the limit for dynamically allocated memory is several orders of magnitude higher. Especially if you are building a 64-bit application, dynamic memory is limited mainly by the amount of RAM you have.


Now, batching is a really useful trick for programmers, but when you are rendering stuff to the screen you should try to minimize the number of times you send data or instructions to the GPU. This means that for maximum performance you should draw all your sprites in a single batch if at all possible.

#5 noodleBowl   Members   -  Reputation: 186


Posted 22 September 2013 - 10:43 AM

Currently my main.cpp render method looks like

//Draw the things that need to be drawn
void render()
{

	//Clear the screen
	device->Clear(0, NULL, D3DCLEAR_TARGET, D3DCOLOR_XRGB(0, 40, 100), 1.0f, 0);
	device->BeginScene();

	//Draw
	batcher.beginBatch();
	batcher.draw(50.0f, 50.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,255,255), tex);
	batcher.draw(250.0f, 50.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,0,255), tex);
	batcher.draw(200.0f, 200.0f, 64.0f, 64.0f, D3DCOLOR_XRGB(0,0,255), tex);
	batcher.endBatch();

	device->EndScene();
	device->Present(NULL, NULL, NULL, NULL);
}

Where my render method in my batcher looks like

void SpriteBatcher::render()
{
	//Render everything that needs to be drawn
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();


	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);
	
        //Change to only call when we need to set a new texture
	batDevice->SetTexture(0, currentTexture);

	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

You're exactly right, I want the minimum number of render calls to keep top performance. I understand I want to send as much data to the GPU as I can at one time, but I am still unsure how it hurts my performance if the batch size is ridiculously high. Ideally we would want the number of render calls to always be 1, and I assume I would only want to render when:

 

1. We need to set a new texture to use, because texture swapping is expensive

2. We are at the max amount of vertexes the batcher "can handle" so we do not crash the program due to an array out of bounds issue

 

Now, if the performance issue comes from the max size of the array, then I am not sure how to fix this. My first choice would be to use a vector of Vertex structs. But I am uncertain how to fill a vector of this type and load its information into the vertex buffer. Even still I would assume we would want a limit or max out to make sure we do not overload the GPU's memory



#6 Myiasis   Members   -  Reputation: 211


Posted 26 September 2013 - 01:58 AM

You are kinda on the right track with the sprite batch, but there are some concepts that have been missed.  Although I haven't done this in DX9, I have in DX10 and DX11 and conceptually I would imagine it is the same in DX9.

 

You don't want your sprite batch calling Present.  It currently is doing this.  This definitely will cause some flickering since you will only partially draw your sprites in each frame.

 

Should be more like:

  • Clear back buffer.
  • Draw sprites, as many as you want.
  • Present (from somewhere outside the spritebatch when you know nothing else will be rendered).

In your SpriteBatch you should create dynamic vertex and index buffers.  The key here is to make them dynamic.  You are currently creating new buffers each time endBatch is called which I would imagine is slow if you do it a lot.  I didn't see them cleaned up either, but I didn't look through all of the code.

 

Your sprite batch needs a buffer to track your queued up quads and also needs to keep track of the last position you wrote to in the dynamic buffers.  When your local buffer is full, you need to get a lock on the dynamic buffers, write your new vertex and index data to wherever you left off, release the lock, then DrawIndex (whatever the DX9 equivalent is) giving the offset into your dynamic buffer that you just wrote new data to.

 

When you lock the dynamic buffers you want to use lock flags to let the api know what your intentions are.  If you are still filling up the dynamic buffer use D3DLOCK_NOOVERWRITE and you are promising you are not going to overwrite any data that has already been submitted and might currently be drawing.  Technically, I think you CAN overwrite if you want, but you'll mess things up and probably see it.  Pretty sure if you don't lock it with this flag you wait until the gpu is done drawing the contents before you get the lock (bad/slow). 

 

If you are towards the end of your dynamic buffer, you want to specify the DISCARD flag.  This returns you a pointer that you can start filling from the top again.  If the gpu is still processing data you already submitted you get a new pointer here.  Point is, you don't have to care whether the gpu is done with it or not, if you say DISCARD you get a pointer that is valid to write to from the top.

 

Process is something like this:

  • Take lock on buffers:
    • Full or Almost full, use DISCARD
    • Plenty of room left, use NOOVERWRITE
  • Add vertex/index data to buffers.
  • Release lock.
  • DrawIndexed

You probably want your own list of quads in your sprite batch rather than directly writing them to the dynamic buffers.  This lets you sort later on down the road if you need to.  Fill up internal list of quads via your draw calls, call EndBatch, sort your list by image (or whatever causes a state change), fill dynamic buffers (possibly multiple lock/fill/unlock/DrawIndex).

 

You will need to consider how you sort, or if you sort, quads if you batch them up like I mentioned.  If your quads are drawn in an order-dependent manner (some have to be on top of others), then sorting by image alone isn't enough.  You'll have to decide what forces your sprite batch to submit new quads.

 



#7 noodleBowl   Members   -  Reputation: 186


Posted 26 September 2013 - 09:21 PM

You don't want your sprite batch calling Present.  It currently is doing this.  This definitely will cause some flickering since you will only partially draw your sprites in each frame.

 

Should be more like:

  • Clear back buffer.
  • Draw sprites, as many as you want.
  • Present (from somewhere outside the spritebatch when you know nothing else will be rendered).

 

This is fixed

 

In your SpriteBatch you should create dynamic vertex and index buffers.  The key here is to make them dynamic.  You are currently creating new buffers each time endBatch is called which I would imagine is slow if you do it a lot.  I didn't see them cleaned up either, but I didn't look through all of the code.

 

Your sprite batch needs a buffer to track your queued up quads and also needs to keep track of the last position you wrote to in the dynamic buffers.  When your local buffer is full, you need to get a lock on the dynamic buffers, write your new vertex and index data to wherever you left off, release the lock, then DrawIndex (whatever the DX9 equivalent is) giving the offset into your dynamic buffer that you just wrote new data to.

 

When you lock the dynamic buffers you want to use lock flags to let the api know what your intentions are.  If you are still filling up the dynamic buffer use D3DLOCK_NOOVERWRITE and you are promising you are not going to overwrite any data that has already been submitted and might currently be drawing.  Technically, I think you CAN overwrite if you want, but you'll mess things up and probably see it.  Pretty sure if you don't lock it with this flag you wait until the gpu is done drawing the contents before you get the lock (bad/slow). 

 

If you are towards the end of your dynamic buffer, you want to specify the DISCARD flag.  This returns you a pointer that you can start filling from the top again.  If the gpu is still processing data you already submitted you get a new pointer here.  Point is, you don't have to care whether the gpu is done with it or not, if you say DISCARD you get a pointer that is valid to write to from the top.

 

Process is something like this:

  • Take lock on buffers:
    • Full or Almost full, use DISCARD
    • Plenty of room left, use NOOVERWRITE
  • Add vertex/index data to buffers.
  • Release lock.
  • DrawIndexed

You probably want your own list of quads in your sprite batch rather than directly writing them to the dynamic buffers.  This lets you sort later on down the road if you need to.  Fill up internal list of quads via your draw calls, call EndBatch, sort your list by image (or whatever causes a state change), fill dynamic buffers (possibly multiple lock/fill/unlock/DrawIndex).

 

You will need to consider how you sort, or if you sort, quads if you batch them up like I mentioned.  If your quads are drawn in an order-dependent manner (some have to be on top of others), then sorting by image alone isn't enough.  You'll have to decide what forces your sprite batch to submit new quads.

 

This is where you kinda start going over my head. I have never done DirectX work before, so I'm not sure how to do this.

 

You're right about the new buffers each frame: my endBatch call is where I make a new vertex / index buffer based on the number of quads I need to draw. Right now, I'm using vertCount and idxBuffCount to keep track of how many vertices / indices I need to render, which also tells me where I left off in each array. These values are only reset when I call endBatch. Also, the buffers created are the exact size I need, since I base them on vertCount and idxBuffCount. As for clean up, the buffers are only cleaned up / released in the destructor of the SpriteBatcher.

 

Currently, everything is sent to the GPU when my vertex array max is met, when I swap textures, or when I call endBatch. My render call is the only place I lock the vertex and index buffers, and there I use memcpy to fill the buffers with the vertex / index data stored in my arrays. As for locking the buffers any other way and filling them without memcpy (e.g. placing the locks in the draw call of the sprite batcher and then filling them), I'm not sure how to do that.

 

My current endBatch call

void SpriteBatcher::endBatch()
{
	//Get everything ready for the render
	if(vertCount > 0)
	{
		batDevice->CreateVertexBuffer(vertCount * sizeof(vertex), D3DUSAGE_WRITEONLY, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
		batDevice->CreateIndexBuffer(idxBuffCount * sizeof(short), D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);
		render();
		resetCounts();
		renderCount++;
	}
	
	std::cout<<renderCount<<std::endl;
}

My current render call inside of SpriteBatcher

void SpriteBatcher::render()
{
	//Render everything that needs to be drawn

	#pragma region Vertex and Index buffers
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();

	#pragma endregion

	//Prepare to draw the scene

	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);
	
	//std::cout<<"Texture set: "<<currentTexture<<std::endl;
	batDevice->SetTexture(0, currentTexture);

	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

Full current Code -

main.cpp : http://pastebin.com/m7N7Y2e0

SpriteBatcher.h : http://pastebin.com/wGMJ8Wv0

SpriteBatcher.cpp : http://pastebin.com/SPdm1yfX


Edited by noodleBowl, 26 September 2013 - 09:29 PM.


#8 Myiasis   Members   -  Reputation: 211


Posted 27 September 2013 - 12:04 AM

Instead of creating your vertex and index buffers in endBatch, you would create them once:  Create them in the constructor (as dynamic buffers), free them in the destructor.  I am not really sure if there is a "golden" size to pick, but the goal would be that you lock the buffers more often with NOOVERWRITE than DISCARD (discard being more expensive if the drivers need to allocate a new buffer for you).

 

You need another buffer of some sort to keep track of your Draw calls.  You don't want to immediately send all your Draw calls to the GPU; you want to batch them up instead (SpriteBATCH).  This buffer has nothing to do with the GPU, it is just a list you maintain in some way that is convenient for you.  If you call Draw 200 times, this internal list would keep track of the data from those 200 calls.  When you call your endBatch, that is when you want to worry about getting the data to the GPU.

 

When you call endBatch, if those 200 Draw calls didn't need to change any state, you might be able to get away with sending them to the GPU with a single draw call -- they would all have to use the same state and same image in that case (like a sprite font).

 

There are a couple of ways you could look at making the SpriteBatch.  You could set up all the state external to the SpriteBatch, call BeginBatch, do all your draw calls, then call endBatch for each state change (you would be externally driving it this way), or you could make your SpriteBatch more complicated, such as giving your draw call different textures, and let your SpriteBatch worry about sorting out the details -- 5 different images to render, that's 5 state changes.  Then when you EndBatch you can sort your list by texture, set up the gpu states, put all the vertices/indices in the dynamic buffer, call draw, then start processing the next image's quads, until you have sent all your Draw calls to the GPU.

 

For the dynamic buffers, you copy the data in like you are currently doing, but the buffer is always there and you need to make sure you insert new data for drawing after the last data you inserted (NOOVERWRITE).  In your lock calls, you didn't use any flags.  For a dynamic buffer the flag needs to be the NOOVERWRITE or DISCARD flag.

 

Scenario:

  • You queue up 150 quads to draw via your SpriteBatch::Draw calls.
  • You call EndBatch.
  • Your dynamic buffer is big enough to hold 100 quads at a time.
  • You have not yet done any sprite batch processing so your insert point into the dynamic buffers is at the start, 0.

Process:

Need to get locks on your vertex and index buffers.  When you take the lock, you need to figure out whether you want to lock it with the NOOVERWRITE or DISCARD flags.  Since we are looking at an empty buffer (haven't inserted anything yet), you want to use NOOVERWRITE.  You only want to lock with DISCARD when the buffer is full or almost full.

 

When you lock with NOOVERWRITE, you immediately get a pointer to the buffer, even if the GPU is currently pulling data from it to draw things.  That's why you say NOOVERWRITE (don't overwrite anything you have previously put in there).

 

Fresh buffer, you are at the start of it, 100 quads will fit.  Fill up the 100 quads worth of data.

 

Unlock the buffers.

 

DrawIndex on all the data in the dynamic buffer.

 

You still have 50 quads to draw.

 

Take another lock on the dynamic buffer, but this time use the DISCARD flag because you already filled it up.  If the GPU is still drawing with the data you gave it, you will get a pointer to different memory.

 

Start back at the top (fresh buffer), add your 50 quads worth of data to it.

 

Unlock the buffers.

 

DrawIndex on the dynamic buffer.

 

Call Present, see all your quads via the 2 batches you sent.

 

For the next frame, I am not sure what the best practice is.  In my implementation, I remember that I already wrote 50 quads to the dynamic buffer, so for any quads I add next time I take the lock with NOOVERWRITE again and continue filling it up.  However, since the quads have been drawn, you could probably make your life easier and always use DISCARD at the start of a new frame.  I'm unsure on that one.  You would want to be careful about doing that in BeginBatch, since you could technically call BeginBatch/EndBatch multiple times per frame if you wanted to.  In that case locking with NOOVERWRITE would be the better choice, since it is the same frame and the GPU is probably still drawing your previous quads.



#9 noodleBowl   Members   -  Reputation: 186


Posted 29 September 2013 - 08:52 PM

Instead of creating your vertex and index buffers in endBatch, you would create them once:  Create them in the constructor (as dynamic buffers), free them in the destructor.  I am not really sure if there is a "golden" size to pick, but the goal would be that you lock the buffers more often with NOOVERWRITE than DISCARD (discard being more expensive if the drivers need to allocate a new buffer for you).

 

So when I create the vertex and index buffers in the constructor, I want to create them with the max size, even though I may never fill them up completely? I would also assume I would then only want to memcpy the data I actually need.

E.g:

//Arrays that hold my data for the vertex and index buffers. This will allow for 1000 quads 
vertex vertices[4000];
short indices[6000];

//In constructor
batDevice->CreateVertexBuffer(sizeof(vertices), D3DUSAGE_WRITEONLY, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
batDevice->CreateIndexBuffer(sizeof(indices), D3DUSAGE_WRITEONLY, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);

//In the render call

//Fill / prepare the vertex buffer based on the actual number of vertices needed
vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
memcpy(pVoid, vertices, vertCount * sizeof(vertex));
vBuffer->Unlock();

//Fill / prepare the index buffer based on the actual number of indices needed
iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
memcpy(pVoid, indices, idxBuffCount * sizeof(short));
iBuffer->Unlock();

 

You need another buffer of some sort to keep track of your Draw calls.  You don't want to immediately send all your Draw calls to the GPU; you want to batch them up instead (SpriteBATCH).  This buffer has nothing to do with the GPU, it is just a list you maintain in some way that is convenient for you.  If you call Draw 200 times, this internal list would keep track of the data from those 200 calls.  When you call your endBatch, that is when you want to worry about getting the data to the GPU.

 

When you call endBatch, if those 200 Draw calls didn't need to change any state, you might be able to get away with sending them to the GPU with a single draw call -- they would all have to use the same state and same image in that case (like a sprite font).
 

There are a couple of ways you could look at making the SpriteBatch.  You could set up all the state external to the SpriteBatch, call BeginBatch, do all your draw calls, then call endBatch for each state change (you would be externally driving it this way), or you could make your SpriteBatch more complicated, such as giving your draw call different textures, and let your SpriteBatch worry about sorting out the details -- 5 different images to render, that's 5 state changes.  Then when you EndBatch you can sort your list by texture, set up the gpu states, put all the vertices/indices in the dynamic buffer, call draw, then start processing the next image's quads, until you have sent all your Draw calls to the GPU.

 

 

I am not sure I follow you here. The only time I send data to the GPU is when endBatch is called. Currently endBatch is called only when I need to set a new texture or I actually hit the end of my batch (calling batcher.endBatch() in my main.cpp render).

Maybe you are talking about something like this, where my SpriteBatcher::draw call only fills my arrays with data instead of also checking for a texture swap, and my render call is the one that uses the draw tracking buffer to check whether a texture swap is needed?


void SpriteBatcher::draw(float x, float y, float width, float height, D3DCOLOR color, LPDIRECT3DTEXTURE9 texture)
{
    //set a texture for this quad
    drawData[i].texture = texture;
    i++;
    
    //Make a quad
    //V0
    vertices[vertCount].x = x;
    vertices[vertCount].y = y;
    vertices[vertCount].z = 1.0f;
    vertices[vertCount].rhw = 1.0f;
    vertices[vertCount].color = color;
    vertices[vertCount].u = 0.0f;
    vertices[vertCount].v = 0.0f;

    //V1
    vertices[vertCount + 1].x = x + width;
    vertices[vertCount + 1].y = y;
    vertices[vertCount + 1].z = 1.0f;
    vertices[vertCount + 1].rhw = 1.0f;
    vertices[vertCount + 1].color = color;
    vertices[vertCount + 1].u = 1.0f;
    vertices[vertCount + 1].v = 0.0f;

    //V2
    vertices[vertCount + 2].x = x + width;
    vertices[vertCount + 2].y = y + height;
    vertices[vertCount + 2].z = 1.0f;
    vertices[vertCount + 2].rhw = 1.0f;
    vertices[vertCount + 2].color = color;
    vertices[vertCount + 2].u = 1.0f;
    vertices[vertCount + 2].v = 1.0f;


    //V3
    vertices[vertCount + 3].x = x;
    vertices[vertCount + 3].y = y + height;
    vertices[vertCount + 3].z = 1.0f;
    vertices[vertCount + 3].rhw = 1.0f;
    vertices[vertCount + 3].color = color;
    vertices[vertCount + 3].u = 0.0f;
    vertices[vertCount + 3].v = 1.0f;

    //Two triangles per quad: 0,1,2 and 3,0,2
    indices[idxBuffCount] = vertCount;
    indices[idxBuffCount + 1] = vertCount + 1;
    indices[idxBuffCount + 2] = vertCount + 2;
    indices[idxBuffCount + 3] = vertCount + 3;
    indices[idxBuffCount + 4] = vertCount;
    indices[idxBuffCount + 5] = vertCount +2;

    //inc the number of shapes to draw (inc by 2 cause of 2 triangles)
    //inc the vert index by 4
    numShapes += 2;
    vertCount += 4;
    idxBuffCount += 6;

}


void SpriteBatcher::render()
{
	//Render everything that needs to be drawn
	
	//Fill / prepare the vertex buffer
	vBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, vertices, vertCount * sizeof(vertex));
	vBuffer->Unlock();

	//Fill / prepare the index buffer
	iBuffer->Lock(0, 0, (void**) &pVoid, NULL);
	memcpy(pVoid, indices, idxBuffCount * sizeof(short));
	iBuffer->Unlock();

	//Prepare to draw the scene

	//Draw code
	batDevice->SetStreamSource(0, vBuffer, 0, sizeof(vertex));
	batDevice->SetIndices(iBuffer);

        //Something along these lines to send everything in one call
        if(drawData[i].texture != drawData[i+1].texture)
	          batDevice->SetTexture(0, drawData.texture);
        i++;

        //Send everything to the GPU
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);
	
}

 

For the dynamic buffers, you copy the data in like you are currently doing, but the buffer is always there and you need to make sure you insert new data for drawing after the last data you inserted (NOOVERWRITE).  In your lock calls, you didn't use any flags.  For a dynamic buffer the flag needs to be the NOOVERWRITE or DISCARD flag.

 

Scenario:

  • You queue up 150 quads to draw via your SpriteBatch::Draw calls.
  • You call EndBatch.
  • Your dynamic buffer is big enough to hold 100 quads at a time.
  • You have not yet done any sprite batch processing so your insert point into the dynamic buffers is at the start, 0.

Process:

You need to take locks on your vertex and index buffers.  When you take a lock, you have to decide whether to lock with the NOOVERWRITE or the DISCARD flag.  Since we are looking at an empty buffer (haven't inserted anything yet), you want to use NOOVERWRITE.  You only want to lock with DISCARD when the buffer is full or almost full.

 

When you lock with NOOVERWRITE, you immediately get a pointer to the buffer, even if the GPU is currently pulling data from it to draw things.  That's why you say NOOVERWRITE (don't overwrite anything you have previously put in there).

 

Fresh buffer, you are at the start of it, 100 quads will fit.  Fill up the 100 quads worth of data.

 

Unlock the buffers.

 

Issue the indexed draw call (DrawIndexedPrimitive in D3D9) for all the data in the dynamic buffer.

 

You still have 50 quads to draw.

 

Take another lock on the dynamic buffer, but this time use the DISCARD flag because you already filled it up.  If the GPU is still drawing with the data you gave it, you will get a pointer to different memory.

 

Start back at the top (fresh buffer), add your 50 quads worth of data to it.

 

Unlock the buffers.

 

Issue the indexed draw call on the dynamic buffer again.

 

Call Present, see all your quads via the 2 batches you sent.

 

For the next frame, I am not sure what the best practice is.  In my implementation, I remember that I already wrote 50 quads into the dynamic buffer, so the next time I take the lock I use NOOVERWRITE again and continue filling it up from there.  However, since those quads have already been drawn, you could probably make your life easier and always use DISCARD at the start of a new frame.  I'm unsure on that one.  You would want to be careful about doing that in BeginBatch, since you could technically call BeginBatch/EndBatch multiple times per frame if you wanted to.  In that case locking with NOOVERWRITE would be the better choice, since it is the same frame and the GPU is probably still drawing your previous quads.
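The bookkeeping described above (remember the insertion point, append with NOOVERWRITE while there is room, wrap with DISCARD once the buffer is full) can be sketched without any DirectX calls. This is only a rough illustration: `DynamicBufferCursor`, `Reservation`, `LockFlag` and `drawAll` are made-up names, the enum merely stands in for the real lock flags, and the capacity is assumed to be greater than zero.

```cpp
#include <cstddef>

// Stand-ins for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD so the sketch
// compiles without the DirectX headers.
enum class LockFlag { NoOverwrite, Discard };

struct Reservation
{
    LockFlag    flag;       // flag to pass to Lock()
    std::size_t firstQuad;  // where in the buffer to start writing
    std::size_t quadCount;  // how many quads fit in this lock
};

// Hypothetical helper: tracks the insertion point of one dynamic buffer.
class DynamicBufferCursor
{
public:
    explicit DynamicBufferCursor(std::size_t capacityInQuads)
        : m_capacity(capacityInQuads), m_next(0) {}

    // Reserve room for up to 'wanted' quads. If the buffer is full, wrap
    // to the start and ask for DISCARD; otherwise append with NOOVERWRITE.
    Reservation reserve(std::size_t wanted)
    {
        Reservation r;
        if (m_next == m_capacity)
        {
            r.flag = LockFlag::Discard;
            m_next = 0;
        }
        else
        {
            r.flag = LockFlag::NoOverwrite;
        }
        const std::size_t room = m_capacity - m_next;
        r.firstQuad = m_next;
        r.quadCount = wanted < room ? wanted : room;
        m_next += r.quadCount;
        return r;
    }

private:
    std::size_t m_capacity;
    std::size_t m_next;
};

// Drains 'total' queued quads through the cursor and returns how many
// lock/draw rounds it took (each round = Lock, copy, Unlock, draw).
std::size_t drawAll(DynamicBufferCursor& cursor, std::size_t total)
{
    std::size_t rounds = 0;
    while (total > 0)
    {
        const Reservation r = cursor.reserve(total);
        // Real code: Lock with r.flag, copy r.quadCount quads starting at
        // r.firstQuad, Unlock, then issue the draw call for that range.
        total -= r.quadCount;
        ++rounds;
    }
    return rounds;
}
```

With a capacity of 100 quads and 150 queued, the loop in `drawAll` runs twice: the first reservation covers 100 quads at offset 0 with NoOverwrite, the second covers the remaining 50 at offset 0 with Discard. Because the cursor outlives the frame, a later reservation continues at offset 50 with NoOverwrite, which is the "continue filling it up" variant mentioned above.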

 

This is the main part where things go over my head and kind of fall apart. I'm assuming all of this would be in my SpriteBatcher::render function. I understand that we need to lock the buffers, then use memcpy to fill them with our data, and so on. What I don't understand is when / how we tell it that we have 50 more quads left to draw.


Edited by noodleBowl, 29 September 2013 - 08:54 PM.


#10 Myiasis   Members   -  Reputation: 211


Posted 29 September 2013 - 11:58 PM

 

So when I create the vertex and index buffers in the constructor, I want to create them with the max size even though I may or may not ever fill them up completely? I would also assume I would only want to memcpy the data I need then

 

Yes, the goal being to find some balance.  Too small and you have to call DISCARD too often, too big and you are just wasting a bunch of memory.

 

Your example of creating them in the constructor still didn't create them as dynamic buffers though.  Look at the docs for CreateVertexBuffer and at the usage flags:

 

http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers

http://msdn.microsoft.com/en-us/library/windows/desktop/bb174364(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172625(v=vs.85).aspx

 

Since you know you'll have data changing frequently (per frame), you want to do it in the most efficient way you can.  Creating and destroying static buffers each frame is going to be spendy.  Dynamic buffers are the solution to frequently changing data.  They are designed to support frequent updates.

 

That's the biggest part to wrap your mind around.  Most of the rest of what I was saying are just implementation flavors -- do whatever you need to make your life easier.  Understand the dynamic buffers and their pattern of usage, then wrap whatever code you need around that in your SpriteBatch.

 

Maybe you are talking about something like this? Where my SpriteBatcher::draw call only fills my arrays with data, instead of also checking for a texture swap. And my render call is the one that uses the draw tracking buffer to check if a texture swap is needed

 

You are on the right track with your sample code.  However, I would stay away from using a fixed size array for queuing up the data.  What you want to be able to do is take an unknown number of SpriteBatch::Draw calls, queue them up, then when you call SpriteBatch::EndBatch, you want to fill up the dynamic buffer as many times as it takes to draw everything in your internal queue.  If your internal size is fixed, what happens if Draw is called more times than you have storage for?

 

Have you looked at DirectXTK?  http://directxtk.codeplex.com/

 

It actually has a SpriteBatch in it, which might save you a lot of time trying to create your own.  The source is there for you to study, or adapt into your own creations.  I haven't looked at the source, but I'm pretty sure it follows a similar pattern with dynamic buffers -- based on Shawn's writeup of DISCARD/NO_OVERWRITE dynamic buffers in XNA : http://blogs.msdn.com/b/shawnhar/archive/2010/07/07/setdataoptions-nooverwrite-versus-discard.aspx

 



#11 Myiasis   Members   -  Reputation: 211


Posted 30 September 2013 - 01:03 AM

I drew a picture, maybe this will help visualize it.

 

Scenario:

  • Your SpriteBatch has an internal list that keeps track of the draw calls.  This is of unlimited size.
  • You have a dynamic vertex buffer that can hold up to 3 draw calls worth of data.

 

Picture starts in upper left, flows down the left side then over the next column top and flows down again.

 

Flow:

  1. 6 Draw calls are made: Blue, Green, Red, Green, Blue, Green
  2. The internal list stored the draw calls in the order they came in and looks like the "Internal buffer"
  3. EndBatch is called.
  4. You want to be efficient, so you sort the internal buffer by texture which then looks like "Sort by Texture"
  5. Now that it is sorted, you can walk the internal buffer watching for texture changes.
  6. The dynamic buffer has not yet had any data written to it, it is empty.
  7. You take a lock with the NoOverwrite flag because the buffer is empty.
  8. You are keeping track of where you last inserted into the dynamic buffer. Empty, so you are at the top of the buffer.  The orange arrow is the next insertion point.
  9. You insert the blue quad data, followed by another blue quad, then you encounter a green and that will require a texture change.  So you unlock the buffer and DrawIndexed for the 2 blue quads in the buffer.
  10. Now you want to draw the green quads.  So you take another Lock, NoOverwrite again because there is still room in the dynamic buffer.  You have kept track of the insertion point, orange arrow again.  You insert the data for the green quad, but only the one fits.  You need to unlock the buffer, DrawIndexed for the 1 green quad.
  11. You filled up the buffer, but have more green quads to draw.  So this time you take the lock with the Discard flag which gives you a fresh buffer to write data to.  Empty buffer because you called Discard, so the insertion point (orange arrow) is back at the top.  You insert two more green quads then realize a red quad is next.  So you unlock the buffer, DrawIndexed on the 2 green quads in the dynamic buffer.
  12. Only the red quad is left in your internal list.  Need to take a lock again, using NoOverwrite because there is still room for it in the dynamic buffer.  Orange arrow again is your insertion point.  You put the red quad data in, unlock the buffer, DrawIndexed on the red quad data.

The internal buffer doesn't have to be in the vertex format.  You need to keep track of enough data to tell when a state change is needed (texture change in this case), and enough data to be able to create the vertex data for the quads.
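The flow above can be simulated without touching the GPU, which may help when checking the logic before wiring it up to real buffers. In this rough sketch the `Texture` and `LockFlag` enums are stand-ins for real texture pointers and the D3D lock flags, `planBatches` and `Batch` are made-up names, and the capacity is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Texture { Blue, Green, Red };       // stand-ins for real texture pointers
enum class LockFlag { NoOverwrite, Discard };  // stand-ins for the D3D lock flags

struct Batch
{
    Texture     texture;
    LockFlag    flag;
    std::size_t firstQuad;  // insertion point (the "orange arrow")
    std::size_t quadCount;
};

// Sorts the queued draws by texture, then splits them into batches that
// each use a single texture and fit in the remaining buffer space.
std::vector<Batch> planBatches(std::vector<Texture> queued, std::size_t capacityInQuads)
{
    std::stable_sort(queued.begin(), queued.end());

    std::vector<Batch> batches;
    std::size_t insertAt = 0;
    std::size_t i = 0;
    while (i < queued.size())
    {
        Batch b;
        b.texture = queued[i];
        b.flag = LockFlag::NoOverwrite;
        if (insertAt == capacityInQuads)  // buffer full: start over with a fresh one
        {
            b.flag = LockFlag::Discard;
            insertAt = 0;
        }
        b.firstQuad = insertAt;
        b.quadCount = 0;
        // Keep adding quads while the texture matches and there is room.
        while (i < queued.size() && queued[i] == b.texture && insertAt < capacityInQuads)
        {
            ++b.quadCount;
            ++insertAt;
            ++i;
        }
        batches.push_back(b);  // real code: Unlock, SetTexture, draw this range
    }
    return batches;
}
```

Feeding it the six draws from the flow (Blue, Green, Red, Green, Blue, Green) with a capacity of 3 gives four batches that match steps 9 to 12: two blue quads at offset 0, one green quad at offset 2, two green quads at offset 0 after a Discard, and one red quad at offset 2.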

 

 

[Image: hYBLbgo.png, the diagram of the flow described above]



#12 noodleBowl   Members   -  Reputation: 186


Posted 30 September 2013 - 09:44 PM

 

 

So when I create the vertex and index buffers in the constructor, I want to create them with the max size even though I may or may not ever fill them up completely? I would also assume I would only want to memcpy the data I need then

 

Yes, the goal being to find some balance.  Too small and you have to call DISCARD too often, too big and you are just wasting a bunch of memory.

 

Your example of creating them in the constructor still didn't create them as dynamic buffers though.  Look at the docs for CreateVertexBuffer and at the usage flags:

 

http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers

http://msdn.microsoft.com/en-us/library/windows/desktop/bb174364(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172625(v=vs.85).aspx

 

Since you know you'll have data changing frequently (per frame), you want to do it in the most efficient way you can.  Creating and destroying static buffers each frame is going to be spendy.  Dynamic buffers are the solution to frequently changing data.  They are designed to support frequent updates.

 

I see, so then when I create my buffers they should really look like this:

//Size of the vertex and index buffer; for 200 quads
int vertexSize = 800;
int indexSize = 1200;

//Create the buffers for dynamic use; Placed in the constructor
device->CreateVertexBuffer(vertexSize * sizeof(vertex), D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
device->CreateIndexBuffer(indexSize * sizeof(short), D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);
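To get a feel for the "too small vs. too big" trade-off, the byte sizes for a given quad count can be worked out up front. This is a rough sketch only: `Vertex` mirrors the vertex layout used in this thread with `std::uint32_t` standing in for D3DCOLOR, and the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Mirrors the thread's vertex layout; std::uint32_t stands in for D3DCOLOR
// (a 32-bit value) so this compiles without the DirectX headers.
struct Vertex
{
    float x, y, z, rhw;
    std::uint32_t color;
    float u, v;
};

constexpr std::size_t kVerticesPerQuad = 4;
constexpr std::size_t kIndicesPerQuad  = 6;

// Bytes needed in the vertex buffer for the given number of quads.
constexpr std::size_t vertexBufferBytes(std::size_t quads)
{
    return quads * kVerticesPerQuad * sizeof(Vertex);
}

// Bytes needed in the index buffer, assuming 16-bit indices (D3DFMT_INDEX16).
constexpr std::size_t indexBufferBytes(std::size_t quads)
{
    return quads * kIndicesPerQuad * sizeof(std::uint16_t);
}
```

On a typical compiler `sizeof(Vertex)` is 28 bytes, so 200 quads come to 800 * 28 = 22400 bytes of vertex data and 1200 * 2 = 2400 bytes of index data.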


Maybe you are talking about something like this? Where my SpriteBatcher::draw call only fills my arrays with data, instead of also checking for a texture swap. And my render call is the one that uses the draw tracking buffer to check if a texture swap is needed

 

You are on the right track with your sample code.  However, I would stay away from using a fixed size array for queuing up the data.  What you want to be able to do is take an unknown number of SpriteBatch::Draw calls, queue them up, then when you call SpriteBatch::EndBatch, you want to fill up the dynamic buffer as many times as it takes to draw everything in your internal queue.  If your internal size is fixed, what happens if Draw is called more times than you have storage for?

 

Have you looked at DirectXTK?  http://directxtk.codeplex.com/

 

It actually has a SpriteBatch in it, which might save you a lot of time trying to create your own.  The source is there for you to study, or adapt into your own creations.  I haven't looked at the source, but I'm pretty sure it follows a similar pattern with dynamic buffers -- based on Shawn's writeup of DISCARD/NO_OVERWRITE dynamic buffers in XNA : http://blogs.msdn.com/b/shawnhar/archive/2010/07/07/setdataoptions-nooverwrite-versus-discard.aspx

 

I have heard of DirectXTK, but I want to stay away from the premade stuff as I have always used them and never really understood how they work.

 

As for using an unknown size for my data, I'm not really sure how to do this. There are always vectors, but I don't know how to fill one with my vertex data because of the custom vertex struct, and I'm also not sure how to fill the buffers from a vector. Right now, since my vertex and index arrays use a fixed size, I check whether a draw call reaches my max. If it does, then on the next SpriteBatcher::draw call I first call SpriteBatcher::endBatch, which renders and sends everything to the GPU, and then I continue with the new draw call.

 

 

I drew a picture, maybe this will help visualize it.

 

This picture is definitely awesome and extremely helpful, but I'm still a little shaky on steps 8 - 12.

 

Is this all happening in the SpriteBatcher::endBatch call? The thing that I don't understand is how exactly I accomplish this. I understand what we are trying to do, but the implementation is foggy, especially when it comes to the state change buffer.

 

I feel like I need to do something like this, which seems kind of wrong:

 

1. Hit the main.cpp render call and begin the batch with the SpriteBatcher::beginBatch call

2. Call SpriteBatcher::draw X amount of times

    - This will store our vertex and index data to make a single quad into an array or vector

    - This will also add the quad to the buffer for state change tracking

3. Hit the SpriteBatcher::endBatch call in the main.cpp render call

    - Sorts the state change buffer for performance

    - Use some kind of for-loop to load as much data as we can into the vertex and index buffers before having to call the device->DrawIndexedPrimitive

             + Load the data into the index and vertex buffers using the proper locking flags based on the sorted state change buffer

             + Check for state changes, if a state change is found use the device->DrawIndexedPrimitive for anything in the buffers

             + Check to see if we are at our limit for the buffers hold size, if we are use the device->DrawIndexedPrimitive for anything in the buffers

             + If we were not at our buffer limit, continue to fill the buffer using the NOOVERWRITE lock flag. Otherwise use the DISCARD flag to start fresh

             + Repeat this process until we are done with all data;

4. Start the next frame



#13 Myiasis   Members   -  Reputation: 211


Posted 30 September 2013 - 10:55 PM

Yup, your buffer creation looks better with the dynamic flag.  One catch though: D3DUSAGE_DYNAMIC can't be combined with D3DPOOL_MANAGED, so dynamic buffers need to be created in D3DPOOL_DEFAULT.  I haven't used that API myself, so check the D3DUSAGE docs to be sure.

 

That actually brings up another thought though, do you have to use the DX9 API?  You can use the DX11 API with a DX9 feature set.  Same concepts, but the functions are a little different to do this.

 

As for using an unknown size for my data I'm not really sure how to do this. I mean there is always vectors, but I do not know how I can fill this with my vertex data because of the custom vertex struct, also I'm not sure how to fill the buffers using vectors.

 

Create your own structure/class to stuff into the vector like you mentioned.  Something like this:

#include <algorithm>
#include <functional>
#include <vector>

struct DrawData
{
   // Initializers listed in declaration order (m_texture, then m_rect).
   DrawData(const Rectangle& rect, Texture* texture)
      : m_texture(texture),
      m_rect(rect)
   {
   }

   Texture* m_texture;
   Rectangle m_rect;
};

std::vector<DrawData> m_drawList;

void SpriteBatch::BeginBatch()
{
   m_drawList.clear();
}

void SpriteBatch::Draw(const Rectangle& rect, Texture* texture)
{
   m_drawList.push_back(DrawData(rect, texture));
}

void SpriteBatch::EndBatch()
{
   // Sort by texture so quads that share a texture end up next to each other.
   // std::less gives a well-defined ordering for unrelated pointers.
   std::sort(
      m_drawList.begin(),
      m_drawList.end(),
      [](const DrawData& first, const DrawData& second)
      {
         return std::less<Texture*>()(first.m_texture, second.m_texture);
      });

   // lastTexture and bufferIsFull are state you track yourself.
   for (const auto& item : m_drawList)
   {
      if (item.m_texture != lastTexture || bufferIsFull)
      {
         // Unlock dynamic buffers and draw anything you put in there already.
         // Take lock again with whichever lock flag you need.
         // reset your lastTexture to item.m_texture
         // reset your offset into dynamic buffer if you discarded
      }
      // Fill in the vertex data for the 'item'.  You could do this by casting the pointer you got
      // back from the lock into a vertex structure, or fill in a local vertex structure and then
      // copy to the dynamic buffer, whatever you find most friendly to your style.
      // Each time you put more data in the dynamic buffer just keep track of where you were at so
      // next time through the loop you can figure out if the buffer is full or not.
   }
}

You can add whatever data you need to that object.  You don't have to figure out all the final vertex data in the draw call.  You could if you wanted to though, really up to you.  Either you need to track enough data that you can expand it all into the dynamic buffer during EndBatch, or put all that data in the DrawData structure and fill it in during the draw call.
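If you go the "expand during EndBatch" route, turning one queued rectangle into quad vertices might look roughly like the sketch below. `Rect` and `makeQuad` are hypothetical names, `std::uint32_t` stands in for D3DCOLOR, and the corner order (top-left, top-right, bottom-right, bottom-left) is an assumption chosen to go with the 0,1,2 / 3,0,2 index pattern.

```cpp
#include <array>
#include <cstdint>

struct Rect { float left, top, right, bottom; };  // hypothetical stand-in for Rectangle

struct Vertex
{
    float x, y, z, rhw;
    std::uint32_t color;  // stands in for D3DCOLOR
    float u, v;
};

// Expands one queued rectangle into the four corner vertices of a quad,
// ordered top-left, top-right, bottom-right, bottom-left.
std::array<Vertex, 4> makeQuad(const Rect& r, std::uint32_t color)
{
    return {{
        { r.left,  r.top,    1.0f, 1.0f, color, 0.0f, 0.0f },
        { r.right, r.top,    1.0f, 1.0f, color, 1.0f, 0.0f },
        { r.right, r.bottom, 1.0f, 1.0f, color, 1.0f, 1.0f },
        { r.left,  r.bottom, 1.0f, 1.0f, color, 0.0f, 1.0f },
    }};
}
```

The result is four contiguous vertices, so it can be copied into the locked buffer with a single memcpy of `sizeof(Vertex) * 4` bytes.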

 

Your steps to follow sound about right to me.  What doesn't feel right to you with the workflow?  This is just my take on a SpriteBatch, others probably would do things differently, although I would expect most of them would be based around the dynamic buffers in a similar conceptual way.

 



#14 noodleBowl   Members   -  Reputation: 186


Posted 02 October 2013 - 09:12 PM

Yup, your buffer creation looks better with the dynamic flag.  One catch though: D3DUSAGE_DYNAMIC can't be combined with D3DPOOL_MANAGED, so dynamic buffers need to be created in D3DPOOL_DEFAULT.  I haven't used that API myself, so check the D3DUSAGE docs to be sure.

 

That actually brings up another thought though, do you have to use the DX9 API?  You can use the DX11 API with a DX9 feature set.  Same concepts, but the functions are a little different to do this.

 

I will have to look into that for sure.

 

The main reason I'm using DirectX 9 is that I am worried about the compatibility of my application with older systems, but if I can use DirectX 11 and have it "downgrade" to DirectX 9 then I will definitely have to look into that.

 

 

 

As for using an unknown size for my data I'm not really sure how to do this. I mean there is always vectors, but I do not know how I can fill this with my vertex data because of the custom vertex struct, also I'm not sure how to fill the buffers using vectors.

 

Create your own structure/class to stuff into the vector like you mentioned.  Something like this:

#include <algorithm>
#include <functional>
#include <vector>

struct DrawData
{
   // Initializers listed in declaration order (m_texture, then m_rect).
   DrawData(const Rectangle& rect, Texture* texture)
      : m_texture(texture),
      m_rect(rect)
   {
   }

   Texture* m_texture;
   Rectangle m_rect;
};

std::vector<DrawData> m_drawList;

void SpriteBatch::BeginBatch()
{
   m_drawList.clear();
}

void SpriteBatch::Draw(const Rectangle& rect, Texture* texture)
{
   m_drawList.push_back(DrawData(rect, texture));
}

void SpriteBatch::EndBatch()
{
   // Sort by texture so quads that share a texture end up next to each other.
   // std::less gives a well-defined ordering for unrelated pointers.
   std::sort(
      m_drawList.begin(),
      m_drawList.end(),
      [](const DrawData& first, const DrawData& second)
      {
         return std::less<Texture*>()(first.m_texture, second.m_texture);
      });

   // lastTexture and bufferIsFull are state you track yourself.
   for (const auto& item : m_drawList)
   {
      if (item.m_texture != lastTexture || bufferIsFull)
      {
         // Unlock dynamic buffers and draw anything you put in there already.
         // Take lock again with whichever lock flag you need.
         // reset your lastTexture to item.m_texture
         // reset your offset into dynamic buffer if you discarded
      }
      // Fill in the vertex data for the 'item'.  You could do this by casting the pointer you got
      // back from the lock into a vertex structure, or fill in a local vertex structure and then
      // copy to the dynamic buffer, whatever you find most friendly to your style.
      // Each time you put more data in the dynamic buffer just keep track of where you were at so
      // next time through the loop you can figure out if the buffer is full or not.
   }
}

You can add whatever data you need to that object.  You don't have to figure out all the final vertex data in the draw call.  You could if you wanted to though, really up to you.  Either you need to track enough data that you can expand it all into the dynamic buffer during EndBatch, or put all that data in the DrawData structure and fill it in during the draw call.

 

 

This was extremely helpful, because now I know how to use vectors properly. I'm still working out the semantics for this part of the code, but it should be done soon.

 

I am just wondering how we actually fill the buffer especially since we are using a different struct setup? For example if we had these structs

//Vertex struct
struct vertex
{
    float x;
    float y;
    float z;
    float rhw;
    D3DCOLOR color;
    float u;
    float v;
};

//Quad struct
struct quad
{
    short index[6];
    vertex verts[4];
	LPDIRECT3DTEXTURE9 texture;
};

Is it even possible to fill the vertex and index buffers using a vector made out of the quad struct? Or is there going to have to be some finagling involved?

 

Maybe something like this?

//The buffer lock; don't worry about the flags for now
vertex *vertices;
vBuffer->Lock(0, 0, (void**) &vertices, NULL);

//Fill the buffer with data?
for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
{
	//Vertex 0
	vertices[0].x = (*i).verts[0].x;
	vertices[0].y = (*i).verts[0].y;
	vertices[0].z = (*i).verts[0].z;
	vertices[0].rhw = (*i).verts[0].rhw;
	vertices[0].color = (*i).verts[0].color;
	vertices[0].u = (*i).verts[0].u;
	vertices[0].v = (*i).verts[0].v;

	//Vertex 1
	vertices[1].x = (*i).verts[1].x;
	vertices[1].y = (*i).verts[1].y;
	vertices[1].z = (*i).verts[1].z;
	vertices[1].rhw = (*i).verts[1].rhw;
	vertices[1].color = (*i).verts[1].color;
	vertices[1].u = (*i).verts[1].u;
	vertices[1].v = (*i).verts[1].v;

	//Repeat for vertex 2 and 3

}

//Unlock the buffer
vBuffer->Unlock();

Your steps to follow sound about right to me.  What doesn't feel right to you with the workflow?  This is just my take on a SpriteBatch, others probably would do things differently, although I would expect most of them would be based around the dynamic buffers in a similar conceptual way.

 

It just seems a little off to me to be "stuck" in endBatch until everything is done. That's the only part that gets me.

 

But even so, there are a lot of great things going on in terms of performance.

 

1. The buffers are dynamic and are meant to change per frame

2. We are sorting by texture, so in theory the number of DrawIndexedPrimitive calls is driven by the number of texture swaps that need to happen

3. Everything is sent to the GPU in bulk

 

The only thing I can think of off the top of my head that could be an issue is the texture depth. Depth in the sense that, after sorting by texture and drawing, the last texture in the sorted list will always be on top. Is this something I can expect? And of course, what would be a solution to this?


Edited by noodleBowl, 02 October 2013 - 10:16 PM.


#15 Myiasis   Members   -  Reputation: 211


Posted 03 October 2013 - 03:41 AM

 

Maybe something like this?

//The buffer lock; don't worry about the flags for now
vertex *vertices;
vBuffer->Lock(0, 0, (void**) &vertices, NULL);

//Fill the buffer with data?
for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
{
    //Vertex 0
    vertices[0].x = (*i).verts[0].x;
    vertices[0].y = (*i).verts[0].y;
    vertices[0].z = (*i).verts[0].z;
    vertices[0].rhw = (*i).verts[0].rhw;
    vertices[0].color = (*i).verts[0].color;
    vertices[0].u = (*i).verts[0].u;
    vertices[0].v = (*i).verts[0].v;

 

Something like that, yes.  If you are going to store the actual vertex data in your draw list, you can just memcpy a block of it, which is a little easier.  Also remember that if you are locking the buffer with the no-overwrite option, the pointer you get back will be the pointer to the same buffer you filled last time.  So you need to write data into it past the point where you put data last time.

vertex* vertices;
vBuffer->Lock(0, 0, (void**)&vertices, D3DLOCK_NOOVERWRITE);

for (auto& item : m_drawList)
{
   // Append after whatever was written on earlier locks.
   memcpy(vertices + m_countWritten, item.verts, sizeof(vertex) * 4);
   m_countWritten += 4;
}

vBuffer->Unlock();

When you Discard, set your count variable back to 0 to start at the top.  Use your count to determine when you are going to overflow the buffer, that's when you know you need to Discard again.

 

True that the sorting by texture like that is only going to work if you don't care about the order that they overlap.  This is where the SpriteBatch needs to be tailored to however you find it handy to use.

 

For example, you could add "int m_level" to the DrawData, and add a "level" parameter to the Draw calls.  This lets the caller say "all these are on level 0, these are on level 1 (need to be drawn on top of level 0)."  Then when you sort, sort by both level and texture so you end up with the levels grouped together, then sorted by texture within each level.

 

Or you could drive that externally:  The code that uses your sprite batch could Begin -> Draw all the stuff that doesn't overlap -> End, then start up a new batch Begin -> Draw overlapping stuff -> End...

 

Or your sprite batch could determine levels automatically based on overlapping rectangles.
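The first option above (an explicit level per draw) only needs a two-key comparison when sorting. A minimal sketch, assuming a cut-down `DrawData` where an integer `textureId` stands in for the texture pointer and `sortForDrawing` is a made-up helper name:

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

struct DrawData
{
    int level;      // 0 is drawn first, higher levels are drawn on top
    int textureId;  // stand-in for a texture pointer
};

// Orders the queue by level first, then by texture within each level, so
// overlap order is respected while texture changes stay grouped.
void sortForDrawing(std::vector<DrawData>& drawList)
{
    std::stable_sort(
        drawList.begin(), drawList.end(),
        [](const DrawData& a, const DrawData& b)
        {
            return std::tie(a.level, a.textureId) < std::tie(b.level, b.textureId);
        });
}
```

Using `std::stable_sort` keeps draws with the same level and texture in the order they were submitted, which matters if sprites within one level can still overlap each other.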

 

 



#16 noodleBowl   Members   -  Reputation: 186


Posted 04 October 2013 - 10:53 PM

Something like that, yes.  If you are going to store the actual vertex data in your draw list, you can just memcpy a block of it, which is a little easier.  Also remember that if you are locking the buffer with the no-overwrite option, the pointer you get back will be the pointer to the same buffer you filled last time.  So you need to write data into it past the point where you put data last time.

vertex* vertices;
vBuffer->Lock(0, 0, (void**)&vertices, D3DLOCK_NOOVERWRITE);

for (auto& item : m_drawList)
{
   // Append after whatever was written on earlier locks.
   memcpy(vertices + m_countWritten, item.verts, sizeof(vertex) * 4);
   m_countWritten += 4;
}

vBuffer->Unlock();

 

I have started to implement everything, but I am still a little stuck on the texture swap portion of the code.

 

Currently my code for the SpriteBatcher::endBatch is basing everything off of using one texture and only using the DISCARD flag

 

Here is my current code:

void SpriteBatcher::endBatch()
{
	//std::cout<<"RENDER"<<std::endl;

	//Lock the buffer based on the bufferLockFlag; Set to DISCARD for now
	vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
	iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);

	//Loop through all of our quads and get their data
	for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
	{
		//Check for texture change
		if(currentTexture != (*i).texture)
		{
			std::cout<<"Set texture: "<<(*i).texture<<std::endl;
			currentTexture = (*i).texture;
		}

		//Copy the verts into the buffer
		memcpy(vertices + vertCount, (*i).verts, sizeof((*i).verts));

		//Get all the indices
		indices[indexCount] = currentIndex;
		indices[indexCount + 1] = currentIndex + 1;
		indices[indexCount + 2] = currentIndex + 2;
		indices[indexCount + 3] = currentIndex + 3;
		indices[indexCount + 4] = currentIndex;
		indices[indexCount + 5] = currentIndex + 2;

		//Increase the counts
		indexCount += 6;
		currentIndex += 4;
		vertCount += 4;
		numShapes += 2;
	}

	//Unlock the buffers
	vBuffer->Unlock();
	iBuffer->Unlock();

	//Set the texture and draw everything; The Stream Source and Index buffer are set in an init method
	batDevice->SetTexture(0, currentTexture);
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

	//Clear the vector and reset all the counts
	drawData.clear();
	vertCount = 0;
	indexCount = 0;
	currentIndex = 0;
	numShapes = 0;
}

This is currently working great, but I'm a little confused about the texture swap check portion. I understand I should unlock the buffers, set the texture to use / draw everything, and then reclaim the lock with the right flag, but my confusion comes from the drawing part.

 

When I draw everything inside of the texture check, do I need to reset my counts since I drew everything? Does calling the batDevice->DrawIndexedPrimitive method clear out the buffer? Meaning that I have to set the vertCount and numShapes to the right values in the final batDevice->DrawIndexedPrimitive call (the one outside the check texture swap area)?

 

EG:

void vBatcher::endBatch()
{
	//std::cout<<"RENDER"<<std::endl;

	//Lock the buffer based on the bufferLockFlag; Set to DISCARD for now
	vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
	iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);

	//Loop through all of our quads and get their data
	for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
	{
		//Check for texture change
		if(currentTexture != (*i).texture)
		{
			//Unlock the buffers because we need to swap textures
			vBuffer->Unlock();
			iBuffer->Unlock();

			//Confusion Part
			//Set the texture and draw everything; The Stream Source and Index buffer are set in an init method;
			//Draw everything because of the texture swap
			batDevice->SetTexture(0, currentTexture);
			batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

			//Reset the counts on everything because we did a texture swap
			vertCount = 0;
			indexCount = 0;
			currentIndex = 0;
			numShapes = 0;

			//Set the new texture
			std::cout<<"Set texture: "<<(*i).texture<<std::endl;
			currentTexture = (*i).texture;

			//{!} This is where the flag check would go to determine which flag to use
		        //Not sure how to handle this part because of confusion

			//Relock the buffer based on the bufferLockFlag;
			vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
			iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);
		}

		//Copy the verts into the buffer
		memcpy(vertices + vertCount, (*i).verts, sizeof((*i).verts));

		//Get all the indices
		indices[indexCount] = currentIndex;
		indices[indexCount + 1] = currentIndex + 1;
		indices[indexCount + 2] = currentIndex + 2;
		indices[indexCount + 3] = currentIndex + 3;
		indices[indexCount + 4] = currentIndex;
		indices[indexCount + 5] = currentIndex + 2;

		//Increase the counts
		indexCount += 6;
		currentIndex += 4;
		vertCount += 4;
		numShapes += 2;
	}

	//Unlock the buffers
	vBuffer->Unlock();
	iBuffer->Unlock();

	//The final draw call in the endBatch
	//Set the texture and draw everything; The Stream Source and Index buffer are set in an init method
	batDevice->SetTexture(0, currentTexture);
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

	//Clear the vector and reset all the counts
	drawData.clear();
	vertCount = 0;
	indexCount = 0;
	currentIndex = 0;
	numShapes = 0;
}

If this is correct, then I think I would only need to create a new variable to track how close I am to filling the buffer, and then use that to decide which lock flag to use in the texture swap. Correct?



#17 Myiasis   Members   -  Reputation: 211


Posted 05 October 2013 - 07:57 AM

 

 

When I draw everything inside of the texture check, do I need to reset my counts since I drew everything? Does calling the batDevice->DrawIndexedPrimitive method clear out the buffer? Meaning that I have to set the vertCount and numShapes to the right values in the final batDevice->DrawIndexedPrimitive call (the one outside the check texture swap area)?

 

No, you only want to reset your counts when you DISCARD on the buffer.  Drawing doesn't do anything to the buffer at all.

 

The DrawIndexedPrimitive function takes an offset into the buffer where it should start drawing.   If you put in 12 vertices the first time, when you take the lock with No_Overwrite on a second texture you will start putting vertices into the buffer starting at offset 12.  When you call DrawIndexedPrimitive, the second parameter (BaseVertexIndex) is the vertex offset that gets added to every index (12 in this case).  If you are appending to the index buffer the same way, the fifth parameter (StartIndex) is where to start reading indices.

 

When you stack the draw calls like that, using No_Overwrite, the GPU is probably still working on the buffer even as you are filling it with more data.  You have to make sure you don't overwrite the data you already put in there because you don't know whether the GPU is in the middle of working on it.  If you were to reset your counts when using No_Overwrite, you would in fact be overwriting.
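To make the offsets concrete, here is a small sketch that computes the draw arguments for a batch appended behind earlier data. The field names follow the D3D9 DrawIndexedPrimitive parameters, but `DrawArgs` and `argsForBatch` are made up for illustration, and the sketch assumes each batch writes its indices relative to its own first vertex (0, 1, 2, 3, 0, 2, and so on).

```cpp
// Arguments for one DrawIndexedPrimitive call, named after the D3D9 parameters.
struct DrawArgs
{
    int      baseVertexIndex;  // added to every index read from the index buffer
    unsigned minVertexIndex;
    unsigned numVertices;
    unsigned startIndex;       // first index to read from the index buffer
    unsigned primitiveCount;   // number of triangles to draw
};

// Computes the arguments for a batch of 'quadCount' quads that was appended
// after 'quadsAlreadyInBuffer' quads in both the vertex and index buffers.
DrawArgs argsForBatch(unsigned quadsAlreadyInBuffer, unsigned quadCount)
{
    DrawArgs a;
    a.baseVertexIndex = static_cast<int>(quadsAlreadyInBuffer * 4);
    a.minVertexIndex  = 0;
    a.numVertices     = quadCount * 4;
    a.startIndex      = quadsAlreadyInBuffer * 6;
    a.primitiveCount  = quadCount * 2;
    return a;
}
```

With 3 quads (12 vertices, 18 indices) already in the buffers, a following batch of 2 quads would draw with a base vertex of 12, 8 vertices, a start index of 18 and 4 triangles.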

 

 



#18 noodleBowl   Members   -  Reputation: 186


Posted 06 October 2013 - 12:32 PM

 

No, you only want to reset your counts when you DISCARD on the buffer.  Drawing doesn't do anything to the buffer at all.

 

The DrawIndexedPrimitive function takes an offset into the buffer where it should start drawing.   If you put in 12 vertices the first time, when you take the lock with No_Overwrite on a second texture you will start putting vertices into the buffer starting at offset 12.  When you call DrawIndexedPrimitive, the second parameter (BaseVertexIndex) is the vertex offset that gets added to every index (12 in this case).  If you are appending to the index buffer the same way, the fifth parameter (StartIndex) is where to start reading indices.

 

When you stack the draw calls like that, using No_Overwrite, the GPU is probably still working on the buffer even as you are filling it with more data.  You have to make sure you don't overwrite the data you already put in there because you don't know whether the GPU is in the middle of working on it.  If you were to reset your counts when using No_Overwrite, you would in fact be overwriting.

 

 

I believe I have finally implemented the dynamic buffers. Everything seems to be working, texture swapping and all.

Here is my code, all you need is 2 png images to use as textures named "img" and "img2".

 

It's all ready to run, let me know if you see anything out of the ordinary.

 

main.cpp - http://pastebin.com/qrYP1urN

vBatcher.cpp - http://pastebin.com/1J8dwFS3

vBatcher.h - http://pastebin.com/88D1DrWZ


Edited by noodleBowl, 06 October 2013 - 12:36 PM.




