Odd rendering issue; need help

Started by
16 comments, last by noodleBowl 10 years, 6 months ago

I drew a picture, maybe this will help visualize it.

Scenario:

  • Your SpriteBatch has an internal list that keeps track of the draw calls. This is of unlimited size.
  • You have dynamic vertex buffer that can hold up to 3 draw calls worth of data.

Picture starts in upper left, flows down the left side then over the next column top and flows down again.

Flow:

  1. 6 Draw calls are made: Blue, Green, Red, Green, Blue Green
  2. The internal list stored the draw calls in the order they came in and looks like the "Internal buffer"
  3. EndBatch is called.
  4. You want to be efficient, so you sort the internal buffer by texture which then looks like "Sort by Texture"
  5. Now that it is sorted, you can walk the internal buffer watching for texture changes.
  6. The dynamic buffer has not yet had any data written to it, it is empty.
  7. You take a lock with the NoOverwrite flag because the buffer is empty.
  8. You are keeping track of where you last inserted into the dynamic buffer. Empty, so you are at the top of the buffer. The orange arrow is the next insertion point.
  9. You insert the blue quad data, followed by another blue quad, then you encounter a green and that will require a texture change. So you unlock the buffer and DrawIndexed for the 2 blue quads in the buffer.
  10. Now you want to draw the green quads. So you take another Lock, NoOverwrite again because there is still room in the dynamic buffer. You have kept track of the insertion point, orange arrow again. You insert the data for the green quad, but only the one fits. You need to unlock the buffer, DrawIndexed for the 1 green quad.
  11. You filled up the buffer, but have more green quads to draw. So this time you take the lock with the Discard flag which gives you a fresh buffer to write data to. Empty buffer because you called Discard, so the insertion point (orange arrow) is back at the top. You insert two more green quads then realize a red quad is next. So you unlock the buffer, DrawIndexed on the 2 green quads in the dynamic buffer.
  12. Only the red quad is left in your internal list. Need to take a lock again, using NoOverwrite because there is still room for it in the dynamic buffer. Orange arrow again is your insertion point. You put the red quad data in, unlock the buffer, DrawIndexed on the red quad data.

The internal buffer doesn't have to be in the vertex format. You need to keep track of enough data to tell when a state change is needed (texture change in this case), and enough data to be able to create the vertex data for the quads.

hYBLbgo.png

Advertisement

So when I create the vertex and index buffers in the constructor, I want to create them with the max size even though I may or may not ever fill them up completely? I would also assume I would only want to memcopy the data I need then

Yes, the goal being to find some balance. Too small and you have to call DISCARD too often, too big and you are just wasting a bunch of memory.

Your example of creating them in the constructor still didn't create them as dynamic buffers though. Look at the docs for CreateVertexBuffer and at the usage flags:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers

http://msdn.microsoft.com/en-us/library/windows/desktop/bb174364(v=vs.85).aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172625(v=vs.85).aspx

Since you know you'll have data changing frequently (per frame), you want to do it in the most efficient way you can. Creating and destroying static buffers each frame is going to be spendy. Dynamic buffers are the solution to frequently changing data. They are designed to support frequent updates.

I see, so then when I create my buffers they should really look like this


//Size of the vertex and index buffer; for 200 quads
int vertexSize = 800;
int indexSize = 1200;

//Create the buffers for dynamic use; Places in the constructor
device->CreateVertexBuffer(vertexSize * sizeof(vertex), D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, CUSTOMFVF, D3DPOOL_MANAGED, &vBuffer, NULL);
device->CreateIndexBuffer(indexSize * sizeof(short), D3DUSAGE_WRITEONLY | D3DUSAGE_DYNAMIC, D3DFMT_INDEX16, D3DPOOL_MANAGED, &iBuffer, NULL);


Maybe you are talking about something like this? Where my SpriteBatcher::draw call only fills my arrays with data, instead of also checking for a texture swap. And my render call is the one that uses the draw tracking buffer to check if a texture swap is needed

You are in the right track with your sample code. However, I would stay away from using a fixed size array for queuing up the data. What you want to be able to do is take an unknown number of SpriteBatch::Draw calls, queue them up, then when you call SpriteBatch::EndBatch, you want to fill up the dynamic buffer as many times as it takes to draw everything in your internal queue. If your internal size is fixed what happens if Draw is called more times than you have storage for?

Have you looked at DirectXTK? http://directxtk.codeplex.com/

It actually has a SpriteBatch in it, which might save you a lot of time trying to create your own. The source is there for you to study, or adapt into your own creations. I haven't looked at the source, but I'm pretty sure it follows a similar pattern with dynamic buffers -- based on Shawn's writeup of DISCARD/NO_OVERWRITE dynamic buffers in XNA : http://blogs.msdn.com/b/shawnhar/archive/2010/07/07/setdataoptions-nooverwrite-versus-discard.aspx

I have heard of DirectXTK, but I want to stay away from the premade stuff as I have always used them and never really understood how they work.

As for using an unknown size for my data I'm not really sure how to do this. I mean there is always vectors, but I do not know how I can fill this with my vertex data because of the custom vertex struct, also I'm not sure how to fill the buffers using vectors. Right now since my vertex and index buffers are using a fixed size, I check to see if a draw call I make meets my max. If it does then on the next SpriteBatcher::draw call I just call SpriteBatcher::endBatch, render, send everything to the GPU, adn etc. Then continue on with the new draw call.

I drew a picture, maybe this will help visualize it.

This picture is definatly awesome and extreamly helpful, but I'm still alittle shady on steps 8 - 12.

Is this all happening in the SpriteBatcher::endBatch call correct? The thing that I don't understand is how I exactly accomplish this. I understand what we are trying to do, but the implimentation is foggy especially when it comes to the state change buffer.

I feel like I need to do something like this, which seems kind of wrong:

1. Hit the main.cpp render call and begin the batch with the SpriteBatcher::beginBatch call

2. Call SpriteBatcher::draw X amount of times

- This will store our vertex and index data to make a single quad into an array or vector

- This will also add the quad to the buffer for state change tracking

3. Hit the SpriteBatcher::endBatch call in the main.cpp render call

- Sorts the state change buffer for performance

- Use some kind of for-loop to load as much data as we can into the vertex and index buffers before having to call the device->DrawIndexedPrimitive

+ Load the data into the index and vertex buffers using the proper locking flags based on the sorted state change buffer

+ Check for state changes, if a state is found use the device->DrawIndexedPrimitive for anything in the buffers

+ Check to see if we are at our limit for the buffers hold size, if we are use the device->DrawIndexedPrimitive for anything in the buffers

+ If we were not at our buffer limit, continue to fill the buffer using the NOOVERWRITE lock flag. Otherwise use the DISCARD flag to start fresh

+ Repeat this process until we are done with all data;

4. Start the next frame

Yup, your vertex declaration looks better the dynamic flag. I think I saw that you can't create a dynamic buffer with the managed pool though? Maybe I'm wrong, I haven't used that API -- something to look into.

That actually brings up another thought though, do you have to use the DX9 API? You can use the DX11 API with a DX9 feature set. Same concepts, but the functions are a little different to do this.

As for using an unknown size for my data I'm not really sure how to do this. I mean there is always vectors, but I do not know how I can fill this with my vertex data because of the custom vertex struct, also I'm not sure how to fill the buffers using vectors.

Create your own structure/class to stuff into the vector like you mentioned. Something like this:


struct DrawData
{
   DrawData(const Rectangle& rect, Texture* texture)
      : m_rect(rect),
      m_texture(texture)
   {
   }

   Texture* m_texture;
   Rectangle m_rect;
};

std::vector<DrawData> m_drawList;

SpriteBatch::BeginBatch()
{
   m_drawList.clear();
}

SpriteBatch::Draw(const Rectangle& rect, Texture* texture)
{
   m_drawList.push_back(DrawData(rect,texture));
}

SpriteBatch::EndBatch()
{
   std::sort(
      m_drawList.begin(),
      m_drawList.end(),
      [](DrawData& first, DrawData& second)
      {
         return first.m_texture < second.m_texture;
      });

   for (auto& item : m_drawList)
   {
      if (item.m_texture != lastTexture || bufferIsFull)
      {
         // Unlock dynamic buffers and draw anything you put in there already.
         // Take lock again with which lock flag you need.
         // reset your lastTexture to item.m_Texture
         // reset your offset into dynamic buffer if you discarded
      }
      // Fill in the vertex data for the 'item'.  You could do this by casting the pointer you got
      // back from the lock into a vertex structure, or fill in a local vertex structure and then
      // copy to the dynamic buffer, whatever you find most friendly to your style.
      // Each time you put more data in the dynamic buffer just keep track of where you were at so
      // next time through the loop you can figure out if the buffer is full or not.
   }
}

You can add whatever data you need to that object. You don't have to figure out all the final vertex data in the draw call. You could if you wanted to though, really up to you. Either you need to track enough data that you can expand it all into the dynamic buffer during EndBatch, or put all that data in the DrawData structure and fill it in during the draw call.

Your steps to follow sound about right to me. What doesn't feel right to you with the workflow? This is just my take on a SpriteBatch, others probably would do things differently, although I would expect most of them would based around the dynamic buffers in a similar conceptual way.

Yup, your vertex declaration looks better the dynamic flag. I think I saw that you can't create a dynamic buffer with the managed pool though? Maybe I'm wrong, I haven't used that API -- something to look into.

That actually brings up another thought though, do you have to use the DX9 API? You can use the DX11 API with a DX9 feature set. Same concepts, but the functions are a little different to do this.

I will have to look into that for sure.

The main reason I'm using DirectX 9 is because I am afraid of the capability of my applicaiton on older systems, but if I can use DirectX 11 and have it "downgrade" to DirectX 9 then I will deffinitly have to look into that.

As for using an unknown size for my data I'm not really sure how to do this. I mean there is always vectors, but I do not know how I can fill this with my vertex data because of the custom vertex struct, also I'm not sure how to fill the buffers using vectors.

Create your own structure/class to stuff into the vector like you mentioned. Something like this:


struct DrawData
{
   DrawData(const Rectangle& rect, Texture* texture)
      : m_rect(rect),
      m_texture(texture)
   {
   }

   Texture* m_texture;
   Rectangle m_rect;
};

std::vector<DrawData> m_drawList;

SpriteBatch::BeginBatch()
{
   m_drawList.clear();
}

SpriteBatch::Draw(const Rectangle& rect, Texture* texture)
{
   m_drawList.push_back(DrawData(rect,texture));
}

SpriteBatch::EndBatch()
{
   std::sort(
      m_drawList.begin(),
      m_drawList.end(),
      [](DrawData& first, DrawData& second)
      {
         return first.m_texture < second.m_texture;
      });

   for (auto& item : m_drawList)
   {
      if (item.m_texture != lastTexture || bufferIsFull)
      {
         // Unlock dynamic buffers and draw anything you put in there already.
         // Take lock again with which lock flag you need.
         // reset your lastTexture to item.m_Texture
         // reset your offset into dynamic buffer if you discarded
      }
      // Fill in the vertex data for the 'item'.  You could do this by casting the pointer you got
      // back from the lock into a vertex structure, or fill in a local vertex structure and then
      // copy to the dynamic buffer, whatever you find most friendly to your style.
      // Each time you put more data in the dynamic buffer just keep track of where you were at so
      // next time through the loop you can figure out if the buffer is full or not.
   }
}

You can add whatever data you need to that object. You don't have to figure out all the final vertex data in the draw call. You could if you wanted to though, really up to you. Either you need to track enough data that you can expand it all into the dynamic buffer during EndBatch, or put all that data in the DrawData structure and fill it in during the draw call.

This was extreamly helpful, because now I know how to use vectors properly. Im still working out the semantics for this part of the code but it should be done soon.

I am just wondering how we actually fill the buffer especially since we are using a different struct setup? For example if we had these structs


//Vertex struct
struct vertex
{
    float x;
    float y;
    float z;
    float rhw;
    D3DCOLOR color;
    float u;
    float v;
};

//Quad struct
struct quad
{
        short index[6];
	vertex verts[4];
	LPDIRECT3DTEXTURE9 texture;
};

Is it even possible to fill the vertex and index buffers using a vector made out of the quad struct? Or is there going to have to be some finagling involved?

Maybe something like this?


//The buffer lock; don't worry about the flags for now
vertex *vertices;
vBuffer->Lock(0, 0, (void**) &vertices, NULL);

//Fill the buffer with data?
for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
{
	//Vertex 0
	vertices[0].x = (*i).verts[0].x;
	vertices[0].y = (*i).verts[0].y;
	vertices[0].z = (*i).verts[0].z;
	vertices[0].rhw = (*i).verts[0].rhw;
	vertices[0].color = (*i).verts[0].color;
	vertices[0].u = (*i).verts[0].u;
	vertices[0].v = (*i).verts[0].v;

	//Vertex 1
	vertices[1].x = (*i).verts[1].x;
	vertices[1].y = (*i).verts[1].y;
	vertices[1].z = (*i).verts[1].z;
	vertices[1].rhw = (*i).verts[1].rhw;
	vertices[1].color = (*i).verts[1].color;
	vertices[1].u = (*i).verts[1].u;
	vertices[1].v = (*i).verts[1].v;

	//Repeat for vertex 2 and 3

}

//Unlock the buffer
vBuffer->Unlock();

Your steps to follow sound about right to me. What doesn't feel right to you with the workflow? This is just my take on a SpriteBatch, others probably would do things differently, although I would expect most of them would based around the dynamic buffers in a similar conceptual way.

It just seems a little off to me, to be "stuck" in the endBatch until everything is done. That the only part that gets me.

But even still there are alot of great things in terms of performance that are going on.

1. The buffers are dynamic and are meant to change per frame

2. We are sorting by textures and in theory we are at most calling the DrawIndexPrimative based on the number of texture swaps needed to happen

3. Everything is sent to the GPU in bulk

The only thing I have off the top of my head that could be a issue is the texture depth. Depth in the sense that after sorting textures and drawing them the last texture in the sorted list will always be on top. Is this omething I can expect? And of course what would be a solution to this?

Maybe something like this?


//The buffer lock; don't worry about the flags for now
vertex *vertices;
vBuffer->Lock(0, 0, (void**) &vertices, NULL);

//Fill the buffer with data?
for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
{
    //Vertex 0
    vertices[0].x = (*i).verts[0].x;
    vertices[0].y = (*i).verts[0].y;
    vertices[0].z = (*i).verts[0].z;
    vertices[0].rhw = (*i).verts[0].rhw;
    vertices[0].color = (*i).verts[0].color;
    vertices[0].u = (*i).verts[0].u;
    vertices[0].v = (*i).verts[0].v;

Something like that, yes. If you are going to store the actual vertex data in your draw list, you can just memcpy a block of it, be a little easier. Also need to remember that you if are locking the buffer with the no-overwrite option the pointer you get back will be the pointer to the same buffer you filled last time. So you need to write data into it past the point where you put data last time.


vertex* vertices;
vBuffer->Lock(0, 0, (void**)&vertices, NULL);

for (auto& item : m_drawList)
{
   memcpy(vertices + m_countWritten, item.verts, sizeof(vertex) * 4);
   m_countWritten += 4;
}

When you Discard, set your count variable back to 0 to start at the top. Use your count to determine when you are going to overflow the buffer, that's when you know you need to Discard again.

True that the sorting by texture like that is only going to work if you don't care about the order that they overlap. This is where the SpriteBatch needs to be tailored to however you find it handy to use.

For example, you could add "int m_level" to the DrawData, and add a "level" parameter to the DrawCalls. This lets the caller says "all these are on level 0, these are one level 1 (need to be drawn on top of level 0)." Then when you sort, sort by level and texture both so you end up with the levels sorted together, then by texture within the level.

Or you could drive that externally: The code that uses your sprite batch could Begin -> Draw all the stuff that doesn't overlap -> End, then start up a new batch Begin -> Draw overlapping stuff -> End...

Or your sprite batch could determine levels automatically based on overlapping rectangles.

Something like that, yes. If you are going to store the actual vertex data in your draw list, you can just memcpy a block of it, be a little easier. Also need to remember that you if are locking the buffer with the no-overwrite option the pointer you get back will be the pointer to the same buffer you filled last time. So you need to write data into it past the point where you put data last time.

vertex* vertices;
vBuffer->Lock(0, 0, (void**)&vertices, NULL);

for (auto& item : m_drawList)
{
   memcpy(vertices + m_countWritten, item.verts, sizeof(vertex) * 4);
   m_countWritten += 4;
}

I have started to implement everything, but I am still a little stuck on the texture swap portion of the code.

Currently my code for the SpriteBatcher::endBatch is basing everything off of using one texture and only using the DISCARD flag

Here is my current code:


void SpriteBatcher::endBatch()
{
	//std::cout<<"RENDER"<<std::endl;

	//Lock the buffer based on the bufferLockFlag; Set to DISCARD for now
	vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
	iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);

	//Loop through all of our quads and get there data
	for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
	{
		//Check for texture change
		if(currentTexture != (*i).texture)
		{
			std::cout<<"Set texture: "<<(*i).texture<<std::endl;
			currentTexture = (*i).texture;
		}

		//Copy the verts into the buffer
		memcpy(vertices + vertCount, (*i).verts, sizeof((*i).verts));

		//Get all the indices
		indices[indexCount] = currentIndex;
		indices[indexCount + 1] = currentIndex + 1;
		indices[indexCount + 2] = currentIndex + 2;
		indices[indexCount + 3] = currentIndex + 3;
		indices[indexCount + 4] = currentIndex;
		indices[indexCount + 5] = currentIndex + 2;

		//Increase the counts
		indexCount += 6;
		currentIndex += 4;
		vertCount += 4;
		numShapes += 2;
	}

	//Unlock the buffers
	vBuffer->Unlock();
	iBuffer->Unlock();

	//Set the texture and draw everything; The Stream Source and Index buffer are set in an init method
	batDevice->SetTexture(0, currentTexture);
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

	//Clear the vector and reset all the counts
	drawData.clear();
	vertCount = 0;
	indexCount = 0;
	currentIndex = 0;
	numShapes = 0;
}

This is currently working great, but I'm a little confused about the texture swap check portion. I understand I should unlock the buffers, set the texture to use / draw everything, and then reclaim the lock with the right flag, but my confusion comes from the drawing part.

When I draw everything inside of the texture check, do I need to reset my counts since I drew everything? Does calling the batDevice->DrawIndexedPrimitive method clear out the buffer? Meaning that I have to set the vertCount and numShapes to the right values in the final batDevice->DrawIndexedPrimitive call (the one outside the check texture swap area)?

EG:


void vBatcher::endBatch()
{
	//std::cout<<"RENDER"<<std::endl;

	//Lock the buffer based on the bufferLockFlag; Set to DISCARD for now
	vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
	iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);

	//Loop through all of our quads and get there data
	for(std::vector<quad>::iterator i = drawData.begin(); i != drawData.end(); i++)
	{
		//Check for texture change
		if(currentTexture != (*i).texture)
		{
			//Unlock the buffers because we need to swap textures
			vBuffer->Unlock();
			iBuffer->Unlock();

                        //Confusion Part
			//Set the texture and draw everything; The Stream Source and Index buffer are set in an init method;
			//Draw everything because of the texture swap
			batDevice->SetTexture(0, currentTexture);
			batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

			//Reset the counts on everything because we did a texture swap
			vertCount = 0;
			indexCount = 0;
			currentIndex = 0;
			numShapes = 0;

			//Set the new texture
			std::cout<<"Set texture: "<<(*i).texture<<std::endl;
			currentTexture = (*i).texture;

			//{!} This is where the flag check would go to determine which flag to use
		        //Not sure how to handle this part because of confusion

			//Relock the buffer based on the bufferLockFlag;
			vBuffer->Lock(0, 0, (void**) &vertices, bufferLockFlag);
			iBuffer->Lock(0, 0, (void**) &indices, bufferLockFlag);
		}

		//Cop the verts into the buffer
		memcpy(vertices + vertCount, (*i).verts, sizeof((*i).verts));

		//Get all the indices
		indices[indexCount] = currentIndex;
		indices[indexCount + 1] = currentIndex + 1;
		indices[indexCount + 2] = currentIndex + 2;
		indices[indexCount + 3] = currentIndex + 3;
		indices[indexCount + 4] = currentIndex;
		indices[indexCount + 5] = currentIndex + 2;

		//Increase the counts
		indexCount += 6;
		currentIndex += 4;
		vertCount += 4;
		numShapes += 2;
	}

	//Unlock the buffers
	vBuffer->Unlock();
	iBuffer->Unlock();

	//The final draw call in the endBatch
        //Set the texture and draw everything; The Stream Source and Index buffer are set in an init method
	batDevice->SetTexture(0, currentTexture);
	batDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, vertCount, 0, numShapes);

	//Clear the vector and reset all the counts
	drawData.clear();
	vertCount = 0;
	indexCount = 0;
	currentIndex = 0;
	numShapes = 0;
}

If this is correct then I think I would only need to create a new variable to determine how close I am to filling the buffer. And then I would use that to base what lock I need to grab in textureswap. Correct?

When I draw everything inside of the texture check, do I need to reset my counts since I drew everything? Does calling the batDevice->DrawIndexedPrimitive method clear out the buffer? Meaning that I have to set the vertCount and numShapes to the right values in the final batDevice->DrawIndexedPrimitive call (the one outside the check texture swap area)?

No, you only want to reset your counts when you DISCARD on the buffer. Drawing doesn't do anything to the buffer at all.

The DrawIndexedPrimitive function takes an offset into the buffer where it should start drawing. If you put in 12 vertices the first time, when you take the lock with No_Overwrite on a second texture you will start putting vertices into the buffer starting at offset 12. When you call DrawIndexedPrimitive the second parameter is which vertex to start drawing from (12 in this case).

When you stack the draw calls like that, using No_Overwrite, the GPU is probably still working on the buffer even as your are filling it with more data. You have to make sure you don't overwrite the data you already put in there because you don't know whether the gpu is in the middle of working on it. If you were to reset your counts when using No_Overwrite, you would in fact be overwriting.

No, you only want to reset your counts when you DISCARD on the buffer. Drawing doesn't do anything to the buffer at all.

The DrawIndexedPrimitive function takes an offset into the buffer where it should start drawing. If you put in 12 vertices the first time, when you take the lock with No_Overwrite on a second texture you will start putting vertices into the buffer starting at offset 12. When you call DrawIndexedPrimitive the second parameter is which vertex to start drawing from (12 in this case).

When you stack the draw calls like that, using No_Overwrite, the GPU is probably still working on the buffer even as your are filling it with more data. You have to make sure you don't overwrite the data you already put in there because you don't know whether the gpu is in the middle of working on it. If you were to reset your counts when using No_Overwrite, you would in fact be overwriting.

I believe I have finally implemented the dynamic buffers. Everything seems to be working, texture swapping and all.

Here is my code, all you need is 2 png images to use as textures named "img" and "img2".

It's all ready to run, let me know if you see anything out of the ordinary.

main.cpp - http://pastebin.com/qrYP1urN

vBatcher.cpp - http://pastebin.com/1J8dwFS3

vBatcher.h - http://pastebin.com/88D1DrWZ

This topic is closed to new replies.

Advertisement