About noodleBowl
  1. I don't have a lot of experience with the depth buffer, but I have been experimenting with it and some sprites, some of which have transparency, and I've noticed that when the depth buffer is enabled I get some "weird" results.

With the depth buffer on (the orange sprite has a Z-index of -2 and the red star sprite has a Z-index of -1), I get this weird "cut out" where I expect things to be transparent, but really it's just the clear color. With the depth buffer off (since the depth buffer is off the Z-index does not matter, so I reordered the way sprites are drawn), the sprites are drawn as I expected.

So I'm guessing this is where the whole "sort by depth" advice I have heard about comes in? Which makes me wonder: is there a point to enabling the depth buffer at all when rendering sprites, since sorting by depth would simulate what the depth buffer does? Is the depth buffer more meant for 3D objects?
  2. I'm not sure if this is the problem. I might still be doing it wrong, but I went back and attempted to use the result, and I'm still getting 0 ms.

Can you explain what a wavefront, a warp, and a bank conflict are? I had never really heard these terms before. From what I understand, a bank conflict is where I'm trying to reuse the same memory bank that I'm already working with (this is what I gather from the link you provided). As for a wavefront, I believe this is a grouping of threads that are executed on the GPU. I'm not sure what a warp is, and I'm also not sure what the significance of a wavefront is.

Honestly, no, not really. I was really using the matrices because I wanted to be able to scale and rotate my sprites along with the normal position transforms, BUT I can definitely do all of those things without a matrix. I went back and reworked it to not use matrices at all. Using the same testing conditions, I'm maxing out at ~175K 64x64 sprites when running in release mode; when I say maxing out, I mean that I'm maintaining right at/just above 60 fps. The time my SpriteRenderer::render() method takes is ~9.5 ms on average under these conditions. This average is based on the time taken by 1000 SpriteRenderer::render() executions. I'm not really sure how I feel about this, because part of me says "good, way better!" and the other part says "it could be better...". But I feel like in order to get it to be better I need to start exploring different methods, such as the ones described by @Matias Goldberg and others, and also look into multithreading, since I'm definitely only using a single thread.
  3. I definitely agree this is the problem. It's been called out a few times, which is why there is talk about DirectXMath and SIMD instructions. Which is awesome, because I like hearing anything anyone has to offer. I am trying to learn / get better, which is why I like that @Mekamani brought up instancing, since it's a change in my approach.

Not a stupid question at all! The reason I'm doing all of this CPU side is that I didn't want to issue a draw call per sprite, since each sprite has its own model matrix. That would also mean updating the constant buffer for the sprite's model matrix, mapping the vertex buffer with the sprite's data, etc. I figured it would be better performance-wise to pretransform the vertices, batch them up, and then do a single draw call per batch, which is why that code snippet above exists. But as you can see from this thread, it might not be the best idea in the world, or at least one that was executed poorly on my part.

You do bring up a good point with instancing. I haven't tried using instancing before, but from the little I understand, it is something I should explore. I'm just worried that because my sprites are dynamic, instancing is something I can't use, but like I said before, my knowledge about it is very, very limited.
  4. I am definitely simulating what my renderer will do, but why should I not profile in debug mode? I understand that in release mode the compiler applies optimizations (speaking of the default ones) and that timings in debug mode are inflated because of various debug checks. But I figured that if I get a low time in debug mode, then my release mode timing will definitely be better and the application overall runs better. That might be flawed logic.

Currently I'm using the Query Performance Counter to do my timing, but it looks like my start time and end time in release mode come out the same, so I get 0 ms. This happens even if I use INT_MAX as the limit on my for-loop. I am not sure if this means I am doing something wrong or if it really is just that fast. How can I get microsecond / nanosecond precision? Code I'm using.

I really don't understand what you mean here. Are you saying that I shouldn't use __m128 (XMVECTOR) in my classes when I'm creating an application and should use the XMFLOATn variables instead, OR are you saying that my classes (the ones provided in my example code) do not use the __m128 (XMVECTOR) type as class members? Anyway, I'm definitely getting a faster time because of the way I wrote my method: I am passing a reference to where the result should be stored, whereas DirectXMath sends back a copy. If I also send back a copy, they come out the same in terms of timing.

I am using matrices in column-major order. As I understand it, DirectX traditionally uses row-major order and you are supposed to transpose the matrices, because that is what shaders expect (column-major order). When it comes to DirectXMath, I'm not sure if this is just an extension of that idea or not. It kind of makes me think that I'm doing everything wrong, or that something is wrong, when I need to transpose my matrices in order to get the correct result. Now, if I only had to transpose them just to pass them along to a shader, then I would feel more comfortable.
  5. Alright, so I had some time to investigate the DirectXMath lib and run some tests, and I have some questions. Here are the results for my tests:

//============ DEBUG MODE Times ==========
//Using normal math operations
Norm TIME: 4.275861ms
//Using DirectXMath where items were loaded from XMFLOAT3/XMFLOAT4X4 and then stored back to XMFLOAT3
DirectX Math XMFLOAT TIME: 4.965582ms
//Using DirectXMath where XMVECTOR/XMMATRIX were used directly
DirectX Math RAW SIMD TIME: 2.183706ms
//Using a custom solution where __m128 was used directly
New RAW SIMD Solution TIME: 1.502607ms
//Original attempt: loaded data from a Vector3/Matrix44 and stored the result back into a Vector3
Original SIMD Solution TIME: 5.034964ms

Code used, in case anyone is interested. Looking at the test times, @Hodgman is 100% right: loading data into the SIMD registers and then getting it back out completely outweighs the benefit of the fast SIMD calculations. This can also be seen in the DirectXMath test where I use the XMFLOAT3 / XMFLOAT4X4 types, as these need to be loaded/stored. I have a question about this later down the line.

SIMD operations are insanely fast. When running in release mode, the timing on the RAW SIMD tests can't even register (0 ms). I can bump the loop up to simulate over 100 million vector transformations against a matrix and it still comes out as 0 ms on the timer. So you can really do some serious work if you directly use the SIMD __m128 type and do not load/unload things often.

Now this brings me back to my questions about DirectXMath and how to use the lib. According to the MSDN DirectXMath guide, the XMVECTOR and XMMATRIX types are the workhorses of the library. Which makes total sense, but then they go on to say... which I understand, but I guess I'm not really sure what is expected in the overloaded new/delete/new[]/delete[]. I just know that doing:

class Sprite
{
    Sprite(){}
    ~Sprite(){}

    XMVECTOR position;
    XMVECTOR texCoords;
    XMVECTOR color;
};

Sprite* mySprite = new Sprite;

is going to mess up the alignment and make SIMD operations take a performance hit.

Then they go on to say... and that's where I get thrown off. Am I normally supposed to be using the XMFLOAT[n] / XMFLOAT[n]X[m] types? Based on the above statement it sounds like I should, but that does not make sense to me if I want to take advantage of SIMD operations, since having to load/unload data causes a major performance hit, often making the timings worse than normal math operations.

Also, I noticed during my tests (and this may be my fault) that I have to transpose the matrix before multiplying it by the vector to get the correct vector result when using DirectXMath. Is this normal?

//Multiplying mat by vec should get me the result vector 46, 118, 190, 262
//But this only happens if I transpose the matrix first
//If I DO NOT transpose the matrix first I get the result vector 130, 148, 166, 184, which is wrong?
DirectX::XMVECTOR vec = DirectX::XMVectorSet(2.0f, 5.0f, 10.0f, 1.0f);
DirectX::XMMATRIX mat = {
    1.0f,  2.0f,  3.0f,  4.0f,
    5.0f,  6.0f,  7.0f,  8.0f,
    9.0f,  10.0f, 11.0f, 12.0f,
    13.0f, 14.0f, 15.0f, 16.0f,
};

mat = DirectX::XMMatrixTranspose(mat);
DirectX::XMVECTOR r = DirectX::XMVector3Transform(vec, mat);
  6. I'm not using DirectXMath, and my system currently does not use SIMD, BUT I did go back and make a SIMD test, where I timed multiplying a Matrix4x4 by a Vector3 using SIMD operations and then timed the same thing using normal math operations. I tried to mimic what my 10K sprite test does, so I run the Matrix4x4 * Vector3 operation 4 times and then repeat this 10K times. The weird thing is that the SIMD method runs a little slower than the normal math operations. I really would have thought it would be the other way around.

Test results:

//Debug Mode
SIMD TIME: 5.204931ms
NORM TIME: 4.222079ms

//Release Mode
SIMD TIME: 0.300521ms
NORM TIME: 0.242634ms

This is my complete test:
  7. So I went back and did some refactoring. Changes:

- Set up an index buffer that is flagged immutable and is filled at the renderer's init
- Reduced the number of vertex calculations I have to do, thanks to the index buffer usage
- Moved my mvpConstBuffer map/unmap so it only happens at the start of my SpriteRenderer::render method
- I only map/unmap my vertex buffer once per Draw call instead of every time I add a sprite

The current render time for 10K 64x64 textured sprites over 2 Draw calls (I limited the vertex buffer to only hold 5K sprites' worth of data) is ~22.5 ms. So it's an improvement, but still not that great.

Didn't even know this was a thing. According to the MSDN docs, this is enabled by default for debug mode. For some reason I can't get it to shut off (assuming it's still on). I tried putting _ITERATOR_DEBUG_LEVEL 0 in the preprocessor settings of my Visual Studio project properties, but it seems to make no difference for the debug build. If I try putting it at the top of my main .cpp, I get conflicting errors saying _ITERATOR_DEBUG_LEVEL 0 does not match _ITERATOR_DEBUG_LEVEL 2 for ****.obj.

Regardless, running in debug mode I get ~22.5 ms for 10K sprites. Running in release mode I get ~1.5 ms - ~4.3 ms for 10K sprites. If I bump the sprite count up to 125K in release mode, my render method time is ~19.8 ms - ~24.75 ms. Even when I look at the release mode numbers, I still think that's pretty low...

I went back and started timing each step like you said, and this is what I'm looking at for 10K sprites in debug mode:

- Render method: ~22.5 ms
- addSpriteToVertexBuffer: ~0.002053 ms per sprite. Total for the 10K: ~20.53 ms
- Setting the sprite data/vertices (done inside addSpriteToVertexBuffer): ~0.000411 ms per sprite. Total for the 10K: ~4.11 ms
- Matrix calculations (done inside addSpriteToVertexBuffer): ~0.001232 ms per sprite. Total for the 10K: ~12.32 ms

A decent chunk of my time is spent doing matrix calculations, but I really don't know how to get the timing down on those. The profiler that comes with Visual Studio seems to back this up too. In case anyone is curious, here are the profiler's reports: 10KSpritesPerfReports.zip

I also noticed something kind of weird while timing the individual steps. For example, when timing the matrix calculation block, my application's window would be blank. Not blank as in I saw the clear color, but blank as in I only saw the window background. Once all my debug statements were done printing, I saw my sprites. I think this has something to do with the way I am printing the debug statements. Currently I allocate a console at the start of the application, so I have the Win32 app plus a traditional console, and to print any debug statement I just do std::cout << myDebugStatementAsString << std::endl;. So it got me thinking: could this also be crippling everything? How should I normally print debug information? Trying to grasp at anything here.
  8. So here is the code for my Sprite Renderer in full. Also, here is a quick look at flushVertexBuffer:

void SpriteRenderer::flushVertexBuffer()
{
    if (vertexCountToDraw == 0)
        return;

    D3D11_MAPPED_SUBRESOURCE resource = mvpConstBuffer->map(D3D11_MAP_WRITE_DISCARD);
    memcpy(resource.pData, projectionMatrix.getData(), sizeof(Matrix4));
    mvpConstBuffer->unmap();

    //Draw the sprites that we need to
    graphicsDevice->getDeviceContext()->Draw(vertexCountToDraw, vertexCountDrawnOffset);
    vertexCountDrawnOffset += vertexCountToDraw;
    vertexCountToDraw = 0;
    ++drawCallCount;
}

I agree with this. Most of my CPU crunching comes from this method; based on the performance tests done with the Visual Studio profiler, and on manually confirming it, this seems to be the case. I'm not really sure how dynamic sprites should be handled other than this way. I have thought about having the sprite data be precomputed in the Sprite class itself and then only updated when needed. Then my addToVertexBuffer(Sprite* sprite) method would just become a simple copy that places the sprites that should be displayed into the vertex buffer.
  9. So I'm not sure if I did this right, but I went back and looked at my CPU with the graph set to show logical processors. Based on this it looks like my CPU is crunching hard.

Didn't know this was a thing. This is awesome. I feel like this confirms it's my CPU. I'm not sure if I can set it to look at one run of a function or not, but I set the "look at" frame as small as I can, which is about 44 ms. Looking at this, a huge chunk of time is spent in the addToVertexBuffer method, which makes sense since this method is run once per sprite. This method basically sets up the sprite: setting positions for the vertices, checking which tex coords to map, etc. Honestly, I don't know if this is the right place for all of it. I feel like the code in here might be better suited to the actual Sprite class instead of having to redo it every frame for each sprite. Moved to the Sprite class, this stuff could be "precomputed" and addToVertexBuffer would become just a data copy:

void SpriteRenderer::addToVertexBuffer(Sprite* sprite)
{
    Texture* spriteTexture = sprite->getTexture();
    if (spriteTexture != boundTexture)
    {
        flushVertexBuffer();
        bindTexture(spriteTexture);
    }

    if (vertexCountInBuffer == MAX_VERTEX_COUNT_FOR_BUFFER)
    {
        flushVertexBuffer();
        vertexCountInBuffer = 0;
        vertexCountDrawnOffset = 0;
        vertexBufferMapType = D3D11_MAP_WRITE_DISCARD;
    }

    float width;
    float height;
    float u = 0.0f;
    float v = 0.0f;
    float uWidth = 1.0f;
    float vHeight = 1.0f;
    float textureWidth = (float)spriteTexture->getWidth();
    float textureHeight = (float)spriteTexture->getHeight();
    SpriteVertex verts[6];

    Rect* rect = sprite->getTextureClippingRectangle();
    if (rect == nullptr)
    {
        width = textureWidth / 2.0f;
        height = textureHeight / 2.0f;
    }
    else
    {
        width = rect->width / 2.0f;
        height = rect->height / 2.0f;
        u = rect->x / textureWidth;
        v = rect->y / textureHeight;
        uWidth = (rect->x + rect->width) / textureWidth;
        vHeight = (rect->y + rect->height) / textureHeight;
    }

    verts[0].position.setXYZ(-width, -height, 0.0f);
    verts[1].position.setXYZ(width, height, 0.0f);
    verts[2].position.setXYZ(width, -height, 0.0f);
    verts[3].position.setXYZ(-width, -height, 0.0f);
    verts[4].position.setXYZ(-width, height, 0.0f);
    verts[5].position.setXYZ(width, height, 0.0f);

    if (sprite->isFlipped() == false)
    {
        verts[0].texCoords.setXY(u, vHeight);
        verts[1].texCoords.setXY(uWidth, v);
        verts[2].texCoords.setXY(uWidth, vHeight);
        verts[3].texCoords.setXY(u, vHeight);
        verts[4].texCoords.setXY(u, v);
        verts[5].texCoords.setXY(uWidth, v);
    }
    else
    {
        verts[0].texCoords.setXY(uWidth, vHeight);
        verts[1].texCoords.setXY(u, v);
        verts[2].texCoords.setXY(u, vHeight);
        verts[3].texCoords.setXY(uWidth, vHeight);
        verts[4].texCoords.setXY(uWidth, v);
        verts[5].texCoords.setXY(u, v);
    }

    verts[0].color.setRGB(0.0f, 0.0f, 0.0f);
    verts[1].color.setRGB(0.0f, 0.0f, 0.0f);
    verts[2].color.setRGB(0.0f, 0.0f, 0.0f);
    verts[3].color.setRGB(0.0f, 0.0f, 0.0f);
    verts[4].color.setRGB(0.0f, 0.0f, 0.0f);
    verts[5].color.setRGB(0.0f, 0.0f, 0.0f);

    //Pre transform the positions
    Matrix4 model = sprite->getModelMatrix();
    verts[0].position = model * verts[0].position;
    verts[1].position = model * verts[1].position;
    verts[2].position = model * verts[2].position;
    verts[3].position = model * verts[3].position;
    verts[4].position = model * verts[4].position;
    verts[5].position = model * verts[5].position;

    D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
    memcpy(((SpriteVertex*)resource.pData) + vertexCountInBuffer, verts, BYTES_PER_SPRITE);
    vertexBuffer->unmap();

    vertexCountToDraw += VERTEX_PER_QUAD;
    vertexCountInBuffer += VERTEX_PER_QUAD;
    vertexBufferMapType = D3D11_MAP_WRITE_NO_OVERWRITE;
}

After this, most of my time is spent doing the matrix multiplications. By just commenting out the code that does this:

//Pre transform the positions
Matrix4 model = sprite->getModelMatrix();
verts[0].position = model * verts[0].position;
verts[1].position = model * verts[1].position;
verts[2].position = model * verts[2].position;
verts[3].position = model * verts[3].position;
verts[4].position = model * verts[4].position;
verts[5].position = model * verts[5].position;

my SpriteRenderer::render method time drops down to ~30 ms. It's not crazy great, but it's still a pretty solid drop. So using an index buffer to cut out 2 of those pre-transformations might help too. Also, sprite->getModelMatrix() under the hood is really doing translation * rotation * scale, so that's another set of matrix multiplications. I wonder if I should just recreate the matrix from the position, rotation, and scale vectors, as it might be less math in the end.

You are looking at that correctly, and I 100% agree. I have actually done a quick test where I place everything into a normal array first and then do a Map() > memcpy() from the normal array to the vertex buffer > Unmap() only when I am about to do a Draw call. Doing this makes my SpriteRenderer::render method drop to only ~25 ms. Definitely a change I need to make.

Oh no, the sprites are all created at the application's start and added to the render list. Once that is done, that's it; nothing is added or removed during the test. flushVertexBuffer is where my Draw call takes place. It basically updates the constant buffer for the MVP matrix and then does the draw call, drawing as much data as needed (which in this case is all 10K sprites). Then any counters needed to determine where we are in the buffer, where to draw from, etc. are updated/reset.

Sorry for the post walls, just trying to cover everything.
  10. Thanks for all the responses! I've tried to cover everything; let me know if I missed something.

Not sure how helpful this is, but looking at my task manager it says:

CPU: ~21% (amount used by my application, not total CPU usage)
GPU 0 [Intel HD Graphics]: ~11%
GPU 1 [NVidia GeForce GTX 850M]: ~18%

This is rendering 10K sprites with a 64x64 texture in an 800x600 window.

So I don't think this is exactly what you mean, but speaking from a map/unmap standpoint, if I move things around and only map once per draw call, my time goes down to 25 ms. To do this I created an intermediate array that is the same size as my vertex buffer. I place my sprite data into this intermediate array, and when I need to draw I just do a memcpy straight into the vertex buffer:

//Created at Sprite Renderer init
vertices = new SpriteVertex[MAX_VERTEX_COUNT_FOR_BUFFER];

//Inside my function that flushes the buffer
resource = vertexBuffer->map(vertexBufferMapType);
memcpy(resource.pData, vertices, vertexCountInBuffer * sizeof(SpriteVertex));
vertexBuffer->unmap();
graphicsDevice->getDeviceContext()->Draw(vertexCountToDraw, vertexCountDrawnOffset);

Currently my SpriteVertex class uses a float3 for the position on the CPU side:

class SpriteVertex
{
public:
    SpriteVertex();
    SpriteVertex(Vector3 position, Vector2 texCoords, Color color);
    ~SpriteVertex();

    Vector3 position;
    Vector2 texCoords;
    Color color;
};

On the shader side I have it as float4 because of the MVP matrix. Changing the position to float3 (shader side) makes the window just show red; I assume I'm super zoomed into the sprites or something. I removed the unneeded input.position.w = 1.0f, though.

Currently I have no index buffer set up, so I will have to go back and try this out. I do believe it would help at least a little bit, because you are right, I would do fewer matrix calculations this way.

So if I comment out the Draw call, I still have ~40 ms. If I also take out the map/unmap calls, I get around ~36 ms. So there is a minor difference, but I'm starting to think my CPU is the issue. The 40 ms time is just the cost of doing the render, so this is just the Draw and map/unmap calls. When I time this function, I'm doing it like so:

void SpriteRenderer::render(double deltaTime)
{
    //Get the start time
    QueryPerformanceCounter(&startTime);

    renderStart(); //Setup/reset since other renderers may have run. Only this renderer is running
    sortRenderList(); //This is only done once, on the first frame. Only sorting by texture, too

    Sprite* sprite = nullptr;
    for (std::vector<Sprite*>::iterator i = renderList.begin(); i != renderList.end(); ++i)
    {
        sprite = (*i);
        if (sprite->isVisible() == false)
            continue;

        //Put the sprite into the buffer. This is where the map/unmap calls are
        addToVertexBuffer(sprite);
    }

    //Draw the sprites that were placed in the buffer. Draw call is here
    flushVertexBuffer();

    //Get the end time and calculate how long it took
    QueryPerformanceCounter(&endTime);
    Logger::info("RENDER TIME: " + std::to_string(((endTime.QuadPart - startTime.QuadPart) * 1000) / frq.QuadPart));
}

void SpriteRenderer::addToVertexBuffer(Sprite* sprite)
{
    Texture* spriteTexture = sprite->getTexture();
    if (spriteTexture != boundTexture)
    {
        flushVertexBuffer();
        bindTexture(spriteTexture);
    }

    if (vertexCountInBuffer == MAX_VERTEX_COUNT_FOR_BUFFER)
    {
        flushVertexBuffer();
        vertexCountInBuffer = 0;
        vertexCountDrawnOffset = 0;
        vertexBufferMapType = D3D11_MAP_WRITE_DISCARD;
    }

    /*
    Code to setup the sprite.
    Vertex transform, flipping, applying texture clip rect, etc
    */

    //Put the sprite in the buffer
    D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
    memcpy(((SpriteVertex*)resource.pData) + vertexCountInBuffer, verts, BYTES_PER_SPRITE);
    vertexBuffer->unmap();

    vertexCountToDraw += VERTEX_PER_QUAD;
    vertexCountInBuffer += VERTEX_PER_QUAD;
    vertexBufferMapType = D3D11_MAP_WRITE_NO_OVERWRITE;
}

void SpriteRenderer::renderStart()
{
    graphicsDevice = GraphicsDeviceModule::getGraphicsDevice();
    graphicsDevice->getDeviceContext()->VSSetShader(defaultVertexShader->getShader(), 0, 0);
    graphicsDevice->getDeviceContext()->VSSetConstantBuffers(0, 1, mvpConstBuffer->getBuffer());
    graphicsDevice->getDeviceContext()->PSSetShader(defaultPixelShader->getShader(), 0, 0);
    graphicsDevice->getDeviceContext()->IASetInputLayout(inputLayout->getInputLayout());
    graphicsDevice->getDeviceContext()->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    graphicsDevice->getDeviceContext()->IASetVertexBuffers(0, 1, vertexBuffer->getBuffer(), &STRIDE_PER_VERTEX, &VERTEX_BUFFER_OFFSET);

    boundTexture = nullptr;
}

Now, the one thing I'm not sure about: when I time like the above (using QueryPerformanceCounter), am I really timing my method calls, or am I timing how long they take to return? This probably makes more sense with timing something like Present:

QueryPerformanceCounter(&startTime);
GraphicsDeviceModule::getGraphicsDevice()->present();
QueryPerformanceCounter(&endTime);
Logger::info("PRESENT TIME: " + std::to_string(((endTime.QuadPart - startTime.QuadPart) * 1000) / frq.QuadPart));

Did I just time how long it really takes to present everything to the screen, or did I just time how long it took to post the command to the GPU? I think I'm timing how long it takes to return, since my time comes back as 0 ms.
  11. I just finished the first iteration of my sprite renderer and I'm questioning its performance. Currently, I am trying to render 10K 64x64 textured sprites in an 800x600 window. The sprites all use the same texture, vertex shader, and pixel shader; there are basically no state changes. The sprite renderer itself is dynamic, using D3D11_MAP_WRITE_NO_OVERWRITE and then D3D11_MAP_WRITE_DISCARD when the vertex buffer is full. The buffer is large enough to hold all 10K sprites and draw them in a single draw call. Cutting the buffer size down to only fit 1000 sprites before a draw call is executed does not seem to matter / improve performance.

When I clock the time it takes to complete the render method of my sprite renderer (the only renderer that is running), I'm getting about 40 ms. Aside from adjusting the size of the vertex buffer, I have tried using a 1x1 texture and making the window smaller (640x480) as a quick and dirty check of whether the GPU was the bottleneck, but I still get 40 ms in both cases. I'm kind of at a loss. What are some ways I could figure out where my bottleneck is? I feel like only being able to render 10K sprites is really low, but I'm not sure. I don't know if I coded a poor renderer with a bottleneck somewhere, or if I'm being limited by my hardware.

Just some other info:

Dev PC specs:
GPU: Intel HD Graphics 4600 / Nvidia GTX 850M (the Nvidia is set as the preferred GPU in the Nvidia control panel; Vsync is off)
CPU: Intel Core i7-4710HQ @ 2.5GHz

Renderer:

//The renderer has a working depth buffer

//Sprites have matrices that are precomputed. These pretransformed vertices are placed into the buffer
Matrix4 model = sprite->getModelMatrix();
verts[0].position = model * verts[0].position;
verts[1].position = model * verts[1].position;
verts[2].position = model * verts[2].position;
verts[3].position = model * verts[3].position;
verts[4].position = model * verts[4].position;
verts[5].position = model * verts[5].position;

//Vertex buffer is flagged for dynamic use
vertexBuffer = BufferModule::createVertexBuffer(D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE, sizeof(SpriteVertex) * MAX_VERTEX_COUNT_FOR_BUFFER);

//The vertex buffer is mapped when adding a sprite to the buffer
//vertexBufferMapType can be D3D11_MAP_WRITE_NO_OVERWRITE or D3D11_MAP_WRITE_DISCARD depending on the data already in the vertex buffer
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
memcpy(((SpriteVertex*)resource.pData) + vertexCountInBuffer, verts, BYTES_PER_SPRITE);
vertexBuffer->unmap();

//The constant buffer used for the MVP matrix is updated once per draw call
D3D11_MAPPED_SUBRESOURCE resource = mvpConstBuffer->map(D3D11_MAP_WRITE_DISCARD);
memcpy(resource.pData, projectionMatrix.getData(), sizeof(Matrix4));
mvpConstBuffer->unmap();

Vertex / Pixel Shader:

cbuffer mvpBuffer : register(b0)
{
    matrix mvp;
};

struct VertexInput
{
    float4 position : POSITION;
    float2 texCoords : TEXCOORD0;
    float4 color : COLOR;
};

struct PixelInput
{
    float4 position : SV_POSITION;
    float2 texCoords : TEXCOORD0;
    float4 color : COLOR;
};

PixelInput VSMain(VertexInput input)
{
    input.position.w = 1.0f;

    PixelInput output;
    output.position = mul(mvp, input.position);
    output.texCoords = input.texCoords;
    output.color = input.color;
    return output;
}

Texture2D shaderTexture;
SamplerState samplerType;

float4 PSMain(PixelInput input) : SV_TARGET
{
    float4 textureColor = shaderTexture.Sample(samplerType, input.texCoords);
    return textureColor;
}

If any more info is needed, feel free to ask. I would really like to know how I can improve this, assuming I'm not hardware limited.
  12. OpenGL Sprite batch rendering

    While I have not worked in OpenGL in a while, I know you can do this in a couple of ways. The best approach is definitely to batch your sprites together by things like texture, shader, and other state changes. Mapping directly into the buffer will also help; there is no reason to place all your sprite data into some intermediate structure like a vector just to copy it all into your vertex buffer later, if you can help it.

Since your sprites are dynamic, you could do one of the following:

1. Ping-pong between 2 buffers. Basically, you create 2 vertex buffers and swap between them: one is used as the drawing buffer, and while it is in use you place your sprite data in the other one. Breakdown: draw using Buffer A; while the GPU is drawing from Buffer A, fill Buffer B with new sprite data; then swap the buffers (Buffer B becomes the draw buffer, Buffer A becomes the data-fill buffer) when you need to draw the data in Buffer B.

2. Orphan the buffer. Basically, you pass NULL into glBufferData, which tells the driver to give you a fresh block of memory to use. This lets the GPU keep working with the previously issued commands and memory while you fill the new block that was handed to you. Here is some more info on this.

Using one draw call per sprite is really going to kill performance, especially if you are looking to render lots and lots of sprites on the screen at one time. You can definitely pretransform your sprite's vertices using the sprite's world (model) matrix and then place those pretransformed vertices into your vertex buffer. Once the vertex buffer is full, or there is some kind of state change (texture, shader, etc.), flush the buffer and issue a draw call. The fewer draw calls, the better the performance.
  13. I am currently working on the first iteration of my sprite renderer and I'm trying to draw 2 sprites. They both use the same texture and are placed into the same buffer, but unfortunately only the second sprite shows up on the screen. I assume I messed something up when placing them into the buffer and am overwriting the data of the first sprite. So how should I be mapping my buffer with an offset?

/* Code that sets up the sprite vertices, etc */
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
memcpy(resource.pData, verts, sizeof(SpriteVertex) * VERTEX_PER_QUAD);
vertexBuffer->unmap();
vertexCount += VERTEX_PER_QUAD;

I feel like I should be doing something like:

/* Code that sets up the sprite vertices, etc */
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);

//Place the sprite vertex data into pData using the current vertex count as the offset
//The code resource.pData[vertexCount] is syntactically wrong though :( Not sure how it should look, since pData is a void pointer
memcpy(resource.pData[vertexCount], verts, sizeof(SpriteVertex) * VERTEX_PER_QUAD);

vertexBuffer->unmap();
vertexCount += VERTEX_PER_QUAD;

Also, speaking of offsets, can someone give an example of when the pOffsets param for the IASetVertexBuffers call would not be 0?
  14. Thanks, I'll check it out! I don't think I ever really intended to scale the shear/skew factor; it's just how the math ended up working out when I multiplied my data matrix by the scale matrix.
  15. I'm reviewing a tutorial on using textures, and I see that the vertex shader has this input declaration, where the position is a float4:

struct VertexInputType
{
    float4 position : POSITION;
    float2 tex : TEXCOORD0;
};

But when they go over uploading the data to the vertex buffer, they only use a float3 (Vector3) value for the position:

// Load the vertex array with data.
vertices[0].position = D3DXVECTOR3(-1.0f, -1.0f, 0.0f); // Bottom left.
vertices[0].texture = D3DXVECTOR2(0.0f, 1.0f);
vertices[1].position = D3DXVECTOR3(0.0f, 1.0f, 0.0f); // Top middle.
vertices[1].texture = D3DXVECTOR2(0.5f, 0.0f);
vertices[2].position = D3DXVECTOR3(1.0f, -1.0f, 0.0f); // Bottom right.
vertices[2].texture = D3DXVECTOR2(1.0f, 1.0f);

The input layout description also matches a float3 value:

polygonLayout[0].SemanticName = "POSITION";
polygonLayout[0].SemanticIndex = 0;
polygonLayout[0].Format = DXGI_FORMAT_R32G32B32_FLOAT;
polygonLayout[0].InputSlot = 0;
polygonLayout[0].AlignedByteOffset = 0;
polygonLayout[0].InputSlotClass = D3D11_INPUT_PER_VERTEX_DATA;
polygonLayout[0].InstanceDataStepRate = 0;

So does this mean that shaders will automatically default "missing" values to 0 or something of the like? If so, is this frowned upon?