About FantasyVII

  1. Alright, so I have been researching data-oriented design and ECS: how important contiguous memory is, and how costly it is to stall the CPU on a cache miss when you have a lot of objects to update or render.

     I created a UML diagram for this, and it is simple enough. I have a position component that only deals with positions, and a transform system that iterates through a vector and updates the positions. All of that data is contiguous. The position component is 20 bytes in total, so assuming a 64-byte cache line, three of them fit entirely in a single cache line (a fourth would straddle two lines) before we have a cache miss.

     We also have a sprite component that holds the color of the sprite, a pointer to the texture, and a pointer to the GameObject it came from. The sprite components are also contiguous in memory and each one is 28 bytes, which means two sprites fit in a single cache line before we have a cache miss.

     Finally, there is the GameObject, which holds a position component to know where to render the sprite on the screen and a sprite component to know the sprite's texture and color.

     The sprite renderer needs to access the position component through the game object to know where to render the sprite on the screen. If we assume we have 1 million sprites, that means we have 1 million game objects and 1 million position components. Every time the sprite renderer iterates over a sprite, it has to jump to the position component to get the position. Since the position component is not contiguous in memory with the sprite, the renderer has to go to main memory to read that data, which means a cache miss. That would be a big issue.

     My question is: how do I reduce the number of cache misses I have here? How can I improve this ECS system?
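One way to picture the layout problem described above is with parallel arrays, where sprite `i` and position `i` belong to the same entity, so the renderer never follows a pointer back to a game object. The sketch below is only an illustration under assumptions: the component fields, `SpriteWorld`, `buildDrawList`, and the 64-byte cache line are hypothetical stand-ins chosen to match the 20-byte and 28-byte sizes from the post, and texture and owner are stored as indices rather than pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical components sized to match the figures in the post.
struct PositionComponent { float x, y, z, rotation, scale; };  // 5 floats
struct SpriteComponent {
    float r, g, b, a;            // color
    std::uint32_t textureId;     // index into a texture table (assumption)
    std::uint32_t entityId;      // index of the owning entity (assumption)
    std::uint32_t layer;
};
static_assert(sizeof(PositionComponent) == 20, "expected 20-byte position");
static_assert(sizeof(SpriteComponent) == 28, "expected 28-byte sprite");

constexpr std::size_t kCacheLine = 64;  // assumed; typical on x86-64

// Number of whole elements of T that fit in one cache line.
template <typename T>
constexpr std::size_t wholePerLine() { return kCacheLine / sizeof(T); }

// Parallel arrays: the same index refers to the same entity in both vectors.
struct SpriteWorld {
    std::vector<PositionComponent> positions;
    std::vector<SpriteComponent> sprites;

    std::size_t add(const PositionComponent& p, const SpriteComponent& s) {
        positions.push_back(p);
        sprites.push_back(s);
        return positions.size() - 1;
    }
};

struct DrawItem { float x, y; std::uint32_t textureId; };

// Stand-in for the renderer: walks both arrays front to back, so every
// read is sequential within its own array.
std::vector<DrawItem> buildDrawList(const SpriteWorld& w) {
    assert(w.positions.size() == w.sprites.size());
    std::vector<DrawItem> out;
    out.reserve(w.sprites.size());
    for (std::size_t i = 0; i < w.sprites.size(); ++i) {
        out.push_back({w.positions[i].x, w.positions[i].y, w.sprites[i].textureId});
    }
    return out;
}
```

Two sequential streams are usually easy for the hardware prefetcher to follow, but whether this beats a single merged "render data" struct depends on the access pattern of the other systems, so it is worth measuring both.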
  2. MapBuffer in a Debug build with 1 million sprites is much worse: I get around 4 FPS, whereas in Release I get 16 FPS.

     I have to modify the sprites every single frame, yes. All 1 million of them.

     I did optimize out std::vector<T>::size() and a few other little things, and in the grand scheme of things they only improved my frame rate by 1 FPS.

     For my index buffer, I use GL_STATIC_DRAW since I don't need to modify the index buffer after creation at all. I allocate the buffer and fill it up once.

     For my vertex buffer, the creation flags are

         GLCall(glBufferStorage(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_STORAGE_BIT | GL_MAP_WRITE_BIT | GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT));

     and for mapping I use

         GLCall(return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, GL_MAP_WRITE_BIT | GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT));

     I don't handle any synchronization between the CPU and GPU since, from my understanding, GL_MAP_PERSISTENT_BIT takes care of that. I haven't done any double or triple buffering. I do want to do that, but I am not sure how to go about it or how to implement it. I need to do a bit more research on that.

     Here is the blog post: http://voidptr.io/blog/2016/04/28/ldEngine-Part-1.html

     Resolution is not the issue here. I know for a fact that I can get 1 million quads rendered without using instancing, although I will implement instancing later on. For now I need to solve the CPU bottleneck that I have.

     I suspected it would be a cache miss issue. I thought a vector of pointers would have the data contiguous in memory. I was wrong. That only happens if you have a normal vector of objects. I verified that. After changing the vector from std::vector<IRenderable*> renderables; to std::vector<RectangleShape> rectangleShapes; my frame rate went from 16 FPS to 23 FPS. So I gained 7 FPS. That is great. Thank you for the tip!!!

     Still, my CPU is the bottleneck. I suspect that my CPU is still stalling because I have a pointer to my transform component in my RectangleShape class. I will move my position data from my transform component class to the sprite itself, test it, and see how many frames I gain. The time attributed to the curly brace went from 69% to 56%.

     But there is still an issue. Currently, my renderer can render the following things:

     - LineShape
     - RectangleShape
     - Sprite
     - Text

     They all inherit from the base class Renderable. I need to have a vector of pointers; I can't do it as a vector of objects. The reason is that I also sort all my renderables based on their zSortingOrder, so I can't have 4 different vectors of objects for LineShape, RectangleShape, Sprite, and Text. So that is one issue.

     However, I do need to keep my data contiguous in memory. As far as I know, std::vector<pointer*> has no way of making this happen (correct me if I am wrong), so that leaves me having to create my own custom allocator. That is going to be a pain because I am also using std::sort(). If I create my custom allocator, I need to create my own implementation of quicksort to sort my renderables, unless I can somehow use std::sort().

     Anyway, so far I know that one of my performance issues is that my data is not contiguous in memory, and that is killing my performance. I will reimplement my engine API to account for that.

     I will have a look at them. Thank you!

     I only map the buffer once and then I use it. So mapping is not the thing that is killing my performance.
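The sorting concern above does not necessarily force a vector of pointers. One possible arrangement, sketched here under assumptions, keeps each renderable type in its own contiguous vector and sorts a separate array of small keys by z order. `DrawKey`, `RenderLayer`, `buildSortedKeys`, and the trimmed-down `RectangleShape` and `Sprite` payloads are hypothetical names for illustration, not the engine's real types. Note also that `std::sort` and `std::stable_sort` only need random-access iterators, so they work on a `std::vector` regardless of which allocator it uses.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class RenderableType : std::uint8_t { Line, Rectangle, Sprite, Text };

// Hypothetical minimal payloads; the real classes would carry more data.
struct RectangleShape { int zSortingOrder; float x, y, w, h; };
struct Sprite { int zSortingOrder; float x, y; std::uint32_t textureId; };

// Small sortable record: which array to look in, and at which index.
struct DrawKey {
    int z;
    RenderableType type;
    std::uint32_t index;
};

struct RenderLayer {
    std::vector<RectangleShape> rectangles;  // contiguous per type
    std::vector<Sprite> sprites;             // contiguous per type

    // Builds one key per renderable and sorts the keys by z order.
    // stable_sort keeps insertion order for equal z values.
    std::vector<DrawKey> buildSortedKeys() const {
        std::vector<DrawKey> keys;
        keys.reserve(rectangles.size() + sprites.size());
        for (std::uint32_t i = 0; i < rectangles.size(); ++i) {
            keys.push_back({rectangles[i].zSortingOrder, RenderableType::Rectangle, i});
        }
        for (std::uint32_t i = 0; i < sprites.size(); ++i) {
            keys.push_back({sprites[i].zSortingOrder, RenderableType::Sprite, i});
        }
        std::stable_sort(keys.begin(), keys.end(),
                         [](const DrawKey& a, const DrawKey& b) { return a.z < b.z; });
        return keys;
    }
};
```

The renderer would then walk the sorted keys and switch on `type` to pick the right array. The keys themselves are contiguous, but lookups into the payload arrays follow the sorted order and can still jump around, so this is a trade-off to profile rather than a guaranteed win.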
  3. Yes, each sprite is unique: each has its own position, color, and texture. The issue with instanced rendering is that, as far as I know, all objects have to share the same mesh and texture. Either way, I will implement instanced rendering at some point. But for now, I just don't understand why the CPU takes such a long time to prepare the 1 million sprites.

     This is basically what I am doing: I issue a single draw call for all 1 million sprites. I map one big buffer that holds all the vertices for all the sprites, and I modify them on the CPU using persistent buffer mapping.

     I found this blog post about the subject, and this guy was able to render 1 million quads at 60 FPS with persistent buffer mapping. So I must be doing something wrong, and I don't understand what. The profiler is no help.
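On the persistent mapping side, it is worth noting that `GL_MAP_PERSISTENT_BIT` only keeps the pointer valid while the GPU uses the buffer. It does not stop the CPU from overwriting vertices the GPU is still reading, so that synchronization is left to the application. A common approach is to allocate the buffer three times larger and rotate through the regions, waiting on a fence before reusing one. The sketch below models only the rotation logic. `FrameRing` and its `Fence` callback are hypothetical, and in a real renderer the callback would presumably wrap `glClientWaitSync` on a sync object created with `glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)` right after the draw call.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <utility>

// Rotates through three regions of one persistently mapped buffer.
// No OpenGL calls here: waiting on the GPU is injected as a callback.
class FrameRing {
public:
    static constexpr std::size_t kRegions = 3;
    using Fence = std::function<void()>;  // blocks until the GPU is done

    explicit FrameRing(std::size_t regionBytes) : regionBytes_(regionBytes) {}

    // Waits for any pending GPU work on the current region, then returns
    // the byte offset at which this frame may write its vertices.
    std::size_t beginFrame() {
        Fence& fence = fences_[current_];
        if (fence) {
            fence();          // e.g. wait on the sync object for this region
            fence = nullptr;  // the region is free again
        }
        return current_ * regionBytes_;
    }

    // Call after submitting the draw for this frame. Stores the fence that
    // guards the region just used and advances to the next region.
    void endFrame(Fence waitForGpu) {
        fences_[current_] = std::move(waitForGpu);
        current_ = (current_ + 1) % kRegions;
    }

private:
    std::size_t regionBytes_;
    std::size_t current_ = 0;
    std::array<Fence, kRegions> fences_{};
};
```

With three regions, the CPU normally only has to wait when the GPU has fallen more than two frames behind, which is why the wait in `beginFrame` is expected to return immediately most of the time.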
  4. Hi everyone,

     I am attempting to render 1,000,000 sprites on my screen at 60 FPS. I am using OpenGL persistent mapping. At first, I was using glMapBuffer to do this and I was getting around 32 FPS. After switching to glMapBufferRange / persistent mapping I started getting 34 FPS.

     I have been trying to profile my code and figure out what is going on, and for the life of me, I can't. I know my GPU is mostly sitting idle, so it is a CPU bottleneck. I fired up the VS 2019 CPU profiler to see why my CPU is the bottleneck and I can't figure it out. All I know is that my CPU is spending around 70% of its time in the function that maps the sprite.

     I am building this in release x64 mode.

     What am I supposed to do with the information that my CPU is spending 69% of its time on the opening curly brace??

     My render loop is simple enough and it looks like this:

         #define BFE_MAX_SPRITES 1000000
         #define BFE_SPRITE_VERTICES 4
         #define BFE_SPRITE_INDICES 6
         #define BFE_VERTICES_SIZE BFE_MAX_SPRITES * BFE_SPRITE_VERTICES
         #define BFE_INDICES_SIZE BFE_MAX_SPRITES * BFE_SPRITE_INDICES

         void SpriteRenderer::Initialize()
         {
             BF::Engine::GetContext().SetPrimitiveType(PrimitiveType::Triangles);

             shader.LoadStandardShader(ShaderType::SpriteRenderer);

             vertexBufferLayout.Push(0, "POSITION", VertexBufferLayout::DataType::Float2, sizeof(SpriteBuffer), 0);
             vertexBufferLayout.Push(1, "COLOR", VertexBufferLayout::DataType::Float4, sizeof(SpriteBuffer), sizeof(Vector2f));
             vertexBufferLayout.Push(2, "TEXCOORD", VertexBufferLayout::DataType::Float2, sizeof(SpriteBuffer), sizeof(Vector2f) + sizeof(Color));
             vertexBufferLayout.Push(3, "RENDERINGTYPE", VertexBufferLayout::DataType::Float, sizeof(SpriteBuffer), sizeof(Vector2f) + sizeof(Color) + sizeof(Vector2f));

             unsigned int* indices = new unsigned int[BFE_INDICES_SIZE];
             int index = 0;

             /*
                 Winding order is clock-wise.

                 0 -> 1 -> 2 ---> 2 -> 3 -> 0

                 0      1
                  ______
                 |\     |
                 | \    |
                 |  \   |
                 |   \  |
                 |    \ |
                 |_____\|
                 3      2
             */

             for (unsigned int i = 0; i < BFE_INDICES_SIZE; i += BFE_SPRITE_INDICES)
             {
                 indices[i + 0] = index + 0;
                 indices[i + 1] = index + 1;
                 indices[i + 2] = index + 2;

                 indices[i + 3] = index + 2;
                 indices[i + 4] = index + 3;
                 indices[i + 5] = index + 0;

                 index += BFE_SPRITE_VERTICES;
             }

             vertexBuffer.Create();
             vertexBuffer.Allocate(BFE_VERTICES_SIZE * sizeof(SpriteBuffer), nullptr, BufferMode::PersistentMapping);
             ogSpriteBuffer = (SpriteBuffer*)vertexBuffer.MapPersistentStream();
             spriteBuffer = ogSpriteBuffer;

             indexBuffer.Create();
             indexBuffer.SetBuffer(indices, BFE_INDICES_SIZE, BufferMode::StaticDraw);
             vertexBuffer.SetLayout(shader, &vertexBufferLayout);

             Engine::GetContext().EnableDepthBuffer(false);
             Engine::GetContext().EnableBlending(true);
             Engine::GetContext().EnableScissor(true);

             delete[] indices;
         }

         void SpriteRenderer::Render()
         {
             totalDrawCalls = 0;

             shader.Bind();
             MapBuffer();

             vertexBuffer.Bind();
             indexBuffer.Bind();
             Engine::GetContext().Draw(indexCount);
             indexBuffer.Unbind();
             vertexBuffer.Unbind();

             totalDrawCalls++;
             indexCount = 0;
             currentBoundTexture = nullptr;
             spriteBuffer = ogSpriteBuffer;
         }

         void SpriteRenderer::MapBuffer()
         {
             if (submitSprite)
             {
                 for (size_t i = 0; i < renderLayerManager.renderLayers.size(); i++)
                 {
                     for (size_t j = 0; j < renderLayerManager.renderLayers[i]->renderables.size(); j++)
                     {
                         MapRectangleShapeBuffer((RectangleShape*)renderLayerManager.renderLayers[i]->renderables[j]);
                     }
                 }
             }
         }

         void SpriteRenderer::MapRectangleShapeBuffer(RectangleShape* rectangleShape)
         {
             //Top Left
             spriteBuffer->position = rectangleShape->transfrom->corners[0];
             spriteBuffer->color = rectangleShape->color;
             spriteBuffer->UV = Vector2f(0.0f);
             spriteBuffer->renderingType = 0;
             spriteBuffer++;

             //Top Right
             spriteBuffer->position = rectangleShape->transfrom->corners[1];
             spriteBuffer->color = rectangleShape->color;
             spriteBuffer->UV = Vector2f(0.0f);
             spriteBuffer->renderingType = 0;
             spriteBuffer++;

             //Bottom Right
             spriteBuffer->position = rectangleShape->transfrom->corners[2];
             spriteBuffer->color = rectangleShape->color;
             spriteBuffer->UV = Vector2f(0.0f);
             spriteBuffer->renderingType = 0;
             spriteBuffer++;

             //Bottom Left
             spriteBuffer->position = rectangleShape->transfrom->corners[3];
             spriteBuffer->color = rectangleShape->color;
             spriteBuffer->UV = Vector2f(0.0f);
             spriteBuffer->renderingType = 0;
             spriteBuffer++;

             indexCount += BFE_SPRITE_INDICES;
         }

     I don't know where to go from here.
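For comparison with the fill loop above, here is a minimal sketch of the same per-sprite work reading from a contiguous array of plain structs instead of chasing a `RectangleShape*` and then a transform pointer for every sprite. Everything in it is an assumption for illustration: `Vector2f`, `Color`, `SpriteVertex` (standing in for `SpriteBuffer`), `RectData`, and `fillVertices` are hypothetical and not the engine's actual types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the engine types.
struct Vector2f { float x, y; };
struct Color { float r, g, b, a; };

// One vertex as laid out in the vertex buffer.
struct SpriteVertex {
    Vector2f position;
    Color color;
    Vector2f uv;
    float renderingType;
};

// Everything the fill loop needs for one rectangle, stored by value so a
// std::vector<RectData> is a single contiguous block.
struct RectData {
    Vector2f corners[4];  // top left, top right, bottom right, bottom left
    Color color;
};

// Writes four vertices per rectangle into dst, strictly front to back and
// without reading dst. Returns the number of indices to draw.
std::size_t fillVertices(const std::vector<RectData>& rects, SpriteVertex* dst) {
    for (const RectData& rect : rects) {
        for (int corner = 0; corner < 4; ++corner) {
            *dst++ = SpriteVertex{rect.corners[corner], rect.color,
                                  Vector2f{0.0f, 0.0f}, 0.0f};
        }
    }
    return rects.size() * 6;
}
```

In a real renderer `dst` would be the persistently mapped pointer. Since the loop only ever writes to it, requesting `GL_MAP_READ_BIT` should not be needed, and reading back from mapped memory is typically slow, so keeping the access write-only and sequential is the safer pattern.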