• Similar Content

• By Orella
I'm having problems rotating GameObjects in my engine. I'm trying to rotate in 2 ways.
I'm using MathGeoLib to calculate maths in the engine.
First way: It rotates correctly around each axis, but rotating back only works if I undo the rotations in the inverse order.
e.g.:
Rotate X axis 50 degrees, Rotate Y axis 30 degrees -> Rotate Y axis -30 degrees, Rotate X axis -50 degrees. Works.
Rotate X axis 50 degrees, Rotate Y axis 30 degrees -> Rotate X axis -50 degrees, Rotate Y axis -30 degrees. Doesn't.

Code:
void ComponentTransform::SetRotation(float3 euler_rotation)
{
    float3 diff = euler_rotation - editor_rotation;
    editor_rotation = euler_rotation;
    math::Quat mod = math::Quat::FromEulerXYZ(diff.x * DEGTORAD, diff.y * DEGTORAD, diff.z * DEGTORAD);
    quat_rotation = quat_rotation * mod;
    UpdateMatrix();
}

Second way: It starts out rotating correctly around each axis, but after a few rotations it stops rotating correctly around the axis. However, rotating back works regardless of the rotation order, unlike the first way.

Code:
void ComponentTransform::SetRotation(float3 euler_rotation)
{
    editor_rotation = euler_rotation;
    quat_rotation = math::Quat::FromEulerXYZ(euler_rotation.x * DEGTORAD, euler_rotation.y * DEGTORAD, euler_rotation.z * DEGTORAD);
    UpdateMatrix();
}
Rest of code:
#define DEGTORAD 0.0174532925199432957f

void ComponentTransform::UpdateMatrix()
{
    if (!this->GetGameObject()->IsParent())
    {
        //Get parent transform component
        ComponentTransform* parent_transform = (ComponentTransform*)this->GetGameObject()->GetParent()->GetComponent(Component::CompTransform);
        //Create matrix from position, rotation(quaternion) and scale
        transform_matrix = math::float4x4::FromTRS(position, quat_rotation, scale);
        //Multiply the object transform by parent transform
        transform_matrix = parent_transform->transform_matrix * transform_matrix;
        //If object have childs, call this function in childs objects
        for (std::list<GameObject*>::iterator it = this->GetGameObject()->childs.begin(); it != this->GetGameObject()->childs.end(); it++)
        {
            ComponentTransform* child_transform = (ComponentTransform*)(*it)->GetComponent(Component::CompTransform);
            child_transform->UpdateMatrix();
        }
    }
    else
    {
        //Create matrix from position, rotation(quaternion) and scale
        transform_matrix = math::float4x4::FromTRS(position, quat_rotation, scale);
        //If object have childs, call this function in childs objects
        for (std::list<GameObject*>::iterator it = this->GetGameObject()->childs.begin(); it != this->GetGameObject()->childs.end(); it++)
        {
            ComponentTransform* child_transform = (ComponentTransform*)(*it)->GetComponent(Component::CompTransform);
            child_transform->UpdateMatrix();
        }
    }
}

MathGeoLib:
Quat MUST_USE_RESULT Quat::FromEulerXYZ(float x, float y, float z)
{
    return (Quat::RotateX(x) * Quat::RotateY(y) * Quat::RotateZ(z)).Normalized();
}

Quat MUST_USE_RESULT Quat::RotateX(float angle)
{
    return Quat(float3(1,0,0), angle);
}

Quat MUST_USE_RESULT Quat::RotateY(float angle)
{
    return Quat(float3(0,1,0), angle);
}

Quat MUST_USE_RESULT Quat::RotateZ(float angle)
{
    return Quat(float3(0,0,1), angle);
}

Quat(const float3 &rotationAxis, float rotationAngleRadians)
{
    SetFromAxisAngle(rotationAxis, rotationAngleRadians);
}

void Quat::SetFromAxisAngle(const float3 &axis, float angle)
{
    assume1(axis.IsNormalized(), axis);
    assume1(MATH_NS::IsFinite(angle), angle);
    float sinz, cosz;
    SinCos(angle*0.5f, sinz, cosz);
    x = axis.x * sinz;
    y = axis.y * sinz;
    z = axis.z * sinz;
    w = cosz;
}

Any help?
Thanks.
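One likely reading of the first approach: rotations do not commute, so accumulating deltas with quat_rotation = quat_rotation * mod can only be undone by applying the inverse deltas in reverse order. The sketch below checks this with a tiny stand-alone quaternion. The struct Q and the helper names are assumptions made for illustration; they are not MathGeoLib's types.

```cpp
#include <cassert>
#include <cmath>

// Minimal quaternion for illustration only; this is not MathGeoLib's Quat.
struct Q { double w, x, y, z; };

// Hamilton product a * b.
Q mul(const Q& a, const Q& b)
{
    return { a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
             a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w };
}

// Rotation about the X axis by the given angle in degrees.
Q rotX(double degrees)
{
    const double half = degrees * 3.14159265358979323846 / 180.0 * 0.5;
    return { std::cos(half), std::sin(half), 0.0, 0.0 };
}

// Rotation about the Y axis by the given angle in degrees.
Q rotY(double degrees)
{
    const double half = degrees * 3.14159265358979323846 / 180.0 * 0.5;
    return { std::cos(half), 0.0, std::sin(half), 0.0 };
}

// True when q represents "no rotation" (q and -q describe the same rotation).
bool isIdentity(const Q& q, double eps = 1e-9)
{
    assert(eps > 0.0);
    return std::fabs(std::fabs(q.w) - 1.0) < eps &&
           std::fabs(q.x) < eps && std::fabs(q.y) < eps && std::fabs(q.z) < eps;
}

// Deltas accumulated as current = current * delta, then undone in reverse order.
Q undoInReverseOrder()
{
    return mul(mul(mul(rotX(50), rotY(30)), rotY(-30)), rotX(-50));
}

// Same deltas, but undone in the order they were applied.
Q undoInSameOrder()
{
    return mul(mul(mul(rotX(50), rotY(30)), rotX(-50)), rotY(-30));
}
```

Here undoInReverseOrder() ends up at the identity while undoInSameOrder() leaves a residual rotation. The second approach rebuilds the quaternion from the absolute Euler angles each time, which is why it is always reversible but shows the usual Euler-angle coupling between axes.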
• By owenjr
Hi there!
I am trying to implement a basic AI for a Turrets game in SFML and C++ and I have some problems.
This AI follows some waypoints established on a Bezier curve.
At first, this path was followed by only one enemy. For this purpose, the enemy has to calculate the distance between its current position and the next waypoint it has to reach.
If the distance is less than a specific value we establish, we move on to the next point. This repeats until the final destination is reached. (In the submitted code, ignore the variable m_go.)

Our problem appears when we spawn several enemies that all have to follow the same path, because it produces a bad visual effect (they all end up on top of one another).
To solve this visual problem, we decided to use a repulsion vector. The calculation goes like this:

As you can see, we calculate the repulsion vector with the inverse of the distance between the enemy and its nearest neighbor.
Then we add it to the "theoretical" direction and get a resultant, which is the direction our enemy has to follow to avoid "colliding" with its neighbors. But this is where our issue comes in:

The enemies get separated in the middle of the curve and, as we spawn more enemies, the speed of all of them increases dramatically (including the enemies that don't calculate the repulsion vector).
1 - Is it usual for this separation to occur in the middle of the trajectory?
2 - Is there a way to control this direction without the speed being affected?
3 - Is there any alternative to this approach?

I submit the code below (there is a variable in Spanish, [resultante], which means resultant in English):

if (!m_pathCompleted)
{
    if (m_currentWP == 14 && m_cambio == true)
    {
        m_currentWP = 0;
        m_path = m_pathA;
        m_cambio = false;
    }
    if (m_neighbors.size() > 1)
    {
        for (int i = 0; i < m_neighbors.size(); i++)
        {
            if (m_enemyId != m_neighbors[i]->GetId())
            {
                float l_nvx = m_neighbors[i]->GetSprite().getPosition().x - m_enemySprite.getPosition().x;
                float l_nvy = m_neighbors[i]->GetSprite().getPosition().y - m_enemySprite.getPosition().y;
                float distance = std::sqrt(l_nvx * l_nvx + l_nvy * l_nvy);
                if (distance < MINIMUM_NEIGHBOR_DISTANCE)
                {
                    l_nvx *= -1;
                    l_nvy *= -1;
                    float l_vx = m_path[m_currentWP].x - m_enemySprite.getPosition().x;
                    float l_vy = m_path[m_currentWP].y - m_enemySprite.getPosition().y;
                    float l_resultanteX = l_nvx + l_vx;
                    float l_resultanteY = l_nvy + l_vy;
                    float l_waypointDistance = std::sqrt(l_resultanteX * l_resultanteX + l_resultanteY * l_resultanteY);
                    if (l_waypointDistance < MINIMUM_WAYPOINT_DISTANCE)
                    {
                        if (m_currentWP == m_path.size() - 1)
                        {
                            std::cout << "\n";
                            std::cout << "[GAME OVER]" << std::endl;
                            m_go = false;
                            m_pathCompleted = true;
                        }
                        else
                        {
                            m_currentWP++;
                        }
                    }
                    if (l_waypointDistance > MINIMUM_WAYPOINT_DISTANCE)
                    {
                        l_resultanteX = l_resultanteX / l_waypointDistance;
                        l_resultanteY = l_resultanteY / l_waypointDistance;
                        m_enemySprite.move(ENEMY_SPEED * l_resultanteX * dt, ENEMY_SPEED * l_resultanteY * dt);
                    }
                }
                else
                {
                    float vx = m_path[m_currentWP].x - m_enemySprite.getPosition().x;
                    float vy = m_path[m_currentWP].y - m_enemySprite.getPosition().y;
                    float len = std::sqrt(vx * vx + vy * vy);
                    if (len < MINIMUM_WAYPOINT_DISTANCE)
                    {
                        if (m_currentWP == m_path.size() - 1)
                        {
                            std::cout << "\n";
                            std::cout << "[GAME OVER]" << std::endl;
                            m_go = false;
                            m_pathCompleted = true;
                        }
                        else
                        {
                            m_currentWP++;
                        }
                    }
                    if (len > MINIMUM_WAYPOINT_DISTANCE)
                    {
                        vx = vx / len;
                        vy = vy / len;
                        m_enemySprite.move(ENEMY_SPEED * vx * dt, ENEMY_SPEED * vy * dt);
                    }
                }
            }
        }
    }
    else
    {
        float vx = m_path[m_currentWP].x - m_enemySprite.getPosition().x;
        float vy = m_path[m_currentWP].y - m_enemySprite.getPosition().y;
        float len = std::sqrt(vx * vx + vy * vy);
        if (len < MINIMUM_WAYPOINT_DISTANCE)
        {
            if (m_currentWP == m_path.size() - 1)
            {
                std::cout << "\n";
                std::cout << "[GAME OVER]" << std::endl;
                m_go = false;
                m_pathCompleted = true;
            }
            else
            {
                m_currentWP++;
            }
        }
        if (len > MINIMUM_WAYPOINT_DISTANCE)
        {
            vx = vx / len;
            vy = vy / len;
            m_enemySprite.move(ENEMY_SPEED * vx * dt, ENEMY_SPEED * vy * dt);
        }
    }
}
Thank you very much in advance!
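One thing that stands out in the posted code is that m_enemySprite.move is called inside the loop over m_neighbors, so an enemy can move once per neighbor in a single frame. That would make speed grow with the number of enemies. A possible alternative, sketched below under assumptions, is to accumulate the repulsion from all close neighbors first, normalise the resultant once, and move once per frame. Vec2, steer and the linear falloff weight are hypothetical choices for illustration, not SFML API and not the poster's formula.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical helper: returns the displacement for ONE frame.
// The length of the result is always speed * dt (or zero if the forces cancel),
// no matter how many neighbours contribute.
Vec2 steer(Vec2 self, Vec2 waypoint, const std::vector<Vec2>& others,
           float minNeighborDist, float speed, float dt)
{
    assert(minNeighborDist > 0.0f);

    // Unit vector towards the current waypoint.
    Vec2 dir{ waypoint.x - self.x, waypoint.y - self.y };
    const float len = std::sqrt(dir.x * dir.x + dir.y * dir.y);
    if (len > 0.0f) { dir.x /= len; dir.y /= len; }

    // Add a repulsion term for every neighbour that is too close.
    // Assumed weight: 0 at minNeighborDist, approaching 1 as the distance shrinks.
    for (const Vec2& other : others)
    {
        const float rx = self.x - other.x;
        const float ry = self.y - other.y;
        const float d = std::sqrt(rx * rx + ry * ry);
        if (d > 0.0f && d < minNeighborDist)
        {
            const float weight = (minNeighborDist - d) / minNeighborDist;
            dir.x += (rx / d) * weight;
            dir.y += (ry / d) * weight;
        }
    }

    // Normalise once so the speed does not depend on the neighbour count.
    const float resultLen = std::sqrt(dir.x * dir.x + dir.y * dir.y);
    if (resultLen == 0.0f) return { 0.0f, 0.0f };
    return { dir.x / resultLen * speed * dt, dir.y / resultLen * speed * dt };
}
```

The returned displacement would be passed to a single move call per enemy per frame. Checking the waypoint distance against the unmodified waypoint vector, rather than the resultant, also keeps waypoint switching independent of the repulsion.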

• Overview
Welcome to the 2D UFO game guide using the Orx Portable Game Engine. My aim for this tutorial is to take you through all the steps to build a UFO game from scratch.
The aim of our game is to allow the player to control a UFO by applying physical forces to move it around. The player must collect pickups to increase their score to win.
I should openly acknowledge that this series is cheekily inspired by the 2D UFO tutorial written for Unity.
It makes an excellent comparison of the approaches between Orx and Unity. It is also a perfect way to highlight one of the major parts that makes Orx unique among other game engines, its Data Driven Configuration System.
You'll get very familiar with this system very soon. It's at the very heart of just about every game written using Orx.
If you are very new to game development, don't worry. We'll take it nice and slow and try to explain everything in very simple terms. The only knowledge you will need is some simple C++.
I'd like to say a huge thank you to FullyBugged for providing the graphics for this series of articles.

What are we making?
Visit the video below to see the look and gameplay of the final game:
Getting Orx
The latest up-to-date version of Orx can be cloned from GitHub and set up with:
git clone https://github.com/orx/orx.git
After cloning, an $ORX environment variable will be created automatically for your system which will help with making game projects much easier. It will also create several IDE projects for your operating system: Visual Studio, Codelite, Code::Blocks, and gmake. These Orx projects will allow you to compile the Orx library for use in your own projects. And the $ORX environment variable means that your projects will know where to find the Orx library.
For more details on this step, visit http://orx-project.org/wiki/en/tutorials/cloning_orx_from_github at the Orx learning wiki.
Setting up a 2D UFO Project
Now that you have the Orx libraries cloned and compiled, you will need a blank project for your game. Supported options are: Visual Studio, CodeLite, Code::Blocks, XCode or gmake, depending on your operating system.
Once you have a game project, you can use it to work through the steps in this tutorial.
Orx provides a very nice system for auto creating game projects for you. In the root of the Orx repo, you will find either the init.bat (for Windows) or init.sh (Mac/Linux) command.
Create a project for our 2D game from the command line in the Orx folder by running:
init c:\temp\ufo
or
init.sh ~/ufo
Orx will create a project for each IDE supported by your OS at the specified location. You can copy this folder anywhere, and your project will always compile and link due to the $ORX environment variable. It knows where the libraries and includes are for Orx.
Open your project using your favourite IDE from within the ufo/build folder.
When the blank template loads, there are two main folders to note in your solution:
config
src
Firstly, the src folder contains a single source file, ufo.cpp. This is where we will add the C++ code for the game. The config folder contains configuration files for our game.
What is config?
Orx is a data driven 2D game engine. Many of the elements in your game, like objects, spawners, music etc, do not need to be defined in code. They can be defined (or configured) using config files.
You can make a range of complex multi-part objects with special behaviours and effects in Orx, and bring them into your game with a single line of code. You'll see this in the following chapters of this guide.
There are three ufo config files in the config folder but for this guide, only one will actually be used in our game. This is:
ufo.ini
All our game configuration will be done there.
Over in the Orx library repo folder under orx/code/bin, there are two other config files:
CreationTemplate.ini
SettingsTemplate.ini
These are example configs and they list all the properties and values that are available to you. We will mainly concentrate on referring to the CreationTemplate.ini, which is for objects, sounds, etc. It's a good idea to include these two files in your project for easy reference.
Alternatively you can view these online at https://github.com/orx/orx/blob/master/code/bin/CreationTemplate.ini and here: https://github.com/orx/orx/blob/master/code/bin/SettingsTemplate.ini

The code template
Now to take a look at the basic ufo.cpp and see what is contained there.
The first function is the Init() function.
This function will execute when the game starts up. Here you can create objects that have been defined in the config, or perform other setup tasks like registering handlers. We'll do both of these soon.
The Run() function is executed every main clock cycle. This is a good place to continually perform a task. Though there are better alternatives for this, and we will cover those later. This is mainly used to check for the quit key.
The Exit() function is where memory is cleaned up when your game quits. Orx cleans up nicely after itself. We won't use this function as part of this guide.
The Bootstrap() function is an optional function to use. This is used to tell Orx where to find the first config file for use in our game (ufo.ini). There is another way to do this, but for now, we'll use this function to inform Orx of the config.
Then of course, the main() function. We do not need to use this function in this guide.
Now that we have everything we need to get started, you should be able to compile successfully. Run the program and a slowly rotating Orx logo will appear.

Great. So now you have everything you need to start building the UFO game.

Setting up the game assets
Our game will have a background, a UFO which the player will control, and some pickups that the player can collect.
The UFO will be controlled by the player using the cursor keys.
First you'll need the assets to make the game. You can download the file  assets-for-orx-ufo-game.zip which contains:
The background file (background.png):

The UFO and Pickup sprite images (ufo.png and pickup.png):

And a pickup sound effect (pickup.ogg):
pickup.ogg
Copy the .png files into your data/texture folder
Copy the .ogg file into your data/sound folder.
Now these files can be accessed by your project and included in the game.

Setting up the Playfield
We will start by setting up the background object. This is done using config.
Open the ufo.ini config file in your editor and add the following:

[BackgroundGraphic]
Texture = background.png
Pivot = center
The BackgroundGraphic defined here is called a Graphic Section. It has two properties defined. The first is Texture which has been set as background.png.
The Orx library knows where to find this image, due to the properties set in the Resource section:

[Resource]
Texture = ../../data/texture
So any texture files that are required (just like in our BackgroundGraphic section) will be located in the ../../data/texture folder.
The second parameter is Pivot. A pivot is the handle (or sometimes “hotspot” in other frameworks). This is set to be center. The position is 0,0 by default, just like the camera. The effect is to ensure the background sits in the center of our game window.
There are other values available for Pivot. To see the list of values, open the CreationTemplate.ini file in your editor. Scroll to the GraphicTemplate section and find Pivot in the list. There you can see all the possible values that could be used.
top left is also a typical value.
We need to define an object that will make use of this graphic. This will be the actual entity that is used in the game:

[BackgroundObject]
Graphic = BackgroundGraphic
Position = (0, 0, 0)
The Graphic property is the section BackgroundGraphic that we defined earlier. Our object will use that graphic.
The second property is the Position. In our world, this object will be created at (0, 0, 0). In Orx, the coordinates are (x, y, z). It may seem strange that Orx, being a 2D game engine has a Z axis. Actually Orx is 2.5D. It respects the Z axis for objects, and can use this for layering above or below other objects in the game.
To make the object appear in our game, we will add a line of code in our source file to create it.
In the Init() function of ufo.cpp, remove the default line:
orxObject_CreateFromConfig("Object");
and replace it with:
orxObject_CreateFromConfig("BackgroundObject");
Compile and run.
The old spinning logo is now replaced with a nice tiled background object.

Next, the ufo object is required. This is what the player will control. This will be covered in Part 2.
• By yyam

Hey there! I released my game, Hedgehogs Can Fly, on GameJolt today! It's a cute, 2D physics-platformer where you try to fling a hedgehog through tricky levels to get to the finish line. I wrote it from scratch in C++ with SFML. There are multiple types of terrain each with different properties and effects making for some interesting level design. The physics/level code also allows for free-form terrain that isn't constrained to a tile grid. The levels are loaded from color-coded image files, I have an entry in the devlog on the GameJolt page explaining how it all works!
If these screenshots look cool, visit the GameJolt page here (With Trailer Video!)
Screenshots incoming...

Have a nice day

• Hi Guys, really pleased to announce that my 5 part series on creating a 2D UFO game using the Orx Portable Game Engine has been published in the articles section.

It then takes you step by step through numerous topics:
Creating a playfield
The ufo movement
Keyboard controls
Collisions
Physics
Scores
Sounds and;
Shadows
The series starts over here:
How to write a 2D UFO game using the Orx Portable Game Engine - Part 1
If you find any problems, or enjoyed going through it, I'd love to hear about it.
Graphics for the article series were kindly designed by my friend FullyBugged.

DX11 Trying to find bottlenecks in my renderer


I just finished up my 1st iteration of my sprite renderer and I'm sort of questioning its performance.

Currently, I am trying to render 10K 64x64 textured sprites in an 800x600 window. These sprites all use the same texture, vertex shader, and pixel shader. There are basically no state changes. The sprite renderer itself is dynamic, using D3D11_MAP_WRITE_NO_OVERWRITE and then D3D11_MAP_WRITE_DISCARD when the vertex buffer is full. The buffer is large enough to hold all 10K sprites and draw them in a single draw call. Cutting the buffer size down so that it only fits 1000 sprites before a draw call is executed does not seem to improve performance. When I clock the time it takes to complete the render method for my sprite renderer (the only renderer that is running) I'm getting about 40ms. Aside from adjusting the size of the vertex buffer, I have tried using a 1x1 texture and making the window smaller (640x480) as a quick and dirty check to see if the GPU was the bottleneck, but I still get 40ms in both of those cases.

I'm kind of at a loss. What are some of the ways that I could figure out where my bottleneck is?
I feel like only being able to render 10K sprites is really low, but I'm not sure. I don't know whether I coded a poor renderer with a bottleneck somewhere or whether I'm being limited by my hardware.

Just some other info:

Dev PC specs:

GPU: Intel HD Graphics 4600 / Nvidia GTX 850M (Nvidia is set to be the preferred GPU in the Nvidia control panel. Vsync is set to off)
CPU: Intel Core i7-4710HQ @ 2.5GHz

Renderer:

//The renderer has a working depth buffer

//Sprites have matrices that are precomputed. These pretransformed vertices are placed into the buffer
Matrix4 model = sprite->getModelMatrix();
verts[0].position = model * verts[0].position;
verts[1].position = model * verts[1].position;
verts[2].position = model * verts[2].position;
verts[3].position = model * verts[3].position;
verts[4].position = model * verts[4].position;
verts[5].position = model * verts[5].position;

//Vertex buffer is flagged for dynamic use
vertexBuffer = BufferModule::createVertexBuffer(D3D11_USAGE_DYNAMIC, D3D11_CPU_ACCESS_WRITE, sizeof(SpriteVertex) * MAX_VERTEX_COUNT_FOR_BUFFER);

//The vertex buffer is mapped to when adding a sprite to the buffer
//vertexBufferMapType could be D3D11_MAP_WRITE_NO_OVERWRITE or D3D11_MAP_WRITE_DISCARD depending on the data already in the vertex buffer
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
memcpy(((SpriteVertex*)resource.pData) + vertexCountInBuffer, verts, BYTES_PER_SPRITE);
vertexBuffer->unmap();

//The constant buffer used for the MVP matrix is updated once per draw call
memcpy(resource.pData, projectionMatrix.getData(), sizeof(Matrix4));
mvpConstBuffer->unmap();

cbuffer mvpBuffer : register(b0)
{
matrix mvp;
}

struct VertexInput
{
float4 position : POSITION;
float2 texCoords : TEXCOORD0;
float4 color : COLOR;
};

struct PixelInput
{
float4 position : SV_POSITION;
float2 texCoords : TEXCOORD0;
float4 color : COLOR;
};

PixelInput VSMain(VertexInput input)
{
input.position.w = 1.0f;

PixelInput output;
output.position = mul(mvp, input.position);
output.texCoords = input.texCoords;
output.color = input.color;

return output;
}

SamplerState samplerType;
float4 PSMain(PixelInput input) : SV_TARGET
{

return textureColor;
}

If any more info is needed feel free to ask. I would really like to know how I can improve this, assuming I'm not hardware limited.

2 hours ago, noodleBowl said:

Nvidia is set to be the preferred GPU in the Nvidia control panel.

Add this to some non-empty .cpp file (if I put them in an empty .cpp file, it seems to be ignored) to automatically choose the dedicated GPU instead of the integrated one:

extern "C" {
__declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}
extern "C" {
__declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1;
}

This will avoid changing the Nvidia control panel for all your different builds.

Edited by matt77hias


What happens if you don't update sprite vertices per frame? (I assume uploading that much data is the bottleneck. You may consider uploading the transforms instead, which would be 4 values per sprite instead of 6 * 4.)

Edit: Additionally you probably should use double buffering or a ring buffer to allow some frames of latency for the GPU, if you don't already.
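As a rough illustration of the ring-buffer idea with a single dynamic vertex buffer: keep appending with the no-overwrite map mode while the data fits, and only discard (which asks the driver for a fresh region) when the write would run past the end. The sketch below models just that bookkeeping without any D3D headers. DynamicBufferCursor, Reservation and MapMode are hypothetical names; MapMode stands in for D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3D11_MAP_WRITE_NO_OVERWRITE and D3D11_MAP_WRITE_DISCARD.
enum class MapMode { NoOverwrite, Discard };

// Where to write the next batch and which map mode to use for it.
struct Reservation
{
    std::size_t firstVertex;
    MapMode mode;
};

// Tracks the write position inside one dynamic vertex buffer.
class DynamicBufferCursor
{
public:
    // head_ starts at capacity so that the very first reservation wraps
    // around and therefore uses Discard.
    explicit DynamicBufferCursor(std::size_t capacity)
        : capacity_(capacity), head_(capacity) {}

    Reservation reserve(std::size_t vertexCount)
    {
        assert(vertexCount > 0 && vertexCount <= capacity_);
        if (head_ + vertexCount > capacity_)
        {
            // Not enough room left: start again at the beginning of a fresh region.
            head_ = vertexCount;
            return { 0, MapMode::Discard };
        }
        // Append behind data the GPU may still be reading.
        Reservation r{ head_, MapMode::NoOverwrite };
        head_ += vertexCount;
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t head_;
};
```

In a renderer, firstVertex would be both the copy offset into the mapped pointer and the start vertex passed to the draw call. Sizing the buffer for a few frames of sprites means discards happen rarely.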

I tried something similar with Vulkan and Fury GPU:

Render 2 million textured boxes, vertex.w = integer index to pick the proper 4*4 matrix from a regular buffer (not uniform as usual) -> 80 fps.

I do not remember if this number was with or without per-frame upload, probably without, but the upload was definitely the bottleneck, especially because I did not use double buffering IIRC.

Edited by JoeJ

6 hours ago, noodleBowl said:

input.position.w = 1.0f;

Just use a float3 Position instead of float4 Position, you will get the w coordinate of 1.0f for free. Furthermore, it does not make sense to use a float4 and to immediately overwrite the w coordinate. Just use an explicit float3 to inform the compiler.

6 hours ago, noodleBowl said:

Matrix4 model = sprite->getModelMatrix();
verts[0].position = model * verts[0].position;
verts[1].position = model * verts[1].position;
verts[2].position = model * verts[2].position;
verts[3].position = model * verts[3].position;
verts[4].position = model * verts[4].position;
verts[5].position = model * verts[5].position;

A sprite is basically a quad consisting of two triangles. You can reuse the position of the shared vertices. This will reduce the number of matrix multiplications by 1/3.
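A minimal sketch of what sharing vertices could look like: each sprite stores 4 transformed corners instead of 6, and a static index buffer, built once, stitches them into two triangles. The corner order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) and the winding are assumptions; buildQuadIndices is a hypothetical helper and would need to match the renderer's actual vertex layout and cull mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Builds indices for quadCount quads, each made of 4 consecutive vertices.
// Assumed corner order per quad: 0 = top-left, 1 = top-right,
// 2 = bottom-left, 3 = bottom-right.
std::vector<std::uint32_t> buildQuadIndices(std::size_t quadCount)
{
    // Every vertex index must fit in 32 bits.
    assert(quadCount <= std::numeric_limits<std::uint32_t>::max() / 4);

    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t quad = 0; quad < quadCount; ++quad)
    {
        const std::uint32_t base = static_cast<std::uint32_t>(quad * 4);
        // First triangle.
        indices.push_back(base + 0);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        // Second triangle reuses vertices 1 and 2.
        indices.push_back(base + 2);
        indices.push_back(base + 1);
        indices.push_back(base + 3);
    }
    return indices;
}
```

Because the pattern never changes, the index data can live in an immutable buffer. Only the 4 vertices per sprite are uploaded each frame, which also cuts the upload size by a third.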

6 hours ago, noodleBowl said:

When I clock the time it takes to complete the render method for my sprite renderer (the only renderer that is running) I'm getting about 40ms. Aside from trying to adjust the size of the vertex buffer, I have tried using 1x1 texture and making the window smaller (640x480) as quick and dirty check to see if the GPU was the bottleneck, but I still get 40ms with both of those cases.

If you skip the draw, do you still have +- 40ms? If this is the case, skip the map/unmaps as well. If you still have 40ms, your CPU is definitely the culprit (and not the code that you are showing).

Edited by matt77hias

7 hours ago, Michael Aganier said:

It's almost 2018. Update your Windows 10 and your Task Manager will be able to show GPU usage % and GPU memory usage.

13 hours ago, noodleBowl said:

When I clock the time it takes to complete the render method for my sprite renderer (the only renderer that is running) I'm getting about 40ms.

Is that just the map, memcpy, unmap shown above? Or does it involve drawing / Present too?

Add more detail to the timing - see if you can find which specific function is using most of that time. Also measure how long Present is taking.


Thanks for all the responses! Tried to cover everything, let me know if I missed something

20 hours ago, Michael Aganier said:

12 hours ago, Zaoshi Kaba said:

It's almost 2018. Update your Windows 10 and your Task Manager will be able to show GPU usage % and GPU memory usage.

Not sure how helpful this is, but looking at my task manager it says:

CPU: ~21% (Amount used by my application. Not total CPU usage)
GPU 0 [Intel HD Graphics]: ~11%
GPU 1 [NVidia GeForce GTX 850M]: ~18%

This is rendering 10K sprites with a 64x64 texture in a 800x600 window

14 hours ago, JoeJ said:

What happens if you don't update sprite vertices per frame? (I assume uploading that much data is the bottleneck, you may consider uploading the transforms instead, which would be 4 values per sprite instead 6 * 4.)

So I don't think this is exactly what you mean, but speaking from a map/unmap standpoint, if I move things around and only map once per draw call my time goes down to 25ms. To do this I created an intermediate array that is the same size as my vertex buffer. I place my sprite data into this intermediate array, and when I need to draw I just do a memcpy straight into the vertex buffer.

//Created at Sprite Renderer init
vertices = new SpriteVertex[MAX_VERTEX_COUNT_FOR_BUFFER];

//Inside my function that flushes the buffer
resource = vertexBuffer->map(vertexBufferMapType);
memcpy(resource.pData, vertices, vertexCountInBuffer * sizeof(SpriteVertex));
vertexBuffer->unmap();

graphicsDevice->getDeviceContext()->Draw(vertexCountToDraw, vertexCountDrawnOffset);

14 hours ago, matt77hias said:

Just use a float3 Position instead of float4 Position, you will get the w coordinate of 1.0f for free. Furthermore, it does not make sense to use a float4 and to immediately overwrite the w coordinate. Just use an explicit float3 to inform the compiler.

Currently my SpriteVertex class is using a float3 for the position on the CPU side.

class SpriteVertex
{

public:
SpriteVertex();
SpriteVertex(Vector3 position, Vector2 texCoords, Color color);
~SpriteVertex();
Vector3 position;
Vector2 texCoords;
Color color;
};

On the shader side I have it as float4 because of the MVP matrix. Changing the position to float3 (shader side) makes the window just show red. I assume I'm super zoomed into the sprites or something. I removed the unneeded input.position.w = 1.0f though.

14 hours ago, matt77hias said:

A sprite is basically a quad consisting of two triangles. You can reuse the position of the shared vertices. This will reduce the number of matrix multiplications by 1/3.

Currently I have no index buffer set up, so I will have to go back and try this out. I do believe this would help a little bit at the very least, because you are right, I would do fewer matrix calculations this way.

14 hours ago, matt77hias said:

If you skip the draw, do you still have +- 40ms? If this is the case, skip the map/unmaps as well. If you still have 40ms, your CPU is definitely the culprit (and not the code that you are showing).

So if I comment out the Draw call I still have ~40ms. If I also take out the map/unmap calls I get around ~36ms. So there is a minor difference, but I'm starting to think my CPU is the issue.

8 hours ago, Hodgman said:

Is that just the map, memcpy, unmap shown above? Or does it involve drawing / Present too?

Add more detail to the timing - see if you can find which specific function is using most of that time. Also measure how long Present is taking.

The 40ms time is just the cost of doing the render, so this is just the Draw and unmap/map calls. When I time this function I'm doing it like so:

void SpriteRenderer::render(double deltaTime)
{
//Get the start time
QueryPerformanceCounter(&startTime);

renderStart(); //Setup/reset since other renderers may have run. Only this renderer is running
sortRenderList(); //This is only done once. On the first frame. Only sorting by texture too

Sprite* sprite = nullptr;
for (std::vector<Sprite*>::iterator i = renderList.begin(); i != renderList.end(); ++i)
{
sprite = (*i);
if (sprite->isVisible() == false)
continue;

//Put the sprite into the buffer. This is where the map/unmap calls are
}

//Draw the sprites that were placed in the buffer. Draw call is here
flushVertexBuffer();

//Get the end time and calculate how long it took
QueryPerformanceCounter(&endTime);

}

{
Texture* spriteTexture = sprite->getTexture();
if (spriteTexture != boundTexture)
{
flushVertexBuffer();
bindTexture(spriteTexture);
}

if (vertexCountInBuffer == MAX_VERTEX_COUNT_FOR_BUFFER)
{
flushVertexBuffer();
vertexCountInBuffer = 0;
vertexCountDrawnOffset = 0;
}

/* Code to setup the sprite. Vertex transform, flipping, applying texture clip rect, etc */

//Put the sprite in the buffer
D3D11_MAPPED_SUBRESOURCE resource = vertexBuffer->map(vertexBufferMapType);
memcpy(((SpriteVertex*)resource.pData) + vertexCountInBuffer, verts, BYTES_PER_SPRITE);
vertexBuffer->unmap();

vertexBufferMapType = D3D11_MAP_WRITE_NO_OVERWRITE;
}

void SpriteRenderer::renderStart()
{
graphicsDevice = GraphicsDeviceModule::getGraphicsDevice();
graphicsDevice->getDeviceContext()->VSSetConstantBuffers(0, 1, mvpConstBuffer->getBuffer());
graphicsDevice->getDeviceContext()->IASetInputLayout(inputLayout->getInputLayout());
graphicsDevice->getDeviceContext()->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
graphicsDevice->getDeviceContext()->IASetVertexBuffers(0, 1, vertexBuffer->getBuffer(), &STRIDE_PER_VERTEX, &VERTEX_BUFFER_OFFSET);

boundTexture = nullptr;
}

Now the one thing I'm not sure about is whether, when I time like the above (using QueryPerformanceCounter), I am really timing my method calls or just timing how long they take to return. This probably makes more sense with something like timing Present:

QueryPerformanceCounter(&startTime);
GraphicsDeviceModule::getGraphicsDevice()->present();
QueryPerformanceCounter(&endTime);
Logger::info("PRESENT TIME: " + std::to_string(((endTime.QuadPart - startTime.QuadPart) * 1000) / frq.QuadPart));

Did I just time how long it really takes to present everything to the screen, or did I just time how long it took to post the command to the GPU? I think I'm timing how long it takes to return, since my time comes back as 0ms.
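Two separate effects may be in play here. A CPU timer around Present measures the CPU-side cost of the call, and the GPU typically finishes the queued work later. In addition, the logging expression above uses integer division, so anything under one millisecond is reported as 0. A small sketch of converting tick counts to fractional milliseconds follows; ticksToMilliseconds is a hypothetical helper, and the tick values would come from startTime.QuadPart, endTime.QuadPart and frq.QuadPart.

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick interval to milliseconds without integer truncation.
// ticksPerSecond is the counter frequency (e.g. from QueryPerformanceFrequency).
double ticksToMilliseconds(std::int64_t startTicks,
                           std::int64_t endTicks,
                           std::int64_t ticksPerSecond)
{
    assert(ticksPerSecond > 0);
    return static_cast<double>(endTicks - startTicks) * 1000.0 /
           static_cast<double>(ticksPerSecond);
}
```

With a 10 MHz counter, an interval of 5000 ticks is 0.5 ms, whereas the integer expression (5000 * 1000) / 10000000 evaluates to 0. Measuring how long the GPU itself takes would need a GPU-side mechanism such as timestamp queries or a graphics profiler rather than a CPU timer.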

Edited by noodleBowl

1 hour ago, noodleBowl said:

Not sure how helpful this is, but looking at my task manager it says:


CPU: ~21% (Amount used by my application. Not total CPU usage)
GPU 0 [Intel HD Graphics]: ~11%
GPU 1 [NVidia GeForce GTX 850M]: ~18%

You have to look at individual cores: if your total CPU usage is at 21%, one of the cores might be running at 100% while the others sit mostly idle.

The GPU usage is not important because we are not measuring the performance of the GPU. We are measuring the performance of your renderer to prepare instructions on the CPU.

Knowing the usage of the rendering thread is important because if it is at 100%, it means that the GPU is waiting for more instructions because you're not sending them fast enough or in such a way that the GPU can parallelize them. If this is the case, you have a 100% confirmation that the problem is your renderer.

23 hours ago, noodleBowl said:

What are some of the ways that I could figure out where my bottleneck is?


BTW if you use Visual Studio, you can use the built-in profiler. This will give you a rough idea of the methods taking most of the time. Furthermore, it does not use your timer, so you can rule out the issues you suspect with your own timing code.