L. Spiro

Member Since 29 Oct 2003
Online Last Active Today, 04:10 AM

#5147596 Optimising my renderer

Posted by L. Spiro on Today, 04:06 AM

Your last point is an interesting one though. I have read a lot today about drawing sprites in one draw call, but I haven't seen how this is actually achieved. So, I have absolutely no idea how this can be done.

I would be extremely appreciative if you could shed light on this for me

It was a bit implicit in previous posts.


#1: Create vertex buffer. Not static/read-only. Dynamic.
EACH FRAME
- #2: Lock it.
- #3: Fill it with the sprite vertices. Drawing 32 sprites means you put 32×4 vertices into the buffer. Obviously you will have to transform them on the CPU by the sprite’s position, rotation (only if applicable), and scale (only if applicable).
- #4: Unlock it.
- #5: Draw it using the pre-generated 16-bit index buffer.


As I mentioned, these vertex buffers should be double- or even triple- buffered and swapped each frame.
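
Step #3 above can be sketched roughly as follows. This is only an illustration under assumptions: SpriteVertex, Sprite, and FillSpriteVertices are hypothetical names, and the `out` pointer stands in for whatever pointer your API returns when the dynamic buffer is locked.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from any particular API.
struct SpriteVertex { float x, y, u, v; };

struct Sprite {
    float posX, posY;     // Center position in pixels.
    float width, height;  // Unscaled size in pixels.
    float scale;          // Uniform scale.
    float rotation;       // Radians.
};

// Writes 4 transformed vertices per sprite into `out`, which stands in for
// the pointer obtained by locking the dynamic vertex buffer (step #2).
// Returns the number of vertices written.
std::size_t FillSpriteVertices(const std::vector<Sprite>& sprites, SpriteVertex* out) {
    static const float corners[4][2] = {
        { -0.5f, -0.5f }, { 0.5f, -0.5f }, { 0.5f, 0.5f }, { -0.5f, 0.5f }
    };
    std::size_t count = 0;
    for (const Sprite& s : sprites) {
        const float c = std::cos(s.rotation);
        const float sn = std::sin(s.rotation);
        for (int i = 0; i < 4; ++i) {
            // Scale, then rotate, then translate each corner on the CPU.
            const float lx = corners[i][0] * s.width * s.scale;
            const float ly = corners[i][1] * s.height * s.scale;
            out[count].x = s.posX + lx * c - ly * sn;
            out[count].y = s.posY + lx * sn + ly * c;
            out[count].u = corners[i][0] + 0.5f;
            out[count].v = corners[i][1] + 0.5f;
            ++count;
        }
    }
    return count;
}
```

With 32 sprites this writes 128 vertices, which are then drawn in a single call using the shared index buffer.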


L. Spiro


#5147567 Optimising my renderer

Posted by L. Spiro on Today, 12:50 AM

#1: I can’t tell if you are using shaders. If not, use shaders.
#2: The vertex shader does not need a whole 4×4 matrix to do what it needs to do; it only needs a single vector with the normalized screen dimensions. This reduces bandwidth when updating uniforms.
#3: Don’t use sprites.
#4: Do use 2 vertex buffers and 1 pre-generated max-filled index buffer (see Ashaman73’s code above).
- #A: The index buffer should be 16 bits.
- #B: The vertex buffers should be double-buffered. Never overwrite part of a vertex buffer immediately after drawing it. If you have to write more than MAX_SPRITES_PER_BUFFER (borrowing from Ashaman73’s code) sprites in a single frame, then you need to write to more than one buffer that frame (let’s say 3), then write to a new set of 3 buffers the next frame. But generally MAX_SPRITES_PER_BUFFER should be set high enough that it never overflows in a single frame and you bounce back and forth between only 2 buffers each frame.
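
A rough sketch of point #4, under assumptions: the value chosen here for MAX_SPRITES_PER_BUFFER, the triangle winding, and the BufferRing helper are all hypothetical and are not taken from Ashaman73’s code. It only shows how a 16-bit index buffer can be filled once and how the write target can alternate per frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cap: 16-bit indices can address 65,536 vertices, i.e. 16,384 quads.
constexpr std::size_t MAX_SPRITES_PER_BUFFER = 16384;

// Builds the index buffer once at startup: 2 triangles (6 indices) per quad.
std::vector<std::uint16_t> BuildQuadIndices(std::size_t maxSprites) {
    static const std::size_t offsets[6] = { 0, 1, 2, 0, 2, 3 };
    std::vector<std::uint16_t> indices;
    indices.reserve(maxSprites * 6);
    for (std::size_t i = 0; i < maxSprites; ++i) {
        const std::size_t base = i * 4;
        for (std::size_t o : offsets) {
            indices.push_back(static_cast<std::uint16_t>(base + o));
        }
    }
    return indices;
}

// Picks which vertex buffer to write this frame, so the buffer written
// last frame (which the GPU may still be reading) is left alone.
struct BufferRing {
    std::size_t bufferCount = 2;  // 2 = double-buffered, 3 = triple-buffered.
    std::size_t frame = 0;
    std::size_t Current() const { return frame % bufferCount; }
    void NextFrame() { ++frame; }
};
```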



Your biggest bottleneck is how you are managing your vertex buffer.
There is extensive reading material on best practices when updating a vertex buffer.
http://msdn.microsoft.com/en-us/library/windows/desktop/bb147263(v=vs.85).aspx#Using_Dynamic_Vertex_and_Index_Buffers


L. Spiro


#5146139 Static as substitute of private members.

Posted by L. Spiro on 10 April 2014 - 07:26 PM

I explain my use of statics here: http://lspiroengine.com/?p=570
And because I dislike statics and tend to avoid them myself, anyone who disagrees with my use there is perfectly in the right.  A graphics module doesn’t need to be global, but I also strongly dislike having to pass an instance down to every graphics resource such as a vertex buffer.  I also disagree with a graphics object that spits out vertex buffers, textures, etc., which would solve that problem, but breaks the single-responsibility rule in my opinion; I prefer a vertex buffer/texture/index buffer/shader/etc. that can simply be made just like any other object and it knows by itself how to allocate its resources and activate them.
 
All that being said, I finally decided to use static for that small section of the engine because statics are not always evil, you just have to use them responsibly.
http://www.gamedev.net/topic/647440-why-are-static-variables-bad/
The downsides of statics are explained there and above.
As I use them sparingly and after much consideration, I do not have a problem with any of these downsides.


L. Spiro


#5146134 Run the same OpenGL program in two context

Posted by L. Spiro on 10 April 2014 - 07:10 PM

This prevents any sort of multi-threading since you can't:
 
a. Do GL calls against a single context in multiple threads.


 
So if you want to draw in two different windows, AFAIK, you'll have to use different contexts and replicate the work from one to the other. Or you can handle everything in a single application that draws to two different places in the same window and context.

This is not correct.
You were correct when you stated that a single context can be used on a single thread at a time.
Which means you can make calls on a single context from multiple threads. You have to manage synchronization manually since you can’t access a context simultaneously, but you can switch the context to different threads and use it on each thread one-at-a-time.

So if you want to render into multiple windows (which does not necessarily imply they are running on different threads, but let’s say they are), make a single context, set it active on thread A, render into window A, make it active on thread B, and render into window B.


A context can be active on any thread, but only one thread at a time.
It is painful to manually juggle the context with critical sections etc. but it works perfectly fine.
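
The juggling can be sketched like this. SharedContext is a hypothetical stand-in that only models ownership; in a real program its MakeCurrent/Release steps would be the platform calls (wglMakeCurrent, glXMakeCurrent, or similar), and RenderTwoWindows is an assumed helper for illustration.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-in for a real GL context; models ownership only.
class SharedContext {
public:
    // Runs `draw` with the context bound to the calling thread.
    template <typename Fn>
    void RenderOn(Fn&& draw) {
        std::lock_guard<std::mutex> lock(m_mutex);  // One thread at a time.
        MakeCurrent();
        draw();
        Release();  // Unbind so another thread can take the context.
    }
    int BindCount() const { return m_binds; }

private:
    void MakeCurrent() { m_owner = std::this_thread::get_id(); ++m_binds; }
    void Release() { m_owner = std::thread::id(); }

    std::mutex m_mutex;
    std::thread::id m_owner;
    int m_binds = 0;
};

// Two threads, one per window, taking turns with the same context.
int RenderTwoWindows(SharedContext& ctx, int framesPerWindow) {
    int framesDrawn = 0;  // Only touched while the context lock is held.
    auto windowLoop = [&]() {
        for (int i = 0; i < framesPerWindow; ++i) {
            ctx.RenderOn([&]() { ++framesDrawn; });
        }
    };
    std::thread a(windowLoop);
    std::thread b(windowLoop);
    a.join();
    b.join();
    return framesDrawn;
}
```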


L. Spiro


#5146130 Static as substitute of private members.

Posted by L. Spiro on 10 April 2014 - 06:58 PM

Static members are global, global is bad.

But the main problem is that static members are not instance members.  You decide to use static when there will only be one member for all instances of that class globally, not based on visible scope.
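
A small C++ illustration of the difference, using a hypothetical Texture class: private controls who can see a member, while static controls how many copies of it exist.

```cpp
class Texture {
public:
    explicit Texture(int width) : m_width(width) { ++s_liveCount; }
    ~Texture() { --s_liveCount; }
    Texture(const Texture&) = delete;
    Texture& operator=(const Texture&) = delete;

    int Width() const { return m_width; }
    static int LiveCount() { return s_liveCount; }

private:
    int m_width;             // Private and per-instance: every Texture has its own.
    static int s_liveCount;  // Private and static: one value shared by all Textures.
};

int Texture::s_liveCount = 0;
```

Both members are equally hidden from outside code; only s_liveCount is shared between instances.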

 

 

Static is not a replacement for private in C.  Declaring and defining them only in the .C file is.

 

 

L. Spiro




#5145850 [Opengl] Sorting vertices( further to nearest)

Posted by L. Spiro on 09 April 2014 - 11:13 PM

The need to sort triangles is extremely rare, and the fact that Counter-Strike doesn’t do it at all indicates you are just approaching the problem wrong.

 

First-off, you need to draw opaque triangles and translucent triangles in separate passes.  Draw opaque first, disable depth writing, enable blending, and draw the translucent triangles.

You don’t sort the triangles, you sort the objects back-to-front for the translucent pass.
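
A minimal sketch of that ordering, assuming a hypothetical RenderObject record and sorting by squared distance from the camera (sorting by view-space depth would be an equally reasonable choice):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-object record; a real engine would hold much more.
struct RenderObject {
    int id;
    float x, y, z;
    bool translucent;
};

// Returns object IDs in draw order: opaque objects first (original order
// kept), then translucent objects sorted back-to-front from the camera.
std::vector<int> BuildDrawOrder(std::vector<RenderObject> objects,
                                float camX, float camY, float camZ) {
    auto distSq = [=](const RenderObject& o) {
        const float dx = o.x - camX, dy = o.y - camY, dz = o.z - camZ;
        return dx * dx + dy * dy + dz * dz;
    };
    auto firstTranslucent = std::stable_partition(
        objects.begin(), objects.end(),
        [](const RenderObject& o) { return !o.translucent; });
    std::sort(firstTranslucent, objects.end(),
              [&](const RenderObject& a, const RenderObject& b) {
                  return distSq(a) > distSq(b);  // Farthest first.
              });
    std::vector<int> order;
    for (const RenderObject& o : objects) {
        order.push_back(o.id);
    }
    return order;
}
```

Depth writing would be disabled and blending enabled at the point where the translucent part of this list begins.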

 

If you feel the need to sort triangles, you are probably doing it wrong, and you need to provide a screenshot as proof after implementing a proper render loop.

 

 

L. Spiro




#5145775 Max buffer size

Posted by L. Spiro on 09 April 2014 - 03:25 PM

Check the MaxPrimitiveCount property of the device caps.  It will typically be 1,048,576.

Given your index count, you are likely drawing over this limit.  After checking, you can easily see if this is the problem by manually capping the number of primitives you draw in a single call to less than or equal to that value and see if you get anything.
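
One way to do the capping is to split the draw into ranges. This sketch assumes an indexed triangle list; DrawRange and SplitTriangleList are hypothetical names, and the limit passed in should be whatever the device caps actually report.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One draw call's worth of work in an indexed triangle list.
struct DrawRange {
    std::uint32_t startIndex;      // First index to read for this call.
    std::uint32_t primitiveCount;  // Triangles drawn by this call.
};

// Splits `totalTriangles` into calls that each stay within the device limit.
std::vector<DrawRange> SplitTriangleList(std::uint32_t totalTriangles,
                                         std::uint32_t maxPrimitiveCount) {
    std::vector<DrawRange> ranges;
    if (maxPrimitiveCount == 0) {
        return ranges;  // Nothing can be drawn with a zero limit.
    }
    std::uint32_t done = 0;
    while (done < totalTriangles) {
        const std::uint32_t count = std::min(maxPrimitiveCount, totalTriangles - done);
        ranges.push_back({ done * 3, count });  // 3 indices per triangle.
        done += count;
    }
    return ranges;
}
```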

 

 

L. Spiro




#5145697 I need help with matrices

Posted by L. Spiro on 09 April 2014 - 09:46 AM

Because your input had no meaning in the context of the conversation.
You said you use floats.
Vector3D uses floats.
Quaternion uses floats.
Matrix uses floats.

In other words, everyone uses floats.
Your reply is nonsensical at best, detrimental at worst to anyone who understands it to mean he or she should replace a Vector3D with 3 floats. That is absolutely 100% not the way it “seems to go”.

That is absolutely terrible advice.
What sane person would do that?
Vectors provide encapsulation, simplify the code, and make its intent clearer.

With 3 separate floats you have basically a pool of ungrouped floats rather than a clear separation between rotation values, position values, and scale values.

With 3 separate floats you have to do this:
object->addPos( off_x, off_y, off_z );
instead of this:
object->pos() += off;
Vector3D operators make manipulating code easier and faster.

What if one day you want to add SIMD optimizations to vector operations? Good luck if you just use raw floats everywhere.
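
For reference, a bare-bones sketch of the kind of type being argued for. The member layout and the Object wrapper are assumptions for illustration; the point is that the operators live in one place, which is also the one place a SIMD implementation would later go.

```cpp
// Minimal 3-component vector; a real one would have many more operators.
struct Vector3D {
    float x, y, z;

    Vector3D& operator+=(const Vector3D& rhs) {
        x += rhs.x;
        y += rhs.y;
        z += rhs.z;
        return *this;
    }
};

inline Vector3D operator+(Vector3D lhs, const Vector3D& rhs) {
    lhs += rhs;
    return lhs;
}

inline Vector3D operator*(const Vector3D& v, float s) {
    return Vector3D{ v.x * s, v.y * s, v.z * s };
}

// Hypothetical object exposing its position as a single grouped value.
class Object {
public:
    Vector3D& pos() { return m_pos; }

private:
    Vector3D m_pos{ 0.0f, 0.0f, 0.0f };
};
```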


That is why I down-voted you. And trust me, as rare as it is that I down-vote, you should take it as a serious hint.


L. Spiro


#5145692 Why a "Game" class?

Posted by L. Spiro on 09 April 2014 - 09:35 AM

Once the engine works like this, there is no more of these problems.

You do not have the skill level necessary to evaluate that.
For example, you have hidden dependencies (as you admit later), but you seem to think you don’t.
 

The point for me is that i have all data / pointers in 1 file

That is disgusting.
 

so i dont have to hunt everything down

If you can’t remember where things are in your engine you lack a very basic skill.
 

when cleaning the engine.

Except you have a bunch of globals over a small amount of files; by definition it isn’t “clean”, it is disgusting.
 

professors can discuss another decade about it, but it wont convince me.

Being stubborn and unwilling to learn/grow/adapt is not a characteristic worthy of bragging.
You’ve basically just said, “I learned the wrong way and I am stuck in it because it is my wrong way.”

I don’t personally care because you have basically implied that you will luckily never be working on real production code. It’s your career you are flushing, but for the rest of us it’s probably better that you do so. Imagine one of us having to maintain your code in the future. What a nightmare.



As for the game class, there is no reason to make it global, but the scope of it as viewed by myself is different from that of frob’s.
Although it is instance-based and there can be as many as you like in theory, actually a game class does not represent something you would destroy and recreate on a “New Game” screen. The game is running even if you are on the New Game screen, the title screen, credits, etc.
If you want multiple games running at a time it is possible, but the duration of a game class is from the “power on” to “power off” on a console.

Each screen, such as the menu screens, credits, gameplay area, bonus rounds, etc., are just states within the game.
You can limit yourself to one game state active at a time in a game and allow many games to run at the same time, or make one game instance and allow many states to run at the same time, but ultimately you can do all the same unit-testing as described by frob, letting AIs go at each other repeatedly for days on end in multiple instances simultaneously etc.


As shown in my articles the game class does not need to be global because it is passed to states and already easy to access at any time.
From it you can get general information about the game plus per-game custom information programmers add when they inherit from CGame. As you would expect, this includes high scores, game settings, etc.
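
A stripped-down sketch of that relationship. This is not the code from my articles; Game, State, TitleState, GameplayState, and the highScore field are simplified, hypothetical stand-ins that only show the game being passed to states instead of being global.

```cpp
#include <memory>
#include <string>
#include <utility>

class Game;

// A screen (title, credits, gameplay, etc.) is just a state within the game.
class State {
public:
    virtual ~State() = default;
    virtual void Tick(Game& game) = 0;  // The game is passed in; no global needed.
    virtual std::string Name() const = 0;
};

class Game {
public:
    // The switch is deferred so a state is never destroyed during its own Tick().
    void SetState(std::unique_ptr<State> next) { m_pending = std::move(next); }

    void Tick() {
        if (m_pending) {
            m_state = std::move(m_pending);
        }
        if (m_state) {
            m_state->Tick(*this);
        }
    }

    std::string CurrentStateName() const {
        return m_state ? m_state->Name() : std::string();
    }

    int highScore = 0;  // Example of per-game data that states can reach.

private:
    std::unique_ptr<State> m_state;
    std::unique_ptr<State> m_pending;
};

class GameplayState : public State {
public:
    void Tick(Game& game) override { game.highScore += 10; }
    std::string Name() const override { return "gameplay"; }
};

class TitleState : public State {
public:
    void Tick(Game& game) override {
        game.SetState(std::unique_ptr<State>(new GameplayState()));
    }
    std::string Name() const override { return "title"; }
};
```

Because nothing here is global, several Game instances can be created side by side for testing.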


L. Spiro


#5145525 Eliminating OpenGL/DirectX differences

Posted by L. Spiro on 08 April 2014 - 06:55 PM

However, this was a total guess (I'm still quite surprised it works), and there are still occasional problems, like right now I'm trying to solve a complete f***-up in my cascaded shadow maps in opengl.

Probably because OpenGL normalizes depth to the range -1 to 1 whereas Direct3D uses an NDC depth range of 0 to 1.
Not only does this mess up shadows, you are likely only using half the range of the depth buffer with the way you are creating your projection matrix.
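
To make the difference concrete, here is a sketch of the two depth mappings for a standard perspective projection, where d is the view-space distance in front of the camera and n and f are the near and far planes. The function names are hypothetical; the formulas are the usual ones for each convention.

```cpp
#include <cmath>

// NDC depth from a standard OpenGL-style perspective projection: -1 at near, +1 at far.
inline float GlNdcDepth(float d, float n, float f) {
    return (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * d);
}

// NDC depth from a standard Direct3D-style perspective projection: 0 at near, 1 at far.
inline float D3dNdcDepth(float d, float n, float f) {
    return f / (f - n) - (f * n) / ((f - n) * d);
}

// Remaps between the two conventions.
inline float GlDepthToD3d(float zGl) { return zGl * 0.5f + 0.5f; }
inline float D3dDepthToGl(float zD3d) { return zD3d * 2.0f - 1.0f; }
```

Feeding a Direct3D-style projection matrix to OpenGL unchanged yields depths only in the 0 to 1 portion of OpenGL's -1 to 1 range, which is the half-range problem described above.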


L. Spiro


#5144501 raw input mouse problem

Posted by L. Spiro on 04 April 2014 - 09:35 PM

Ok I will share what I have found so far. If I move my mouse 1 pixel in any direction while my left button is clicked it stays that way. So if I move in x+ direction x becomes 1 (-1 for x- and so on) and stays that way until I move my mouse again or I release mouse button.

The very first reply already correctly answered your question.
If you aren’t moving the mouse or clicking any buttons, you aren’t generating any events and GetRawInputData() is not being called.

The solution is fairly obvious: Don’t move your mouse in the game (or process mouse events in the game, whatever) unless an actual call to GetRawInputData() was made.
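
In code, the idea looks something like this. MouseState is a hypothetical helper: OnRawMouse() would be called only from the handler that actually received data from GetRawInputData(), and Consume() once per game frame.

```cpp
// Accumulates raw mouse deltas between frames (hypothetical helper).
struct MouseState {
    long pendingDx = 0;
    long pendingDy = 0;

    // Call only when an input message actually delivered mouse data.
    void OnRawMouse(long dx, long dy) {
        pendingDx += dx;
        pendingDy += dy;
    }

    // Call once per frame. Hands out the movement and resets it, so a stale
    // delta from an old event is never applied again on later frames.
    void Consume(long& dx, long& dy) {
        dx = pendingDx;
        dy = pendingDy;
        pendingDx = 0;
        pendingDy = 0;
    }
};
```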


L. Spiro


#5144261 Oh no, not this topic again: LH vs. RH.

Posted by L. Spiro on 03 April 2014 - 11:36 PM

In the book chapters I gave you (for OpenGL ES) you can see that I originally described row-major and column-major as being how the data is stored in RAM (which aligns with everything you have said here).

While writing the next chapter I happened upon this.

 

The description for GLKMatrix4MakeTranslation() leaves no room for error, but I tested it by creating a matrix with it and then stopping it in the debugger to see the actual RAM layout.  It’s “row major” according to the Wikipedia link, and matches the exact memory layout of both my engine and Direct3D.

Yet when I called GLKMatrix4Multiply(), I get the correct C only if I call it as GLKMatrix4Multiply( B, A ), whereas in Direct3D and in my engine I have to call MatMul( A, B ) to get the same C.

 

 

So there is definitely a discrepancy, and it is not just about how it is laid out in RAM.

 

 

Even though GLKit is free to deviate from the OpenGL specification, it proves that RAM layout is not what decides A × B vs. B × A.

 

 

Additionally, the OpenGL API may be internally transposing it before storing it.  Because OpenGL uses column-major notation, they may have made a public API that takes data that looks column-major in RAM but internally transposes it.

I have yet to confirm this, but I may over the weekend by using MHS and looking at the actual RAM inside the OpenGL DLL.

 

However in the case of GLKit on iOS, I viewed the RAM in a debugger already and verified that it writes to physical RAM in row-major (Direct3D style) order.

It’s verified that the RAM matches Direct3D’s to the byte yet uses post-multiplication vs. Direct3D’s pre-multiplication.

 

 

 

It is a fairly confusing topic, but I have confirmed for-sure through GLKit on iOS that the memory layout is not related to calling it “row major” or “column-major”.  The memory layout is the same, but GLKit is designed to access the matrix in a transposed way.
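
The observation can be reproduced without GLKit. In this sketch (Mat4, MulRowMajor, and MulColumnMajor are hypothetical names, not GLKit or Direct3D functions) both routines read the same 16 floats; one indexes them as row-major and the other as column-major, and the second needs its operands swapped to produce byte-identical output.

```cpp
#include <array>

using Mat4 = std::array<float, 16>;

// Treats memory as row-major: element (r, c) lives at m[r * 4 + c].
Mat4 MulRowMajor(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out[r * 4 + c] += a[r * 4 + k] * b[k * 4 + c];
    return out;
}

// Treats the same memory as column-major: element (r, c) lives at m[c * 4 + r].
Mat4 MulColumnMajor(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}
```

For any A and B, MulRowMajor( A, B ) and MulColumnMajor( B, A ) write the same values to the same addresses, because reading a buffer with the other convention is a transpose and the transpose of A × B is the transpose of B times the transpose of A.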

 

Frankly I am no longer sure what to put in my book because Wikipedia and my past self claim it is up to how it is laid out in RAM whereas GLKit contradicts that.

It’s an iOS book so…

 

 

L. Spiro




#5144246 Suggestions for my Input Manager.

Posted by L. Spiro on 03 April 2014 - 09:56 PM

You are just a bit too fast; within the next 2 weeks I plan to release an article on handling input from an implementation standpoint.  It’s much more complex than you think.

 

That article is good for an overall idea of what features need to be part of the overall input system, but I believe its step #2 is or at least could be construed as being akin to an event system.  Inputs are not events.  While it is true they are to be collected and remapped, they are not to be processed immediately as they are received (as an event system would have you do).

 

Input is to be accumulated on the input thread, timestamped, and fed to the game logic at a specific time during the overall logic processing.
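
As a rough preview of the idea (not the implementation from the upcoming article), the sketch below uses hypothetical names InputEvent and InputBuffer and assumes events are pushed in timestamp order. The input thread only records; the game thread asks for everything up to the time of the logic tick it is about to run.

```cpp
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical input record; real systems carry device, axis values, etc.
struct InputEvent {
    std::uint64_t timeMicros;  // When the input occurred.
    int key;
    bool down;
};

class InputBuffer {
public:
    // Input thread: record the input and return immediately. No handlers run here.
    void Push(const InputEvent& e) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push_back(e);
    }

    // Game thread: take every input stamped at or before `untilMicros`,
    // leaving later inputs queued for a later logic tick.
    std::vector<InputEvent> ConsumeUntil(std::uint64_t untilMicros) {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::vector<InputEvent> out;
        while (!m_events.empty() && m_events.front().timeMicros <= untilMicros) {
            out.push_back(m_events.front());
            m_events.pop_front();
        }
        return out;
    }

private:
    std::mutex m_mutex;
    std::deque<InputEvent> m_events;  // Assumed to be in timestamp order.
};
```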

 

 

So I am going to suggest you stop here on 2 points:

#1: If you are using singletons, you are doing it wrong.

#2: Inputs are not events.  You don’t need an answer to your posted question because you don’t need events and handlers.  Read my posts for (many) details on why this is and try to get inspiration for another approach to this problem.

 

I will answer questions related to the input system I described already and very soon I will post the promised article on an input system that addresses the biggest issues: Input lag and timely processing.

 

 

L. Spiro




#5144033 Render Target isn't rendering scene objects for post processing effects.

Posted by L. Spiro on 03 April 2014 - 01:20 AM

Do you have a depth buffer for that render target? Is depth-testing off?
What does PIX/Visual Studio 2013/your graphics-debugger-of-choice tell you?


L. Spiro


#5143996 What is BoneIndex, DXGI_FORMAT ?

Posted by L. Spiro on 02 April 2014 - 07:29 PM

It is really quite simple.

// == COMPONENTS

R = x[0]

G = x[1]

B = x[2]

A = x[3]

 

 

// == BITS PER COMPONENT

8 = bits in x

 

 

// == SIGN

U = Unsigned (byte)

 

 

// == RANGE

NORM = converted to float and normalized to 0.0-1.0

 

 

You have float[4].  float is 32 bits.  That’s R32G32B32A32.  Why would you expect R8G8B8A8 to work if you are using R32G32B32A32 (float[4]) for your client data?

It doesn’t make sense.  Why would yours work?

 

 

float[4] = R32G32B32A32

byte[4] = R8G8B8A8

 

So if you are going to send DXGI_FORMAT_R8G8B8A8_UNORM, you need to convert your float[4] into a byte[4].

byte[0] = float[0] * 255 + 0.5
byte[1] = float[1] * 255 + 0.5
byte[2] = float[2] * 255 + 0.5
byte[3] = float[3] * 255 + 0.5
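
The same conversion as compilable C++, as a sketch. FloatToUnorm8 and PackRgba8 are hypothetical helper names, and the clamp to 0.0-1.0 is an added safety assumption for out-of-range input.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Converts one normalized float to an 8-bit UNORM value with rounding.
inline std::uint8_t FloatToUnorm8(float v) {
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

// Converts float[4] (R32G32B32A32 layout) to byte[4] (R8G8B8A8 layout).
inline std::array<std::uint8_t, 4> PackRgba8(const float (&rgba)[4]) {
    return { FloatToUnorm8(rgba[0]), FloatToUnorm8(rgba[1]),
             FloatToUnorm8(rgba[2]), FloatToUnorm8(rgba[3]) };
}
```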

 

 

L. Spiro





