L. Spiro

Member Since 29 Oct 2003
Offline Last Active Today, 07:43 AM

#5308361 Faster Sin and Cos

Posted by on 28 August 2016 - 11:01 AM

If you search for these terms you will get a lengthy list of possibilities.
Look-up tables from the old id Tech days are too cache-miss heavy.
Other implementations are only faster if you are re-implementing sincos(), including the Unreal Engine 4 implementation.
Sony presents an idea using Chebyshev polynomials, but the accuracy is still bound by how far out you expand the Taylor series.
With the goal of making something both fast and accurate in mind, I have come up with the following functions.

float Sin( float _fX ) {
	int32 i32I = int32( _fX * 0.31830988618379067153776752674503f );	// 1 / PI.
	_fX = (_fX - float( i32I ) * 3.1415926535897932384626433832795f);

	float fX2 = _fX * _fX;

	return (i32I & 1) ?
		-_fX * (float( 1.0 ) +
			fX2 * (float( -1.66666671633720398e-01 ) +
			fX2 * (float( 8.33333376795053482e-03 ) +
			fX2 * (float( -1.98412497411482036e-04 ) +
			fX2 * (float( 2.75565571428160183e-06 ) +
			fX2 * (float( -2.50368437093584362e-08 ) +
			fX2 * (float( 1.58846852338356825e-10 ) +
			fX2 * float( -6.57978446033657960e-13 )))))))) :
		_fX * (float( 1.0 ) +
			fX2 * (float( -1.66666671633720398e-01 ) +
			fX2 * (float( 8.33333376795053482e-03 ) +
			fX2 * (float( -1.98412497411482036e-04 ) +
			fX2 * (float( 2.75565571428160183e-06 ) +
			fX2 * (float( -2.50368437093584362e-08 ) +
			fX2 * (float( 1.58846852338356825e-10 ) +
			fX2 * float( -6.57978446033657960e-13 ))))))));
}

float Cos( float _fX ) {
	int32 i32I = int32( _fX * 0.31830988618379067153776752674503f );	// 1 / PI.
	_fX = (_fX - float( i32I ) * 3.1415926535897932384626433832795f);

	float fX2 = _fX * _fX;

	return (i32I & 1) ?
		-float( 1.0 ) -
			fX2 * (float( -5.00000000000000000e-01 ) +
			fX2 * (float( 4.16666641831398010e-02 ) +
			fX2 * (float( -1.38888671062886715e-03 ) +
			fX2 * (float( 2.48006836045533419e-05 ) +
			fX2 * (float( -2.75369188784679864e-07 ) +
			fX2 * (float( 2.06202765973273472e-09 ) +
			fX2 * float( -9.77589970779790818e-12 ))))))) :
		float( 1.0 ) +
			fX2 * (float( -5.00000000000000000e-01 ) +
			fX2 * (float( 4.16666641831398010e-02 ) +
			fX2 * (float( -1.38888671062886715e-03 ) +
			fX2 * (float( 2.48006836045533419e-05 ) +
			fX2 * (float( -2.75369188784679864e-07 ) +
			fX2 * (float( 2.06202765973273472e-09 ) +
			fX2 * float( -9.77589970779790818e-12 )))))));
}

Performance on PC may vary, from 1.02 times as fast to 2.0 times as fast.
On PlayStation 4 this is around 7 or 8 times as fast.
On Xbox One this is around 2 or 3 times as fast.
Accuracy is no fewer than 6 digits. I implemented a less-accurate version on Final Fantasy XV, so these versions are entirely suitable for any AAA production.


The code starts off by reducing the input modulo PI, the job fmodf() would normally do.  This is implemented manually via a cast to an integer.  This gives it a valid range of ±52,707,130.87185.


cos() and sin() are curves that go up, then down, then up, etc.  “i32I & 1” checks for it being an up curve or down curve.  i32I is the number of whole multiples of PI in the input, each even count going one way, each odd count going the other way.
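
As a rough illustration of the two steps just described (strip whole multiples of PI, then flip the sign on odd counts), here is a minimal sketch that separates the reduction from the polynomial. `Reduced`, `ReduceByPi` and `SinViaReduction` are names made up for this example, and `std::sin` stands in for the polynomial so the identity can be checked in isolation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Reduced {
    std::int32_t k;  // whole multiples of PI removed (truncated toward zero)
    float r;         // remainder, roughly within (-PI, PI)
};

inline Reduced ReduceByPi(float x) {
    // The post quotes a valid range of about +/-52.7 million; stay inside it.
    assert(std::fabs(x) < 5.0e7f);
    const float kInvPi = 0.31830988618379067154f;
    const float kPi = 3.14159265358979323846f;
    const std::int32_t k = static_cast<std::int32_t>(x * kInvPi);
    return Reduced{ k, x - static_cast<float>(k) * kPi };
}

// sin(r + k*PI) == (-1)^k * sin(r), so an odd k flips the sign.
inline float SinViaReduction(float x) {
    const Reduced red = ReduceByPi(x);
    const float s = std::sin(red.r);  // stand-in for the polynomial
    return (red.k & 1) ? -s : s;
}
```

For an input of 10.0f the reduction removes 3 multiples of PI, so the odd branch is taken and the sign of the remainder's sine is flipped.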


Here is the fancy part.

You will notice that the magic constants start off being close to one over each odd factorial (in Sin()) and one over each even factorial (in Cos()).

0.16666666666666666666666666666667 = 1 / 6 (6 - 1*2*3)

0.00833333333333333333333333333333 = 1 / 120 (120 = 1*2*3*4*5)



But by the end, they drift rather significantly.

1/15! = 7.6471637318198164759011319857881e-13

I use 6.57978446033657960e-13.

The reason is that the series should normally continue on into infinity, but we cut it short.

In the case of Sin(), if we don’t account for this, our numbers drift low (because we actually use the negative of the constant and it overshoots low).

Using a lower number as I have done accounts for this.



I’ve adjusted each of the constants to account for this type of drift and give the best-possible accuracy for this degree of precision.
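
To see the drift in isolation, the sketch below evaluates the same degree-15 polynomial twice in double precision: once with plain one-over-factorial coefficients and once with the adjusted constants copied from Sin() above. `OddPoly`, `TaylorSin15` and `TunedSin15` are names invented for this illustration, not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Evaluates x * (c[0] + c[1]*x^2 + ... + c[7]*x^14) with Horner's scheme.
inline double OddPoly(double x, const double (&c)[8]) {
    assert(std::fabs(x) <= 3.2);  // only meant for the reduced range, about [-PI, PI]
    const double x2 = x * x;
    double acc = c[7];
    for (int i = 6; i >= 0; --i) { acc = acc * x2 + c[i]; }
    return x * acc;
}

// Plain Taylor coefficients (-1)^k / (2k+1)!, cut off after the x^15 term.
inline double TaylorSin15(double x) {
    double c[8];
    double factorial = 1.0;
    for (int k = 0; k < 8; ++k) {
        if (k > 0) { factorial *= (2.0 * k) * (2.0 * k + 1.0); }
        c[k] = ((k & 1) ? -1.0 : 1.0) / factorial;
    }
    return OddPoly(x, c);
}

// The adjusted constants copied from Sin() in the post.
inline double TunedSin15(double x) {
    static const double c[8] = {
        1.0,
        -1.66666671633720398e-01,
        8.33333376795053482e-03,
        -1.98412497411482036e-04,
        2.75565571428160183e-06,
        -2.50368437093584362e-08,
        1.58846852338356825e-10,
        -6.57978446033657960e-13
    };
    return OddPoly(x, c);
}
```

At x = PI the true value is 0, so each function returns its own error directly. The plain truncated series lands slightly below zero there, which is the low drift described above, and the adjusted set should land noticeably closer to zero.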

The precision here is enough to entirely drive a AAA game such as Final Fantasy XV, Star Ocean 5, and others.



Later I will re-evaluate the constants used in Unreal Engine 4, and then I will post a super-fast version.



L. Spiro

#5308227 What to consider for an RPG damage formula?

Posted by on 27 August 2016 - 10:26 AM

A friend of mine long ago created this guide to how Final Fantasy VII calculates damage.
I am sure you will find it more than a little useful.  :wink: 



L. Spiro

#5303596 What's Your Ios Game Loop?

Posted by on 02 August 2016 - 12:44 AM

The other drawback of fixed-step is you don't know how many update steps you may take - which scares me since doing 1 game loop vs 2 game loops could blow my CPU budget for a frame out of the water and cause us to miss a rendered frame?

If you need a cap, limit the number of updates you can do in a frame, but each update still needs to be of a fixed amount of time.
You have to be fine with having your game slow down and drop frames for this to work, which means online real-time multi-player games cannot do this, but generally iOS games can.
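
A minimal sketch of such a cap, assuming time is tracked in microseconds. `CappedFixedStepper` and its members are hypothetical names, and dropping the backlog when the cap is hit is one possible policy, chosen here because it produces the slow-down described above.

```cpp
#include <cassert>
#include <cstdint>

// Decides how many fixed-size updates to run for the current frame.
class CappedFixedStepper {
public:
    CappedFixedStepper(std::uint64_t stepMicros, std::uint32_t maxStepsPerFrame)
        : m_step(stepMicros), m_maxSteps(maxStepsPerFrame), m_accum(0) {
        assert(stepMicros > 0);
    }

    // elapsedMicros: real time since the previous frame.
    // Returns the number of updates to run; each one simulates exactly m_step.
    std::uint32_t Advance(std::uint64_t elapsedMicros) {
        m_accum += elapsedMicros;
        std::uint64_t steps = m_accum / m_step;
        if (steps > m_maxSteps) {
            // Over budget: run only the capped number and drop the backlog,
            // so the game slows down instead of trying to catch up.
            steps = m_maxSteps;
            m_accum = 0;
        } else {
            m_accum -= steps * m_step;
        }
        return static_cast<std::uint32_t>(steps);
    }

private:
    std::uint64_t m_step;
    std::uint32_t m_maxSteps;
    std::uint64_t m_accum;
};
```

With a 10 ms step and a cap of 2, a 100 ms hitch yields 2 updates rather than 10, and the remaining 80 ms are simply lost game time.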

L. Spiro

#5303422 What's Your Ios Game Loop?

Posted by on 01 August 2016 - 02:28 AM

  • I feel like variable-time game update might be OK on mobile given we don't have to worry about being preempted as much.
It is never okay to use a variable-step rate. A fixed step is trivial to implement, and failing to use one is only laziness: a variable step never enhances your game, it only adds to its instability.

  • Don't know if this is how CADisplayLink is designed to work - perhaps should run game logic on background thread and only render in CADisplayLink callback? Not sure what synchronization issues arise here. Game logic could be allowed to update as fast as possible - or could cap to screen refresh rate.
  • Similarly, could push rendering into same background thread as game update. Again, not sure what synchronization issues arise here.

  • Minimize input -> display latency

Keep your game thread in a waiting state and trigger it from the display link. Spend as little time as possible in the display link or you will interfere with input timing and responsiveness.
For simplicity, there is no reason to render on a thread separately from your logic thread—just do the update and then render on the same thread.
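
One way to sketch the waiting game thread in portable C++ is a small signal object built on a condition variable. `FrameSignal` is a hypothetical name, and the CADisplayLink hookup itself (Objective-C) is not shown; the assumption is that its callback does nothing except call `NotifyFrame()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class FrameSignal {
public:
    // Called from the display-link callback: sets a flag and returns at once.
    void NotifyFrame() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_pending = true; }
        m_cv.notify_one();
    }

    // Called when the app shuts down, to release the game thread.
    void Quit() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_quit = true; }
        m_cv.notify_one();
    }

    // Called from the game thread: sleeps until a frame is requested.
    // Returns false once Quit() has been called.
    bool WaitForFrame() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return m_pending || m_quit; });
        assert(lock.owns_lock());
        if (m_quit) { return false; }
        m_pending = false;
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_pending = false;
    bool m_quit = false;
};

// Game thread body, update and render on the same thread:
//   while (signal.WaitForFrame()) { UpdateGame(); RenderGame(); }
```

If the game thread is still busy when the next callback fires, the flag is already set and the two requests collapse into one, so the callback never blocks.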

(Placed in Graphics Programming and Theory because of CPU/GPU synchronization issues - of which I am most uninformed)

Making a separate render thread is non-trivial and requires some experience. Stick to single threads now and research multi-threaded rendering when you are ready.
It is unrelated to anything specific to iOS, and is its own subject large enough to deserve its own research.

L. Spiro

#5301699 Scene System?

Posted by on 21 July 2016 - 02:08 AM

Its job is also to handle the transformation hierarchies of objects and the camera.

This is not the scene manager’s job; the scene graph is implicit in the actors in the scene.
Actors themselves can be parents and children of other actors. Actors themselves manage how a parent transform affects their own transforms, etc.

The scene manager will issue a “pre-update” call to each actor, which actors can use to calculate their final transforms taking their local transforms with their parents’ transforms, but to the scene manager this call is a black box. The actors could use it to make spaghetti for all the scene manager cares.
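
A minimal sketch of that arrangement. The transform is deliberately cut down to a uniform scale plus a 2D translation, and `Transform`, `Combine` and the member names are invented for this example. Here the pre-update is called on root actors and recurses into children, which is one way to guarantee a parent is resolved before its children.

```cpp
#include <cassert>
#include <vector>

struct Transform {  // simplified: uniform scale + 2D translation only
    float scale = 1.0f;
    float x = 0.0f;
    float y = 0.0f;
};

// Expresses a local transform in the parent's space.
inline Transform Combine(const Transform& parent, const Transform& local) {
    Transform out;
    out.scale = parent.scale * local.scale;
    out.x = parent.x + parent.scale * local.x;
    out.y = parent.y + parent.scale * local.y;
    return out;
}

class Actor {
public:
    virtual ~Actor() = default;

    void AddChild(Actor* child) {
        assert(child != nullptr && child != this);
        child->m_parent = this;
        m_children.push_back(child);
    }

    Transform& Local() { return m_local; }
    const Transform& World() const { return m_world; }

    // The scene manager only calls this; what happens inside is a black box to it.
    virtual void PreUpdate() {
        m_world = m_parent ? Combine(m_parent->m_world, m_local) : m_local;
        for (Actor* child : m_children) { child->PreUpdate(); }
    }

private:
    Actor* m_parent = nullptr;
    std::vector<Actor*> m_children;  // non-owning in this sketch
    Transform m_local;
    Transform m_world;
};
```

A child at local x = 3 under a parent at x = 10 with scale 2 ends up at world x = 16 after the parent's `PreUpdate()`.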

In addition to what has already been said, I would point out that there is no reason to restrict yourself to having only a single scene at a time.
A real-life example I had where 2 scenes were the best solution was a golf game I made in which you spend most of your time on the 3D field, but for swinging the club the golf course is shrunk to fit a certain area of the screen and a 3D guy swinging a club is overlaid on top of it with a different set of projection and view transforms.

Render-queues are a good way to sort objects for rendering inside a single scene, but if we are talking about overlaying the UI, debug text, etc., then you would really want layers.
Usually engines have a fixed and predefined set of layers, each intended to render a specific part of the scene.
Usually there are about 16 layers, but most of them are blank or “reserved” for future use.
An example of a simple set of layers would be:
Layer 0: Player.
Layer 3: Solid 3D objects.
Layer 5: Terrain.
Layer 7: Skybox.
Layer 8: Translucent 3D objects.
Layer 10: Post processing.
Layer 12: UI/HUD.
Layer 15: Debug text.

Layers are drawn in order, so later layers are drawn on top of earlier layers.
The way each layer is drawn is custom. The 3D layers would use render queues, but the terrain layer would use a system specific to terrain (chunks, GeoMipmaps, etc.), and the skybox layer would just be a single standard draw call.
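
The layer idea can be sketched as a fixed array of draw callbacks walked in index order. `LayerStack` is a hypothetical name, and a callback per layer is just one way to express "the way each layer is drawn is custom".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

class LayerStack {
public:
    static constexpr std::size_t kLayerCount = 16;

    // Each layer supplies its own drawing routine (render queue, terrain system, etc.).
    void SetLayer(std::size_t index, std::function<void()> draw) {
        assert(index < kLayerCount);
        m_layers[index] = std::move(draw);
    }

    // Draws layers in index order, so later layers end up on top.
    // Blank ("reserved") layers are skipped. Returns how many layers drew.
    std::size_t DrawAll() const {
        std::size_t drawn = 0;
        for (const auto& draw : m_layers) {
            if (draw) { draw(); ++drawn; }
        }
        return drawn;
    }

private:
    std::array<std::function<void()>, kLayerCount> m_layers;
};
```

Registering layer 12 (UI/HUD) before layer 3 (solid 3D objects) does not change the outcome: layer 3 is still drawn first.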

  • Redrawing the entire scene from scratch is very slow
Too bad. That is what you do. If it is slow, it is probably because you are using glBegin() etc., and are otherwise non-optimal.
The solution isn’t to try redrawing only what has changed, the solution is to fix the actual bottlenecks.

The idea is that the scene system would accept requests to have things drawn, either once (during the next frame), or until the drawn object's lifetime runs out (i.e. next N frames).

Issue calls on a per-frame basis. Never assume anything about any future Nth frame. You don’t know when the player is going to hit the button to go to the next stage and all your assets need to be unloaded or exchanged.

L. Spiro

#5286341 programming language for android.....

Posted by on 11 April 2016 - 12:41 PM

I can't decide if i will study native language like Obj-C and C++ because I'm worried that maybe this language will no longer exist in the next decade.

Actually your thinking is backwards.
C++ has possibly been around longer than you have been alive (as for me it is only 1 year younger) and will be here long after you are dead.
If you are planning on programming in 10 years, you should be focusing on C++, as it is the only language guaranteed to be here a decade from now.



L. Spiro

#5286241 programming language for android.....

Posted by on 11 April 2016 - 01:15 AM

C++, if you want the pain. C++ if used correctly will give you more performance and control over the chosen platform at the cost of more developer responsibility. You will also be more tied to hardware as you compile for target type of CPU, this can be a bad thing in the fast moving world of mobile.

C++ doesn’t tie you to hardware. C++ is more portable than Java in the fast-moving world of mobile, as it is the only language supported on Android, iOS, and Windows Phone.
Moving portable C++ code to a new platform means changing the target architecture. Only.
All languages have API-specific functions that would need to be replaced when moving to a new platform, and this is unrelated to the language itself.

Your choices should be between Java and C++ (with the NDK), but these were mostly outlined above.
Debugger support for the NDK has been traditionally poor.

L. Spiro

#5286237 Proper Component Decoupling

Posted by on 11 April 2016 - 12:50 AM

Is it appropriate to have the collision detection code in a mover component given that the player is moving in response to the collision?

Absolutely not. Throwing in the note about it being a player that is moving in response to the collision is a straw man.
#1: It’s not even a valid framing of the situation. If the player bumps a tire, the player bounces back and the tire bounces forward. Is the player moving in response to the tire, or is the tire moving in response to the player?
Collision must be handled by something that doesn’t have a frame of reference.  They bounce off each other.
#2: What about when a monster hits a wall?  Or a car hits a monster?  Or a rock hits a car into a monster into a wall?
Are you going to duplicate your collision code for every case?  Clearly collision and physics don’t belong inside an update meant for a specific type of entity.

Also, this would require the component to have knowledge about objects it could collide with and where the screen edges are.

Correct, which is another clue as to why this is not the way to go. “God” has knowledge regarding all the objects in the scene, the walls, and how they should interact. The player is one of “God’s” puppets.
The player knows about itself, and some higher-level part of the game engine knows about all objects and how they interact.

The way it should be:
#1: A high-level system reads inputs and applies them to the correct player. For most games this means applying a force vector on the player, but for ultra-simple games it could mean an actual change to the character’s position.
#2 Normal: For normal games that use force vectors, a high-level run over objects is done to perform collision detection and physics. This results in a new set of force vectors which have been created to keep the player out of walls. This is also where item pick-up would occur.
#2 Simple: For ultra-simple games that modify the position directly, a high-level collision-detection routine is run over all objects (this could be done in the ECS fashion) and objects are sanitized (moved out of walls) and items are gotten, bullets hit enemies, etc.
#3 Normal: For normal games that use force vectors, an ECS-style run over the objects would be implemented to update their positions based on their force vectors.
Strictly adhering to the ECS fashion of running over objects and doing some kind of update on each component will lead you to extremely poor design.
ECS itself isn’t generally the best way to go anyway, and when you do go that way you still have to think about what makes sense as a per-entity update via some updating function going over each entity in turn vs. a normal update where objects are collected via some specific method and parts of them updated there.
Even if you are using ECS you still need to handle collision/physics in the normal way of gathering the minimal set of potential colliders (usually through a spatial partitioning scheme) and handling only them.
Even if your game is as simple as Pong it doesn’t make sense to run over every entity to handle collisions. It still has to be done at a higher level where knowledge of the game rules, objects, and boundaries exists.
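
A toy 1D version of the "normal" flow above, as a sketch only: input changes a body's velocity, the world (not any entity) resolves contacts and walls, and positions are integrated last. `Body`, `World` and `kContactDistance` are invented for this example, the equal-mass velocity swap is a stand-in for real physics, and the brute-force pair loop would be replaced by a spatial partitioning scheme in anything larger.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Body { float pos; float vel; };  // an entity knows only about itself

constexpr float kContactDistance = 1.0f;  // assumed size of every body

class World {
public:
    World(float minX, float maxX) : m_min(minX), m_max(maxX) {}

    std::vector<Body> bodies;

    // #1: input becomes a velocity change on the chosen body.
    void ApplyInput(std::size_t index, float push) {
        assert(index < bodies.size());
        bodies[index].vel += push;
    }

    // #2 then #3: resolve collisions at world level, then integrate.
    void Step(float dt) {
        ResolveCollisions(dt);
        for (Body& b : bodies) { b.pos += b.vel * dt; }
    }

private:
    void ResolveCollisions(float dt) {
        // Body vs. body: neither is "the one responding"; both are updated together.
        for (std::size_t i = 0; i < bodies.size(); ++i) {
            for (std::size_t j = i + 1; j < bodies.size(); ++j) {
                const float dx = bodies[j].pos - bodies[i].pos;
                const float dv = bodies[j].vel - bodies[i].vel;
                if (std::fabs(dx) < kContactDistance && dx * dv < 0.0f) {
                    std::swap(bodies[i].vel, bodies[j].vel);
                }
            }
        }
        // Body vs. boundary: only the world knows where the walls are.
        for (Body& b : bodies) {
            const float next = b.pos + b.vel * dt;
            if (next < m_min || next > m_max) { b.vel = -b.vel; }
        }
    }

    float m_min;
    float m_max;
};
```

Pushing body 0 into a resting body 1 transfers the velocity to body 1 inside `Step()`, without either body containing any collision code.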

L. Spiro

#5285898 A list of topics you need to know to have learned the basics?

Posted by on 08 April 2016 - 02:37 PM

How to use a debugger.

L. Spiro

#5285874 Creating an infinite grid in OpenGL 2.1

Posted by on 08 April 2016 - 12:49 PM

So factor is some number, canvasWidth is some number, and coordinateFactorWidth is some number divided by some other number.
The running theme here is that you keep not providing enough information for anyone to do anything but ask for more information.
Would it be so hard to provide the equation, and then say, “factor is 10, canvasWidth is 800”?
Did you try ::glLoadIdentity() as you were told to do?  If you are calling render() before drawGrid() it is fairly likely that the matrices are set to something strange.
Did you disable depth testing?
Are the dots a different color from the clear/background color?
Are the dots too close to the camera and being culled?  Why not push the dots back a unit?
How big are the dots?  Are you sure they are at least a pixel large?
Is coordinateFactorWidth an integer?  Is it rounding down to 0?
Did you set an orthographic projection matrix?
The first thing geometryEditorCanvas::drawGrid() should be doing is setting the model and view matrices to identity and setting an orthographic projection matrix.
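
To illustrate the rounding question above: the thread never shows how coordinateFactorWidth is actually computed, so the formula below (factor divided by canvasWidth) is purely an assumption for demonstration, as are the function names.

```cpp
#include <cassert>

// Hypothetical formula; the real one was not posted.
inline int SpacingAsInt(int factor, int canvasWidth) {
    assert(canvasWidth != 0);
    return factor / canvasWidth;  // integer division truncates toward zero
}

inline double SpacingAsDouble(int factor, int canvasWidth) {
    assert(canvasWidth != 0);
    return static_cast<double>(factor) / canvasWidth;  // keeps the fraction
}
```

With factor 10 and canvasWidth 800 the integer version gives 0, which would place every dot at the same coordinate, while the double version gives a small positive spacing.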

glColor3d(0.0d, 0.0d, 0.0d);

This is not valid code. 0.0d = compiler error. If you want to type a constant double literal, use 0.0.
This suggests very clearly that this is not your actual code.
If you want help, post your exact actual code.

L. Spiro

#5285714 Keyboard input Windows

Posted by on 07 April 2016 - 10:28 PM

It was requested that you show your message pump, because most likely you're only handling one message and then running your game loop.
You need to handle all messages that are pending and then run your game loop.

Or you need to not clear your list internally every frame.
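
A portable model of both points, as a sketch: a `std::queue` stands in for the OS message queue (on Win32 the drain loop would be built on `PeekMessage()` with `PM_REMOVE`), and key state is kept across frames instead of being cleared. `KeyEvent`, `Input` and `PumpAll` are hypothetical names.

```cpp
#include <cassert>
#include <queue>

struct KeyEvent { int key; bool down; };

// Key state persists between frames; it changes only when a message says so.
class Input {
public:
    void Handle(const KeyEvent& e) {
        assert(e.key >= 0 && e.key < 256);
        m_down[e.key] = e.down;
    }
    bool IsDown(int key) const { return key >= 0 && key < 256 && m_down[key]; }

private:
    bool m_down[256] = {};
};

// Handles every pending message, not just one, before the game loop runs.
inline int PumpAll(std::queue<KeyEvent>& pending, Input& input) {
    int handled = 0;
    while (!pending.empty()) {
        input.Handle(pending.front());
        pending.pop();
        ++handled;
    }
    return handled;
}
```

Handling only one message per frame would leave the other two events in the queue below for later frames, so the game would see stale key state.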

L. Spiro

#5285289 Is it bad to use macros such as __LINE__ or __DATE__?

Posted by on 05 April 2016 - 11:19 AM

For example, you might expect __FILE__ inside a header to show the header's name. That's "intuitive", but it's not what you get. It shows the name of the translation unit.

__FILE__ prints the file name last set by a #line directive.
#line is implicitly set before and after each file as they are #include’d and once at the top of the translation unit.
Unless you override the preprocessor’s internal file name and line counter with #line, __FILE__ prints the correct name of the current file, even if it is inside headers.

__LINE__ is the presumed (whatever that means) line number in the current source file. What's __LINE__ if you use it in an included header? You tell me?

The “presumed” line is the line the preprocessor counter tracks as it goes through the file, which will be accurate in any file as long as you don’t mess with it manually via #line.
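
A small sketch of what overriding the counter looks like. The file name "renamed.h" and the function names are made up; the point is only that after a #line directive both macros report whatever the directive said.

```cpp
#include <cassert>
#include <cstring>

// Up to here, __FILE__ and __LINE__ track the real file and line.

// The directive below makes the NEXT source line count as line 100 of "renamed.h".
#line 100 "renamed.h"
inline const char* FileAfterDirective() { return __FILE__; }  // presumed line 100
inline int LineAfterDirective() { return __LINE__; }          // presumed line 101

inline void CheckDirective() {
    assert(std::strcmp(FileAfterDirective(), "renamed.h") == 0);
    assert(LineAfterDirective() == 101);
}
```

Without the directive, the same two functions would report the actual file they were written in and their actual line numbers, whether that file is a header or the main source file.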

L. Spiro

#5284877 Gamestate and Intro, Main Menu, Playing state

Posted by on 03 April 2016 - 10:51 AM

Sorry, by this I mean my game::run() loop, which looks like this (it is based on the game engine of Charles Kellys programming2dgames book:

Then I suggest we put that book aside.
There does not exist a case in which it is remotely acceptable to return from the main loop without having drawn the scene.
I said the main game loop allows you to update 0 or more times and always draws once.
You may possibly enter into a 0-length loop for updating if not enough time has passed for a logical update, but you cannot ever under any circumstances leave the routine without a single render.
And this is without even mentioning the Sleep()’ing sin. We don’t do that. No.

My bad, I am [still doing weird things].

I wave my hand and erase from your mind the concept of ECS and now you have become a better programmer.
What you mean to say is, “I have studied how structures of arrays gave me the same performance as ECS, and how an adjustment to my quad-tree such that all the leaf-node children are sequential inside a pool of memory took care of cache misses, and have decided that learning ECS and considering myself happy with the results cheats me out of learning how to solve actual performance problems and teaches me that over-reaching solutions are always best.  I will always be an inferior programmer and lose many job opportunities due to my inability to answer simple questions during interviews based on simple cache usage and understandings of basic programming principles.”



L. Spiro

#5284049 Actor object understanding

Posted by on 29 March 2016 - 07:44 AM

An actor is the base class for all objects in your game world.

It contains a world and local transform (position, rotation, and scale), and a parent and children actors for the scene hierarchy.


The end.



L. Spiro

#5284037 Gamestate and Intro, Main Menu, Playing state

Posted by on 29 March 2016 - 06:32 AM

General Game/Engine Structure
Passing Data Between Game States

When should they get loaded / rendered?

Loaded on the next frame after being set.
Rendered every frame.
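
A minimal sketch of "loaded on the next frame after being set": setting a state only records the request, and the swap happens at the top of the following frame. `State`, `MenuState` and `StateManager` are hypothetical names, and the counters in `MenuState` exist only to make the behaviour observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class State {
public:
    virtual ~State() = default;
    virtual void Load() {}
    virtual void Render() {}
};

// Example state that just counts calls.
struct MenuState : State {
    int loads = 0;
    int renders = 0;
    void Load() override { ++loads; }
    void Render() override { ++renders; }
};

class StateManager {
public:
    // Only records the request; nothing is loaded or unloaded here.
    void SetNextState(std::unique_ptr<State> next) {
        assert(next != nullptr);
        m_next = std::move(next);
    }

    // Called once per frame by the game loop.
    void Frame() {
        if (m_next) {  // swap at a safe point, before anything else this frame
            m_current = std::move(m_next);
            m_current->Load();
        }
        if (m_current) { m_current->Render(); }
    }

    const State* Current() const { return m_current.get(); }

private:
    std::unique_ptr<State> m_current;
    std::unique_ptr<State> m_next;
};
```

Deferring the swap means a state can request its own replacement in the middle of a frame without being destroyed while its code is still running.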

Should there be a specific system for them (MenuSystem)?

What does that mean?

Do they need to be updated in the game::update() loop, or just the game::render() loop?

These are not separate loops. There is only 1 game loop, which may update the game logic 0 or more times, and always renders one time.
Fixed-Time-Step Implementation
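
The shape of that single loop can be sketched as one function per pass. This is an illustration under assumed names (`RunFrame`, microsecond timestamps, callbacks for update and render), not the implementation from the linked article.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// One pass of the game loop: 0 or more fixed-size updates, then exactly one render.
// simulatedMicros is how far the game logic has been advanced so far.
inline int RunFrame(std::uint64_t nowMicros,
                    std::uint64_t& simulatedMicros,
                    std::uint64_t stepMicros,
                    const std::function<void()>& update,
                    const std::function<void()>& render) {
    assert(stepMicros > 0 && nowMicros >= simulatedMicros);
    int updates = 0;
    while (nowMicros - simulatedMicros >= stepMicros) {
        update();  // always simulates exactly stepMicros
        simulatedMicros += stepMicros;
        ++updates;
    }
    render();  // unconditional: there is no path that skips it
    return updates;
}
```

When too little time has passed the while loop body never runs, but the render call still happens, which matches "0 or more updates, always one render".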

I am working with what will hopefully be a component based engine .

Why would you hope for an over-engineered misguided system that you can’t possibly use correctly/fully and doesn’t serve your needs best?

ECS is a buzzword and solves a specific domain of problems suitable only to marketed engines such as Unity.  It’s not for you and doesn’t help you in any way.  Stop doing things just because they are buzzwords.

L. Spiro