Chunk relative vector arithmetic

Started by
8 comments, last by pondwater 1 year ago

I have a game that involves procedural generation and potentially very large worlds.

I figured it would be a good idea to deal with floating point precision issues as numbers get larger and larger as the player moves away from the origin.

Since my game is multiplayer I opted to create a ChunkRelativePosition type which stores the chunk coordinates as integers, and then the position relative to the chunk as floats.

struct ChunkRelativePosition {
    int32t chunkX = 0;
    int32t chunkY = 0;
    float32t x = 0;
    float32t y = 0;
};

I've got most arithmetic operations working, however they (seem to be) really inefficient. My impl fo multiplication / division of scalar values results in precision loss. Here's two examples of the addition and multiplication operators:

float32_t chunkmod(float32_t v)
{
    return (v < 0) 
        ? CHUNK_CELL_COUNT + std::fmod(v, CHUNK_CELL_COUNT) 
        : std::fmod(v, CHUNK_CELL_COUNT);
}

int16_t chunkdiv(float32_t v)
{
    return std::floor(v * INV_CHUNK_CELL_COUNT);
}

ChunkRelativePosition ChunkRelativePosition::operator+(const glm::vec2& v) const
{
    float32t dx = x + chunkmod(v.x);
    float32t dy = y + chunkmod(v.y);

    ChunkRelativePosition res;
    res.chunkX = chunkX + chunkdiv(v.x) + chunkdiv(dx);
    res.chunkY = chunkY + chunkdiv(v.y) + chunkdiv(dy);
    res.x = chunkmod(dx);
    res.y = chunkmod(dy);
    return res;
}

ChunkRelativePosition ChunkRelativePosition::operator*(float32t v) const
{
    float32t cdx = chunkX * v;
    float32t cdy = chunkY * v;

    float32t cdxi, cdxd, cdyi, cdyd;

    cdxd = std::modf(cdx, &cdxi);
    cdyd = std::modf(cdy, &cdyi);

    float32t dx = (x * v) + (cdxd * CHUNK_CELL_COUNT);
    float32t dy = (y * v) + (cdyd * CHUNK_CELL_COUNT);

    MapPosition res;
    res.chunkX = cdxi + chunkdiv(dx);
    res.chunkY = cdyi + chunkdiv(dy);
    res.x = chunkmod(dx);
    res.y = chunkmod(dy);
    return res;
}

I figure this sort of “chunk relative” positioning is not novel at all and there must be a simpler, more performant, and more precise way to do arithmetic.

Any tips or references to information about this would be greatly appreciated.

Advertisement

I spent a lot of time (a few years) figuring out the best way to do solar system+ scale worlds (blog post) and the best overall method I came up with was to use floats for everything, where each object has a pointer to a CoordinateFrame object, which itself has a double position offset relative to its parent CoordinateFrame. This forms a hierarchy of coordinate systems, which would support a world of any size (even beyond what double would support).

class CoordinateFrame
{
	Vector3<double> position;
	// I also have rotation here but it's optional and adds a lot of complexity
	CoordinateFrame* parent; // NULL if at root level
	std::vector<CoordinateFrame*> children;
	CoordinateFrameGenerator* childGenerator;
};
class SceneObject
{
	Transform3<float> transform;
	CoordinateFrame* frame; // NULL if at root level
	// Components, child objects, etc
};

This way you can do most of your calculations in single precision, except for the case when objects are in different frames. Each coordinate frame is large enough (about 1.6km) to handle the majority of objects in a typical scene. Each frame that an object moves, I check to see if it is still within the valid distance from the origin of its CoordinateFrame, and if not, then I do a slower traversal of the frames to find the CoordinateFrame that the object is closest to.

I also have a concept of a CoordinateFrameGenerator, which allows a CoordinateFrame to generate child CoordinateFrames within it, for objects that move outside the bounds of existing frames. For instance, a solar system has a coordinate frame at the center, which has a frame generator that places new frames on a grid as objects move outside of existing frames. Similarly, a planet has its own frame and generator that places frames on the planet's surface every 1.6km. So, I have a solar system frame containing a planet frame, which contains surface frames. It's frames all the way down.

To do something like render an object relative to a camera, I consider the frame of the object and camera. If they are the same, then you do rendering like normal. If they are different, then I compute a (single precision) unit vector which points from the camera to the object's position, as well as double precision distance along that vector, by walking the frame hierarchy and doing the fewest transforms necessary to preserve precision along the way. Using this unit vector and distance, it is possible to render objects at any distance by scaling the far away objects down around the camera.

Physics are harder, if you want to have interactions (collisions, gravity) between frames. I have a custom physics engine where I integrated this CoordinateFrame concept on a deep level. This keeps things efficient in the common case, and things only get slower when interacting objects are far apart (a rare thing), or at the frame boundaries. I see about a 2x slowdown in an N-body gravity simulation with CoordinateFrames vs. no frames. Not bad at all for a rare case.

This is working well for me, but I'll admit its not simple to implement because it requires deep integration into all of the core engine systems.

Doubles:

  • - Only works with sufficient precision (~0.1mm) out to Neptune's orbit.
  • - 8 bytes vs. 4 bytes, which leads to slower performance due to more cache misses
  • - Not supported (efficiently) by GPU, still need to convert to camera-relative floats
  • - Not supported by most physics engines
  • - SIMD support is not great (requires AVX, which doesn't exist on ARM macs)
  • + Simple arithmetic

Int + Float:

  • + Fast within a chunk, can use existing physics within chunk
  • + can do physics in each chunk separately
  • + can do SIMD easily on all platforms
  • - boundaries are hard (e.g. how to do physics/render efficiently across chunks).
  • - Slow arithmetic between chunks
  • - Limited to 32 or 64 bit chunk indices, which is not quite enough for a large galaxy (32 bit) or universe (64).

CoordinateFrame + Float:

  • + Can handle any size world by just adding more hierarchy levels
  • + Fast performance within a chunk (same as a small-scale game engine)
  • + Acceptable performance across chunks
  • + can do SIMD easily on all platforms
  • - Harder to implement, especially physics.
  • - Difficult to spawn objects at absolute positions, because everything is defined relative to a frame generated at runtime. I'm struggling with how to do this properly.

I also looked into other stuff like 80-bit, 128-bit floats, or float-float arithmetic and discarded these because they are slow and/or not supported on many platforms/compilers.

Thank you for the very in depth response. I checked out your link, that is a really cool project.

The requirements of my game are much simpler. It's just a procedural dungeon crawler so the int + float solution seems ideal.

My question is less about what approach to use and more about how to efficiency implement the arithmetic of these coordinates across chunks while preserving precision.

For example my current implementation for a addition worst case is 6 additions, 4 comparisons, 4 divisions and 4 fmods. There has to be a better way.

To add two ChunkRelativePosition, I would do something like this:

ChunkRelativePosition add( const ChunkRelativePosition& a, const ChunkRelativePosition& b )
{
	ChunkRelativePosition out;
	out.chunkX = a.chunkX + b.chunkX;
	out.chunkY = a.chunkY + b.chunkY;
	out.x = a.x + b.x;
	out.y = a.y + b.y;
	if ( out.x > CHUNK_SIZE )
	{
		out.x -= CHUNK_SIZE;
		out.chunkX++;
	}
	else if ( out.x < -CHUNK_SIZE )
	{
		out.x += CHUNK_SIZE;
		out.chunkX--;
	}
	
	if ( out.y > CHUNK_SIZE )
	{
		out.y -= CHUNK_SIZE;
		out.chunkY++;
	}
	else if ( out.y < -CHUNK_SIZE )
	{
		out.y += CHUNK_SIZE;
		out.chunkY--;
	}
	return out;
}

This will work OK as long as the numbers are not too large and CHUNK_SIZE is defined to a reasonable value of about 1.6km for 0.1mm precision. I still wouldn't want to do any complicated math or physics in this representation though, think about what would happen if you try to do some rotations.

Honestly, for your case it's probably going to be faster to use simple doubles for coordinates, and keep the LOD chunking system separate from coordinate system representation. If I was aiming for anything smaller than Neptune's orbit, I would probably choose doubles for the simplicity.

I haven't thought about this type of situation for about 20 years but, what would be wrong with using 64-bit integers for positions? You get really fast arithmetic and uniformly high resolution through the entire level. If your unit of length is the micrometer, 64 bits allows you to represent 123.3 AU.

I also make large scale worlds. I just use double. On the CPU that solves most your problems, unless you are doing something really odd. You really only need accurately enough to support contiguous land. Double will give you plenty of accurately even for planets far larger than earth. Now if you want planetary obits down to the millimeter, then somewhere around in outer rim of our solar system, you will lose that level of actually but only for the orbit. The planet itself should have its own coordinates system so its terrain will fine. Also, Pluto orbits at over 4km a second. players aren't going to notice if you are off some. You can go all the way up the galaxy level. You may have to snap solar system to some fraction of a lightyear but still, who really cares.

You can also use floats with some chunk offset as you are attempting. But you have to be very careful. If one calculation ends up as a float you will lose all the precision that you were trying to keep. Also, in my case chunks can resize and even one chunk can end up being larger than the precision of a float if you are viewing from above the surface of the planet. Finally, using double means your physics calculations are in single planetary coordinate system and you don't need to deal with meshing those up. I always put the origin at the center of my worlds.

On the GPU it's a different story. For all practical purposes you only have float to play with no matter what you did on the CPU side. You basically need to move the world around the camera so that everything near the camera is in high precision. This means you never want to go to world coordinates on the GPU if you are using plain Jane floats to represent coordinates. I pass down a float with an offset and I remove the offset in the transformation matrix. Note, transformation never goes to world coordinates. You want to go straight to view or projection coordinates. This way everything close to the camera keeps its precision, and you only lose precision far from the camera, where you don't care about it. I do this by giving each chunk its own transformation matrix.

After reading your posts and thinking about it more an int + float system is absolutely overkill for my use case.

I will use doubles instead. Thank you all for bringing me back to reality and saving me a headache. ?

Like alvaro said, I would suggest not using doubles, and instead using two int32s (one specifies the chunk coord and the other specifies the position within a chunk).

This way you have uniform precision everywhere, no possibility of float issues, and 32 bits of precision within a chunk is more than you'd get with your int+float system (since floats only give you 23 bits of precision within the chunk).

See this blog post (by a very experienced and wise programmer) to see why using floats or even doubles for this sort of thing is likely to be a mistake: http://tomforsyth1000.github.io/blog.wiki.html#%5B%5BA%20matter%20of%20precision%5D%5D

(note: you still want to use floats for velocities or deltas between two positions, you only need ints for the positions themselves)

Great article, thank you for posting that.

Yeah, shortly after posted I decided to to use a 32-32 fixed point position after swapping to doubles resulted in a bunch of my unit tests failing.

The range of a 64bit fixed point is more than enough to cover my use case. Thank you all for the input.

This topic is closed to new replies.

Advertisement