# jbates

1. ## Announcing The Tuesday C++ Vector Math and SIMD Library Version 1.0

Hi GameDev.net! I've been working on a math and template library on and off since January, and I've finally reached a point where I feel comfortable calling it version 1.0.

https://github.com/Cincinesh/tue

The Tuesday C++ Vector Math and SIMD Library is a library of template classes and math functions with a focus on physics and graphics applications. It provides data types commonly used in games and other simulations, such as vectors, quaternions, and matrices; SIMD intrinsic wrapper classes completely separate from (but compatible with) the other types; operator overloads for combining and manipulating all these types; and other common mathematical functions. It was written to match the style of the C++ Standard Library and uses modern C++ features (i.e., C++14) extensively.

Tuesday provides the following unique features over other similar libraries such as GLM:

- The dimensions of vector and matrix types are template parameters, unlike GLM where, e.g., tvec2, tvec3, and tvec4 are separate types. By making the dimensions template parameters, it's possible to write one template function that can operate on and/or produce vectors or matrices of multiple dimensions. For example, the transformation matrix generation functions (translation_mat, rotation_mat, etc.) can produce matrices of multiple sizes, so long as they meet the minimum requirements of each transformation and are, at the largest, 4x4.
- It makes heavy use of decltype in return types. This makes it possible for composite types to behave much more like their component types when it comes to things like implicit type conversions. For example, fvec3 + dvec3 results in a dvec3, just as float + double results in a double.
- It uses constexpr whenever possible which, as it turns out, is often.
- SIMD types are completely separate from vector types. This may seem counter-intuitive, but SIMD vectors aren't very efficient when used as traditional 3D vectors. The fourth component of a SIMD vector would often go to waste, and functions where multiple components interact (such as the length function, dot product, or cross product) would be horribly inefficient with SIMD intrinsics. Instead, SIMD instructions should be used to perform the same logic on multiple vectors in parallel. Tuesday is designed for this use case. For example, vec3<float32x4> v could be thought of as 4 parallel 3D vectors (4 x-values, followed by 4 y-values, and finally 4 z-values). Something like math::dot(v) would then compute a single float32x4 containing the dot products of those 4 parallel vectors without any inefficient component shuffling. See this answer to a naive question I asked on Stack Overflow a few years back for some more rationale.
- The SIMD system supports a huge number of types. You can create 2-, 4-, 8-, 16-, 32-, and 64-component vectors of all the major arithmetic types (float, double, int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, and uint64_t), along with sized boolean types (bool8, bool16, bool32, and bool64). If SIMD-intrinsic acceleration isn't available for a particular type, there's a standard C++-compliant fallback. If a vector has too many components for acceleration, but a smaller vector with the same component type can be accelerated, then the larger vector is simply the composite of two smaller vectors. For example, if float32x4 is accelerated but float32x8 isn't, then float32x8 will at least be partially accelerated in that it's made of two float32x4's.

It's licensed under the Boost Software License, so it should be free to use in just about any project.

I've also published the testing framework I wrote for it, "Monday", separately:

https://github.com/Cincinesh/mon

Please take a look and let me know what you think!
2. ## The Tuesday C++ Math and Template Library

I'm big on uniformity when it comes to template libraries. If I'm implementing half the operators, why not just do them all? I agree they'll probably never be used, and usually less is more, but the effort of implementing and maintaining them seems worthwhile in this case, especially considering their implementations and tests are almost identical to those of the corresponding arithmetic operators.

And thanks for the crashing test handler suggestions! I'll put them on my todo list.
3. ## The Tuesday C++ Math and Template Library

Haha, yeah, I felt pretty clever. It started with incrementing every letter in the name of the "std" namespace (s->t, t->u, d->e), and I decided to go from there. I'm planning to use the remaining days of the week only for really generic libraries, e.g., the types you'd find in the Standard Library or Boost, and I don't think I'll ever write more than 7.

I don't know anything about CML, but here are some differences from GLM as I remember it:

- Different vec sizes are specializations of the single vec<T, N> template declaration. Same thing with mat<T, C, R> and simd<T, N>. This makes it possible to write template functions that operate on any size.
- It makes heavy use of decltype and auto return types. This makes it possible to write vec<T, N> and mat<T, C, R> operators with implicit type promotion, e.g., fvec3 + dvec3 results in a dvec3 just as float + double results in a double.
- It makes heavy use of constexpr. It's not all that useful in Visual Studio yet (VS2015 is lagging behind a little in this regard), but Clang and GCC can reap the benefits.
- simd<T, N> types and vec<T, N> types are totally separate. It may seem counter-intuitive to some, but SIMD vectors aren't all that useful as traditional 3D vectors. SIMD instructions are more useful when operating on totally independent things at the same time. For example, vec3<float32x4> would be used to perform the same math on four logically separate fvec3's in parallel. Also, the simd<T, N> types make it intentionally difficult to access individual components, since this is a slow operation in most SIMD instruction sets.

There are probably others, but these are the big ones that come to mind immediately. I'll add something like this to my README. Thanks for asking.
4. ## The Tuesday C++ Math and Template Library

Hi GameDev.net! I've been working on a math and template library on and off since January, and it's gotten to a point where I'm comfortable sharing it.

https://github.com/Cincinesh/tue

There's also some initial work on SIMD data types, which exist as a convenient wrapper around not-so-friendly SIMD intrinsics (SSE bool32x4 and float32x4 only so far). It's written in such a way that you could easily have vectors or matrices of SIMD types (or any other type that has mathematical operators).

It's licensed under the Boost Software License, so it should be free to use in just about any project.

Please take a look and let me know what you think!
5. ## Efficient Data Structure for Storing Game Objects

Aha! I didn't know about pre-fetching. I figured something like it might exist, but I wanted to get my idea written down before finding out it's sub-optimal.

Question: What if I read lines 1,2,3, then 5,6,7,8, then 10,11... etc.? Ordered chunks with small gaps. If I sorted my allocated id list by id number, this is essentially what my iteration would look like. Are CPUs smart enough to pre-fetch even when these small gaps exist? I'm aware that, even if a CPU did prefetch my subsequent data in spite of gaps, it would still have to do so more often than if there weren't gaps, but it wouldn't be any worse than if all the gaps were filled (which they would have had to have been at one time, otherwise they wouldn't exist to begin with).

I already store my object data as a struct of arrays. Is that what you mean? My location/rotation pairs are in one array, world matrices in another, world-view-projection matrices in yet another, etc. But they're all grouped by the same indices. Are you saying they should have separate id schemes? And if so, how do you suggest associating them?

I do keep graphics objects and logic objects separate. "VisualID" is just one member of logical game objects. Do you think I should get more granular than that with my groupings?

Like I said, matrix calculations are great candidates for multi-threading and tight loops. Instead of calculating them on demand, I think it's better to calculate all the world matrices that are going to be used by a particular frame in one pass, then the world-view-projection matrices, then the world-inverse-transpose matrices. At the same time, I just realized there'd be no need to swap them if I'm re-calculating them every frame anyway.

This whole discussion is only for dynamic objects, by the way. Static environment stuff can obviously be allocated and calculated once upfront.

But if I'm not accessing them in order, what's the point of keeping them contiguous?
Also, I imagine my solution will keep the data relatively compact anyway. Consider the following game flow for a traditional arena-style deathmatch game:

- All the players and items are allocated and set to their initial states.
- Whenever a player dies, they respawn soon after.
- Whenever an item is picked up, it respawns after some fixed amount of time.

In my solution, immediately after the initial allocation, all the allocated objects are compact at ids 0,1,2,3... up to however many there are. Whenever an object is deallocated (a player dies or an item is picked up), it leaves a gap. But since that gap goes to the top of the free id stack, it will be the first to be filled in as soon as something else is allocated.

I guess it all depends on the data allocation patterns of a particular game. If the number of allocated objects is kept in equilibrium like above, then my pool allocator sounds pretty good. If, however, a large number of things are allocated and then a non-contiguous half of them are deallocated, my data structure is left with many wasted holes.

Also, I wonder how ephemeral objects like bullets factor into all this... I have more thinking to do.

It's always a nice feeling when you realize your own good solution is the same as a well-known good solution.
6. ## Efficient Data Structure for Storing Game Objects

For the past year or so I've been toying with game engine dev on and off, and I've tried a couple of different approaches to allocating and storing "game objects" (actors, entities, whatever you want to call them). For a couple of months now I've favored the solution presented in this Stack Exchange answer: http://gamedev.stackexchange.com/a/33905/39265

To put it briefly, the solution described is an array where allocated objects are kept contiguous in memory for the sake of cache coherency by swapping the object highest in memory into any gap created during deallocation. It also uses two levels of indirection to maintain stable unique ids for referencing the moving objects.

The author on Stack Exchange didn't mention this, but the data structure could easily be augmented by making it a struct of arrays using the same id allocation/dereferencing scheme.

I like this solution, but it has some flaws that have always concerned me:

- How long does it take to swap an object during deallocation? My game objects have a huge amount of data associated with them: location, rotation, a 4x4 world matrix, a 4x4 world-view-projection matrix, a 3x3 world-inverse-transpose matrix... those alone make up at least 47 floats, or 188 bytes.
- Swapping is not only an issue for deallocation. Sorting would have the same implications. What if I want to sort my objects for some reason, e.g., to reduce GL state changes during rendering? (I assume a cache miss on a small game object component on the CPU is nothing compared to a cache miss on the GPU looking for an entire texture.)

While making my own implementation of the data structure, I realized something: x86 cache lines are usually only 64 bytes long nowadays. That's only one 4x4 float matrix. If I'm doing matrix-heavy calculations (which are ripe for tight-loop and multi-threading optimizations), then so long as the matrices are cache-aligned, I don't have to worry about contiguousness for the sake of cache coherency.
Without the need for contiguousness, there's no need to swap during deallocation. That solves my first concern. It has the added bonus of reducing the levels of indirection from two to one, since the objects will never move in memory.

Also, while I'm no longer keeping the objects themselves compact in memory, I'm still maintaining a list of allocated ids that's compact. This way, all objects can be iterated over without wasting time on unallocated objects. As a plus, this list of allocated ids can simply be the other half of an unallocated id list (or "free list" as the Stack Exchange author calls it).

The list of allocated ids can also be sorted efficiently, as per my second concern, since you only need to swap integers and not entire objects. Just make sure you update any "reverse ids" for efficient deallocation accordingly (this can be done with one linear pass of the sorted allocated id list). Also, if you make the id number itself part of your sorting criteria, that would theoretically improve cache coherency during iteration.

Below is a sample C++ template that implements the id management of the data structure.

```cpp
#include <array>
#include <stdexcept>
#include <utility>

template<typename IdType, IdType MaxSize>
class id_manager
{
private:
    typedef std::array<IdType, MaxSize> array_type;

public:
    typedef IdType size_type;
    typedef typename array_type::value_type value_type;
    typedef typename array_type::difference_type difference_type;
    typedef typename array_type::const_reference const_reference;
    typedef typename array_type::const_pointer const_pointer;
    typedef typename array_type::const_iterator const_iterator;
    typedef typename array_type::const_reverse_iterator const_reverse_iterator;

private:
    size_type _size;

    // The first _size elements in _ids are allocated ids.
    // The remaining elements are unallocated ids.
    array_type _ids;

    // _reverse_ids maps ids to indexes into the _ids table for efficient
    // deallocation.
    array_type _reverse_ids;

public:
    id_manager() :
        _size(0)
    {
        // All entries in _ids are initially unallocated.
        for (IdType i = 0; i < MaxSize; ++i)
        {
            _ids[i] = i;
        }
    }

    IdType allocate_id()
    {
        if (_size == MaxSize)
        {
            throw std::length_error(
                "Attempt to allocate beyond this id_manager's maximum size");
        }

        IdType result = _ids[_size];
        _reverse_ids[result] = _size;
        ++_size;
        return result;
    }

    void deallocate_id(IdType target)
    {
        --_size;

        // Swap the now-unallocated gap with the end of the allocated section.
        IdType reverse_id = _reverse_ids[target];
        std::swap(_ids[reverse_id], _ids[_size]);
        _reverse_ids[_ids[reverse_id]] = reverse_id;
    }

    void clear()
    {
        _size = 0;

        // All entries in _ids are reset to unallocated.
        for (IdType i = 0; i < MaxSize; ++i)
        {
            _ids[i] = i;
        }
    }

    bool empty() const { return _size == 0; }
    size_type size() const { return _size; }
    size_type max_size() const { return MaxSize; }

    const_iterator begin() const { return _ids.begin(); }
    const_iterator end() const { return _ids.begin() + _size; }

    const_reverse_iterator rbegin() const
    {
        return const_reverse_iterator(end());
    }

    const_reverse_iterator rend() const
    {
        return const_reverse_iterator(begin());
    }
};
```

The ids can be used as indices into arrays of object data with the same max_size().

I haven't actually done performance tests, but I can't think of a better solution if sorting (or simply not swapping during deallocation) is more important to you than contiguous memory.

What do you all think? Are my priorities in the right place? Any enhancements you can think of?
7. ## Implementing Component-Entity-Systems

Nice article. It explains the basics of CES implemented with a struct of arrays very well. I understand you want to keep it simple, but I think there are some basic optimizations worth mentioning, if not in the article then in a follow-up:

1. Maintain a stack of "free" entity ids instead of iterating over every id until you find one with a zero mask.
2. Maintain a "highest used entity id" variable that you can use as the upper limit of your iterations instead of ENTITY_COUNT. This way, if you have ENTITY_COUNT set to 1000 but all your in-use entities have an id of 50 or lower, you don't waste time checking the masks of entities 51 through 999.