Advertisement Jump to content
  • Advertisement
  • entries
  • comments
  • views

Threading and the ANL

Sign in to follow this  


Some time back I wrote about how I had started an experimental branch of the Accidental Noise Library that uses a sort of virtual machine that executes a linear vector of instructions, rather than a pointer-linked chain of modules as the master uses. Part of the reason I wanted to do this was to support multi-threading of the noise kernels. The pointer-linked modules, with their in-built caching, isn't exactly a thread-friendly paradigm. And when you are talking large buffers to map with noise, or deeply complex functions, the time can add up.

I've been working on and off on the new branch, and I have most of the noise functions implemented. I'm having a bad case of insomnia tonight, so I decided to see if it was all worth my time. I whipped up a quick test of threading to see how it does.

The threading is a wrapper around the existing mapping stuff, with a small re-write to allow the mapping to work on an arbitrary horizontal slice of the input buffer. The public-facing interface is still the same. You pass to a function a CArray2Dd buffer sized to your desired dimensions, a CKernel holding the noise function description, and parameters indicating the mapping mode to use (no seamless, seamless in various varieties) as well as a set of ranges delineating the mapping and looping areas. This has been the basic mapping interface for a long time now.

However, I implemented 2 different versions: map2DNoThread, which executes the entire mapping on the current thread, and map2DThread which calls std::thread::hardware_concurrency to get the number of threads/cores available then creates that many threads, splitting the buffer up into buffer.height()/threads chunks, and assigning a thread to each chunk, then joining them at the end.

On my system, hardware_concurrency returns 4, so the mapping runs on 4 cores. The test case I fed to it involved mapping a 10-octave Perlin gradient fBm fractal to a buffer sized 2048x2048. I used std::chrono::high_resolution_clock to track elapsed time.

On the non-threaded version, average elapsed time over 4 different runs came out to 65.89377 seconds.
On the threaded version (using 4 threads), the average elapsed time over 4 runs came out to 17.506001 seconds.

Image output for the 2 versions is identical. So clearly, it has been worth the effort to implement this branch.

I have also made a change on the backend of the noise generation function. When I first started the library, I used the 256-entry permutation table hashing used by Perlin's reference implementation. However, this hash gives the gradient function a period of 256, which can cause obvious repeating when dealing with large areas of noise. So at one point I switched the hashing to use a FNV-1A hash, which gave me a greatly increased period. However, I noticed that this hash had a tendency to produce mapping artifacts at high frequencies, as demonstrated in this high-frequency value noise mapping:


I tinkered with adding various operations to the hash, but was never really able to obtain a result I liked. So in a recent commit, I changed the hashing once again. The hashing now uses a Long Period Hash which operates similarly to how the reference hash worked initially, but with the hash being computed from 4 different permutation tables instead of 1. Due to the ability of my library to handle 6 dimensional noise, I elected to have all versions use the same sets of tables, which are sized 241, 251, 257 and 263, for a period (per the linked article) of 4088647181 (least common multiple of the hash table sizes). That's... quite a bit larger than 256. Of course, the long-period hash adds some complexity to the hashing (though I haven't, and don't really intend to, benchmarked it against the FNV hash). However, hopefully having the threaded option can offset some of the performance penalty.

And was it worth it? A high-frequency value noise mapping using the new hash...


... shows no visible artifacts introduced by the hash, unlike the FNV mapping. So, yeah, I guess it's probably worth it.

I also spent some time thrashing around with Lua binding generators and the build system. The build system has until recently been based on cmake, with build scripts cobbled from old versions and from older iterations of the Urho3D build system. However, I hate cmake. I really, really do. I think the syntax of the build scripts is ugly, dense and obscure. For a simple library such as ANL, it's kind of pointless to try to wrestle with it. So I've been experimenting with a Premake build system. Premake is essentially a build system built on Lua, so the build scripts feel immensely familiar and comfortable for me. I'm using version 4.

I have been building the Lua framework against Lua 5.1 for a long time, but recently I've found occasion to switch to 5.2. However, I have been using tolua++ to generate the bindings, and it seems as if tolua++ is abandoned. However, tolua itself is still being updated, so I switched back to tolua (which, of course, causes a few issues especially with older scripts).

I did a brief segue into using OOLua, but I don't really want to talk about that. (Crikey, what an ugly syntax. Lots and lots of typing of all-capital macros such as OOLUA_EXPORT_MEM_FUNC_10_NON_CONST.) After a couple hours of experimentation, I purged OOLua from the project and banished the link from my bookmarks. It just can't compare to the elegance of tolua's parsing of cleaned headers to generate the glue automatically.

So, that's where the ANL is sitting. Someone informed me that they have still been using the obsolete version located at the original Sourceforge page, and was unaware of the continued development or the switch to Google Code. So I edited the original project to indicate its obsolescence, and pointed links to the Google Code and a (sadly already behind the master) git repo clone at I will try to get that sf repo back up to speed, but for now your best bet at getting the new stuff is to hit the Google Code repo at and switch to the VM branch for the goods. Be aware that, as always, the project is under heavy work by a bumbling and messy coder, so the usual restrictions apply.
Sign in to follow this  


Recommended Comments

 map2DThread which calls std::thread::hardware_concurrency to get the number of threads/cores available then creates that many threads... and assigning a thread to each chunk, then joining them at the end.

Are you actually creating and joining threads each time the function is invoked?


You are liable to lose a lot of time to the thread allocation and setup this way. It would generally be better to keep a pool of threads idling in the background, and hand jobs to them as needed...

Share this comment

Link to comment

Yeah, right now I'm just allocating the threads each time. The usage case of my framework tools typically tends to be one-off tasks where I invoke the framework to execute a script and terminate. For example, creating the displacement, diffuse, and normal maps for a given baked object. But I'll definitely take your pooling suggestion when it comes time to fold this into Goblinson Crusoe, given that there it will be part of the game to generate maps, rather than offline single-execution tasks.

Share this comment

Link to comment
Apparently, Google Code is being shut down, so I guess I can expect to either treat the SF project as the main or migrate it to Github. (Probably, I'll migrate to Github.)

Share this comment

Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Advertisement

Important Information

By using, you agree to our community Guidelines, Terms of Use, and Privacy Policy. is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!