
#5316160 Matrix 16 byte alignment

Posted on Yesterday, 06:23 PM

Your problem was resolved by making your code run slower (potentially a _lot_ slower, depending on the particular CPU and how badly it deals with unaligned SSE loads/stores).


Not in my experience: _mm_loadu_ps() was only a few percent slower (a cycle at most) than _mm_load_ps() when I benchmarked on an Intel i7, and that extra cost isn't even measurable when the address happens to be aligned. Use aligned loads whenever you can guarantee alignment, but it's more of a micro-optimization. You'll save far more time by thinking carefully about how to lay out your data for better cache utilization, so that you don't pay tens of cycles on each memory access.
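For reference, a minimal sketch of the two load flavors (the variable names are just illustrative):

```cpp
#include <xmmintrin.h> // SSE intrinsics

int main()
{
    alignas(16) float aligned[4]   = { 1.0f, 2.0f, 3.0f, 4.0f };
    float             unaligned[5] = { 0.0f, 1.0f, 2.0f, 3.0f, 4.0f };

    __m128 a = _mm_load_ps( aligned );        // requires a 16-byte-aligned address
    __m128 b = _mm_loadu_ps( unaligned + 1 ); // works for any address

    // ... use a and b ...
    return 0;
}
```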

#5314432 Audio System

Posted on 09 October 2016 - 12:32 PM

You're on the right track. In my audio system I have a big update( const Scene& scene, float dt ) method that copies the current state of all listener/source objects into internal data structures. I'm also simulating sound propagation effects using path tracing, so that computation must be executed as a task on a separate thread. Once the sound propagation impulse responses are computed, I have to update the audio rendering thread with the new data. I do this by atomically swapping the IRs in a triple-buffered setup.

You can use a similar strategy: copy your parameters into one end of a triple buffer, then use an atomic operation to rotate through the buffers. One set of parameters is the rendering thread's current interpolation state, another is the target interpolation state, and the third is where the main thread writes the next set of parameters. The key is to rotate the buffers only once the rendering thread has finished the previous interpolation (this requires another atomic variable to signal completion). If the main thread updates the parameters more often than that, the update is simply ignored. You only need to update audio information at 10-15 frames/second anyway; anything faster is overkill perceptually.
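A minimal sketch of the atomic rotation, under the simplifying assumption that the audio thread just wants the most recently published parameter set (types and names are hypothetical, not from any particular library):

```cpp
#include <array>
#include <atomic>

// Hypothetical parameter blob; stands in for the interpolation state.
struct AudioParams { float gain = 1.0f; /* ... */ };

class TripleBuffer
{
public:
    // Main thread: publish a new parameter set. If the audio thread
    // hasn't consumed the previous one yet, it simply gets overwritten.
    void write( const AudioParams& p )
    {
        slots[backIndex] = p;
        // Swap the back slot with the shared "pending" slot and mark it fresh.
        backIndex = pending.exchange( backIndex | kFreshBit,
                                      std::memory_order_acq_rel ) & kIndexMask;
    }

    // Audio thread: pick up the latest parameters, if any were published.
    bool read( AudioParams& out )
    {
        if ( (pending.load( std::memory_order_acquire ) & kFreshBit) == 0 )
            return false; // nothing new; keep interpolating toward old target
        frontIndex = pending.exchange( frontIndex,
                                       std::memory_order_acq_rel ) & kIndexMask;
        out = slots[frontIndex];
        return true;
    }

private:
    static constexpr int kFreshBit  = 4; // set when the pending slot is unread
    static constexpr int kIndexMask = 3; // low bits hold the slot index (0-2)

    std::array<AudioParams,3> slots;
    int backIndex = 0;   // owned by the writer
    int frontIndex = 1;  // owned by the reader
    std::atomic<int> pending{ 2 }; // the slot currently "in flight"
};
```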

#5308687 How to compute % of target visibility/cover?

Posted on 30 August 2016 - 10:32 AM

You listed my suggestion. Shoot many different rays at your target and see how many hit. One thing I would add is some randomization: don't shoot the same rays every frame, but randomly pick new ones. You can then smooth out the jitter in the %hit signal using a low-pass filter.

Exactly this. Trace rays uniformly randomly distributed within the cone that contains the bounding sphere; you can find code to generate those rays online. To get the true % visible, first test each ray against the target object while ignoring occluders, then test the ray against the occluders (but only for rays that hit the target). This handles the case where the object isn't close to spherical, so that some rays would miss the target regardless of any occluders.


You can trace many fewer rays (like 10x fewer) if you smooth the resulting output over many frames using a technique like exponential smoothing. You can probably get away with only 10-20 rays per frame with decent results. You might also vary the number of rays based on the size of the cone: wider cones need more rays. The number of rays should be proportional to the solid angle covered by the cone, to keep the density of rays constant.
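A sketch of the smoothing step (the alpha value is just a starting point to tune):

```cpp
// Combine this frame's noisy hit fraction with the running estimate.
// Smaller alpha = smoother output, but slower to react to changes.
float updateVisibility( float smoothed, int hits, int raysCast, float alpha = 0.1f )
{
    const float current = float(hits) / float(raysCast); // this frame's estimate
    return smoothed + alpha*(current - smoothed);        // exponential smoothing
}
```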

#5308075 Threadpool with abortable jobs and then-function

Posted on 26 August 2016 - 12:45 PM

If you have such large tasks that the time to execute them is noticeable to the user, then you should probably just break those tasks into smaller units (e.g. <100ms to execute), so that if you do need to abort/restart a computation you can just remove the unexecuted task segments from the queue. You'll also get better parallelism by making your tasks smaller (but not too small), since the threads in the pool can be kept busy continuously.


Aborting jobs is either intrusive (the job has to periodically poll to see if it should stop) or destructive (killing the thread). Neither is a great option.
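A sketch of the intrusive flavor, which is usually the lesser evil:

```cpp
#include <atomic>

// The job polls a shared flag at convenient checkpoints and bails out
// early when asked to stop.
struct CancelToken { std::atomic<bool> cancelled{ false }; };

void longJob( CancelToken& token )
{
    for ( int i = 0; i < 1000000; ++i )
    {
        if ( token.cancelled.load( std::memory_order_relaxed ) )
            return; // abandon work; the caller can discard partial results

        // ... do one small unit of work ...
    }
}
```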


As for a responsive user interface: that is most easily achieved by running the UI on a separate thread at a high update rate (e.g. 60 Hz), so that it never has to wait on the completion of any complex task. The UI thread responds to events from the user and quickly adds jobs to the thread pool as needed. When a job finishes, it can call a completion handler that locks a mutex and updates the UI display with the new results, so the waiting is minimal.

#5305195 Is The "entity" Of Ecs Really Necessary?

Posted on 10 August 2016 - 06:10 PM


What's the purpose of sharing components among entities? How do you track the lifetime of a component?

I don't have a good answer for the first question. I guess I don't want to place unnecessary restrictions on how the components can be used.


Components are stored in an object called a ResourceSet, which is a container of arbitrary resource/component types and is also the in-memory representation of my engine's file format. Components are loaded from disk into a ResourceSet and can then be instantiated within an engine simulation context. The engine holds raw pointers to the components and doesn't care about object lifetimes. Components can be added directly to the engine, or a copy can be created (stored in another ResourceSet that contains only runtime-instantiated data). If a component is shared among multiple entities, its reference count within the engine tracks the number of entities using it. When the reference count reaches zero, the component is removed from the engine (but not deallocated). The object instancing system is higher-level, built on top of the engine, and manages the lifetime of the ResourceSet(s).
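A hypothetical sketch of that reference-count bookkeeping (none of these names come from a real library):

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// The engine never owns a component; it only tracks how many entities
// reference it and which components are currently registered.
template <typename Component>
class ComponentTable
{
public:
    void add( Component* c )
    {
        if ( ++refCounts[c] == 1 )
            components.push_back( c ); // first user: register with the engine
    }

    void remove( Component* c )
    {
        if ( --refCounts[c] == 0 )
        {
            refCounts.erase( c );
            // Unregister, but do NOT delete: the owning ResourceSet
            // controls the actual allocation.
            components.erase( std::find( components.begin(), components.end(), c ) );
        }
    }

private:
    std::vector<Component*> components; // packed array iterated by systems
    std::unordered_map<Component*,int> refCounts;
};
```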

#5304803 Is The "entity" Of Ecs Really Necessary?

Posted on 08 August 2016 - 09:20 PM

I am putting the finishing touches on my engine and I have a question about whether or not it is useful (to the general gamedev population) to have the "entity" abstraction as part of an entity component system.


In the current version of my system, the entity is just an ID, components are stored in packed arrays, and a component can be a member of more than one entity. The "engine" is just a collection of arrays for the different component types. All logic is implemented in systems, each of which can modify the contents of the engine in its update() method. I have a prefab system similar to Unity's that allows instancing of premade collections of resources. Each prefab instance gets associated with an entity ID in the engine when it is created.
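Roughly this layout, in sketch form (the component types here are just examples):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// An entity is just an ID; each component type lives in its own packed array.
using EntityID = std::uint32_t;

struct Transform { float position[3]; };
struct RigidBody { float velocity[3]; float mass; };

struct Engine
{
    std::vector<Transform> transforms; // packed component arrays
    std::vector<RigidBody> bodies;

    // entity -> index into each packed array (one map per component type)
    std::unordered_map<EntityID, std::size_t> transformIndex;
    std::unordered_map<EntityID, std::size_t> bodyIndex;
};
```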


I am thinking about removing the entity concept because it would greatly simplify the bookkeeping and make the overall architecture much simpler. In that case there wouldn't be any explicit mechanism for grouping components together; the engine would just be a big unordered collection of resources that are part of the current simulation. This seems like it would work for my use cases, but maybe I am not foreseeing all common needs (since I'm mostly a low-level tech developer). I can envision a scenario where a user script might need to access sibling components of the same prefab instance. Is this a common need? Or should I just require the user to explicitly link components via resource references so that the script can access them later?

#5299831 What Language Is Best For Game Programming?

Posted on 08 July 2016 - 07:01 PM

In some cases, this is unfortunately true. If you decide to take the C++ route, you may have trouble getting a game made for the Mac, as it does not have very good C++ support (Apple recommends Objective-C with Cocoa). If you choose Java, you will have the best multi-platform support, but the language is constantly changing, and it is very slow compared to C++.


Huh? My engine is in C++ and my primary development platform is OS X. You need Objective-C to interface with the Cocoa frameworks, but everything else can be written in C++. There is even so-called "Objective-C++" (.mm files), which makes it easy to write C++ wrappers around the Objective-C APIs.

#5298678 .obj MESH loader

Posted on 30 June 2016 - 10:44 PM

OBJ files are notoriously hard to parse correctly while handling all of the edge cases, since the format has almost no structure and strange features like negative indices. I've been tweaking my loader for years and still occasionally encounter models that cause problems. You need to parse the file twice: the first pass counts the elements and allocates arrays to hold them; the second pass converts the data to binary. Then there are objects, groups, materials, smoothing groups, etc., which are all applied to faces according to their positions in the file. I'd recommend using a 3rd-party loader before implementing your own.
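As an example of one of those edge cases, here is a sketch of resolving OBJ's 1-based and negative indices to 0-based array indices:

```cpp
// OBJ indices are 1-based; negative indices count back from the most
// recently defined vertex. 'vertexCount' is the number of vertices
// parsed so far when the face is encountered.
int resolveObjIndex( int objIndex, int vertexCount )
{
    if ( objIndex > 0 )
        return objIndex - 1;           // 1-based -> 0-based
    if ( objIndex < 0 )
        return vertexCount + objIndex; // -1 == last vertex defined so far
    return -1;                         // 0 is invalid in OBJ
}
```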

#5294828 How does Runge-Kutta 4 work in games

Posted on 03 June 2016 - 12:28 PM

I don't think most physics engines use RK4 at all; most use semi-implicit Euler for its balance of speed and stability. RK4 is fine for simple stuff like mass-spring systems, but once you incorporate collision detection and response there is not much benefit except in certain cases. You still need to test for collisions and respond in each substep, so the performance is roughly 4x worse than 1st-order integration methods. The integration accuracy is better than just performing four 1st-order steps, but there are many headaches involved in using RK4 in a full physics simulation for games.
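For comparison, semi-implicit Euler is about as simple as an integrator gets; a 1D sketch:

```cpp
// Semi-implicit (symplectic) Euler: update velocity first, then use the
// NEW velocity to update position. Much more stable than explicit Euler
// for oscillatory systems, at the same cost per step.
struct Body { float position; float velocity; float mass; };

void integrate( Body& b, float force, float dt )
{
    b.velocity += (force / b.mass) * dt; // v(t+dt) from a(t)
    b.position += b.velocity * dt;       // x(t+dt) from v(t+dt)
}
```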

#5286404 Developing games with the Flow-based programing paradigm.

Posted on 11 April 2016 - 07:17 PM

Most of my engine's core functionality (e.g. communicating position of an object from physics to graphics, animation bindings, sound DSP processing) is abstracted as data flows between arbitrary object pairs. In the editor, objects declare their input and output connectors and the user can drag to create connections between them. An "entity" is just a collection of objects with their connections (also encoded as small objects). Each data flow connection can be set to update either before or after a specific engine subsystem, and the engine batches the updates in the correct ordering based on the connectivity between objects. Each batch, if large enough, can then be broken down into disconnected "islands" that can be executed in parallel on a thread pool. 

In theory, nodes in the graph can perform arbitrary math/logical operations, so the system could be used to implement more complex logic, or serve as a more general-purpose multimedia processing system.


A hard part of implementing something like this in a large system is that the number of possible type interactions is O(N^2). To avoid that, you must decouple the two endpoints of a connection, for example by writing the data to intermediate temporary storage before reading it at the other endpoint. A connection then consists of two Connector subclasses with read() and write() methods, plus a Connection subclass that contains the temporary storage. With this, the number of required connector types is just O(N). My system has over 50 node types, so this is a big win in code size.
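A stripped-down sketch of that decoupling for a single scalar data type (all names hypothetical):

```cpp
#include <memory>

// Each endpoint only talks to the intermediate Connection storage, so a
// node type needs one connector pair per data type (O(N)) rather than a
// converter for every other node type (O(N^2)).
struct Connection           // owns the temporary storage between endpoints
{
    float value = 0.0f;     // e.g. a scalar data flow
};

struct Connector
{
    virtual ~Connector() = default;
    std::shared_ptr<Connection> connection;
};

struct OutputConnector : Connector
{
    void write( float v ) { connection->value = v; } // producer side
};

struct InputConnector : Connector
{
    float read() const { return connection->value; } // consumer side
};
```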

#5285656 Sine-based Tiled Procedural Bump

Posted on 07 April 2016 - 04:52 PM

Your image contains negative values, and when you visualize it naively they appear black. Bias it into the 0-1 range and the results should look similar.
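The usual remap, assuming the values lie in [-1, 1]:

```cpp
// Map a signed value in [-1,1] to [0,1] for display.
float toDisplayRange( float value ) { return value*0.5f + 0.5f; }
```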

#5284899 In terms of engine technology, what ground is left to break?

Posted on 03 April 2016 - 01:15 PM

Sound has huge room for improvement, particularly in the simulation of realistic acoustics, sound propagation, and spatial sound. Current games neglect many acoustic effects; most do not even handle occlusion and only apply static zoned reverberation. It's very immersion-breaking to hear a Fallout 4 super mutant talking through a wall two stories up as if it were talking in your ear.


My work/research focuses on using real-time ray tracing on the CPU; here is a recent paper of mine from i3D 2016 showing what is possible. Our current system can handle about 20 sound sources in a fully interactive dynamic scene. This tech has the potential to save a lot of artist time that would otherwise be spent tuning artificial reverb filters. If you have a 3D mesh for the scene plus acoustic material properties, you can hear in real time how the scene should actually sound, with realistic indirect sound propagation. It also gives a big improvement in overall sound quality and can enable new gameplay that isn't possible with current tech (e.g. tracking an enemy based on their sound).


The big problem in transitioning this technology to games at the moment is that you still need most of a 4-core HT CPU to do this interactively. A GPU implementation is a possibility, but I doubt many game developers want to trade most of their graphics compute time for sound, and there are also issues with the amount of data that must be transferred back to the CPU and with the latency this introduces.

#5284625 Wondering how to build a somewhat complex UI...

Posted on 01 April 2016 - 10:42 AM

The most common way is to have a base class (Control, Widget, Window) and have every UI element derive from it.

The base class will have a virtual OnMouseEvent(...) and every UI element will implement it.

For the keyboard events I like to have a focus element. Let's say you click on a text box; in the OnMouseEvent() of the text box you do something like

GUISystem->SetFocus(this), and this will pass all the keyboard events to the element you set.


And the third important thing is callbacks. Let's say you register a callback for "CLICK" on some button, and the button will call it in OnMouseEvent().

This is the base and you can extend from there.


This, except that I think it's better for each widget's parent to maintain the focus for its level of the GUI hierarchy, rather than having a single global focus. Each widget in the hierarchy then just passes key/text input events down to whichever child locally has focus.


Another important concept is the delegate. In my case this is just a struct with a collection of std::function callbacks that respond to events for each type of widget. This is much nicer than the inheritance-based solutions that many GUI toolkits require.
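A sketch of the idea for a button-like widget (the names and callback set are hypothetical):

```cpp
#include <functional>

// A bundle of optional std::function callbacks instead of a subclass
// per behavior.
struct ButtonDelegate
{
    std::function<void()>     onClick;
    std::function<void(bool)> onHover; // true = entered, false = left
};

// Usage: no Button subclass needed, just assign the callbacks, e.g.
//   button.delegate.onClick = []{ startNewGame(); };
```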

#5284103 How do I know which parameters to give Mix_OpenAudio() ?

Posted on 29 March 2016 - 02:35 PM

These are the audio stream format parameters; they specify the format of the sound device's output stream. I don't know the specifics of the SDL API, though, or how it deals with parameters that don't match the current device settings. (A sketch of a typical call follows the list.)

  1. Sample rate - 44100 Hz is standard for CD-quality audio. Some audio hardware runs at 48 kHz, and pro-level hardware goes up to 192 kHz.
  2. Sample type - in this case, signed 16-bit integers. This is the output format; most audio DSP is done with 32-bit floats these days, then converted to 16- or 24-bit integers on output.
  3. Number of channels (2 = stereo).
  4. Audio buffer size - how big a buffer the sound is processed in. This determines the latency of the device: a 1024-sample buffer at 44100 Hz adds a latency of at least 23 ms (1024/44100 ≈ 0.023 s). Too much latency is bad, but if the buffer is too small the audio can glitch when the CPU can't finish the processing in time. I would try a buffer of 512 samples.
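A typical call with those parameters might look like this (AUDIO_S16SYS is signed 16-bit in the platform's native byte order; check the SDL_mixer docs for how mismatched device settings are handled):

```cpp
#include <SDL_mixer.h>

bool openAudio()
{
    // 44100 Hz, signed 16-bit, stereo, 512-sample buffer.
    if ( Mix_OpenAudio( 44100, AUDIO_S16SYS, 2, 512 ) != 0 )
    {
        // Mix_GetError() describes the failure (e.g. no audio device).
        return false;
    }
    return true;
}
```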

#5280932 Reducing Compile/Link Times

Posted on 12 March 2016 - 02:35 PM

If you have link-time code generation enabled in Visual Studio, it can increase the link time by an order of magnitude in my experience. Disabling it (for all projects) made a big difference in my build time and didn't affect runtime performance much.