
OpenAL: why is there no group working on it?


Why is OpenAL not being developed?  We need a hardware-accelerated, cross-platform API for audio, like OpenGL is for graphics!

 

I will never forgive Microsoft for removing the audio HAL from Windows.

Edited by EddieV223


People don't care about audio in the way they care about graphics, and the graphics library is often far more critical as well.

Hardware-accelerated graphics: necessary for good graphical quality and/or playable framerates.

Hardware-accelerated audio: neither particularly critical, nor is the relevant hardware commonly available on end-user systems (in other words, it doesn't work with typical onboard audio chipsets).

So audio generally ends up being done in software.


Hardware-accelerated audio: neither particularly critical, nor is the relevant hardware commonly available on end-user systems

 

But that's a circular argument. Hardware-accelerated graphics weren't necessary for most of the 1990s, and we enjoyed the games then. But we realised it would be cool to have more powerful graphics. More demanding software inspires more powerful hardware, which permits even more demanding software, and so on.

 

There are several ways in which we could be making good use of hardware-accelerated audio, and I listed several in this post. But until developers and researchers start attempting these things, and make it clear to hardware manufacturers that they want more power, we won't see much movement.


I think the main reason there is no huge demand for audio hardware is that it's perfectly possible to render 20-30 three-dimensional sources in realtime in software, in CD quality, without totally killing the CPU. The difference between 20 sources, 200 sources, and 2000 sources is very small, if audible at all, so you can get away with fewer.
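As a rough illustration of why that's cheap (this is not OpenAL's actual mixer; the structures and the rolloff/panning formulas here are invented for the example), a software mixer for a handful of 3D sources is little more than a distance attenuation, a pan, and one multiply-accumulate per sample per channel:

```c
#include <math.h>
#include <stddef.h>

typedef struct {
    const float *samples;   /* mono source data                  */
    size_t       cursor;    /* current playback position         */
    size_t       length;    /* number of samples in the source   */
    float        x, y, z;   /* position relative to the listener */
} Source;

/* Mix `nsrc` sources into a stereo block of `nframes` frames. */
static void mix_block(Source *srcs, size_t nsrc,
                      float *outL, float *outR, size_t nframes)
{
    for (size_t i = 0; i < nframes; ++i) { outL[i] = 0.0f; outR[i] = 0.0f; }

    for (size_t s = 0; s < nsrc; ++s) {
        Source *src  = &srcs[s];
        float   dist = sqrtf(src->x * src->x + src->y * src->y + src->z * src->z);
        float   gain = 1.0f / (1.0f + dist);                   /* crude distance rolloff       */
        float   pan  = 0.5f + 0.5f * src->x / (dist + 1e-6f);  /* 0 = hard left, 1 = hard right */
        float   gl   = gain * cosf(pan * 1.5707963f);          /* constant-power pan            */
        float   gr   = gain * sinf(pan * 1.5707963f);

        for (size_t i = 0; i < nframes && src->cursor < src->length; ++i, ++src->cursor) {
            float v = src->samples[src->cursor];
            outL[i] += gl * v;
            outR[i] += gr * v;
        }
    }
}
```

Even with a few hundred active sources, this inner loop is only a few multiply-adds per source per output frame, which a modern CPU handles without breaking a sweat.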

Monitor speakers and headsets are often of embarrassingly low quality too, so even if the sound isn't the best possible quality, a lot of people won't notice at all (and they won't notice the difference between the most expensive sound card and the on-chip one, either).

 

It is, on the other hand, not trivially possible to do a similar thing with 3D graphics (not at present-day resolutions, and not with state-of-the-art quality, anyway). The difference between 20, 200, and 2000 objects on screen is immediately obvious. Displays are usually quite good, so the difference between good graphics and bad graphics is immediately obvious, too.

 

That doesn't mean that OpenAL is not being developed at all, however. The OpenAL Soft implementation, which is kind of a de facto standard (as compared to the dinosaur reference implementation), undergoes regular updates and implements several useful extensions of its own.

Edited by samoth


I'm not convinced that the number of objects was a big factor. For the first 5 years of consumer graphics card availability, pretty much every game that could use a GPU needed a software fallback. You had to be able to show the same number of objects whether you used hardware or software, just at a different degree of quality. The same would apply for sound now. (And by quality in the audio context I don't mean 96 kHz / 24-bit sound; I mean simulating reverb, occlusion, etc. - things you can't do very cheaply but which you can discern on even the cheapest headphones.)


Oh, to be able to go back to 1998 and give Aureal better lawyers...

I just read up on that court case. Wow, just... wow.


Back in the day I had an X-Fi Extreme Music, and a headset with 3 speakers in each ear for real 5.1 surround sound.  People thought I cheated all the time in COD and Medal of Honor, because I would turn and face people through walls and buildings and be ready for them before they turned corners.  It was really just because I could clearly hear their footsteps and gear jingling from far away.  With regular software/motherboard audio this doesn't happen at all.

 

Since Microsoft removed the audio HAL, hardware-accelerated audio pretty much died instantly.

Edited by EddieV223


The next big thing in audio has to be the real-time modelling of acoustic spaces. The extra dimension of realism this would add would be eye-opening.

32 sources, compared to hundreds of sources with proper occlusion and implicit environmental effects (i.e. they echo if they happen to be next to a stone wall, not because you explicitly told the source to use the 'stone room' effect), is an unimaginably huge difference. Audio really has been stagnating.

A lot of people have shitty PC speakers, yeah, but a lot of people also have cinema-grade speakers and/or very expensive headsets. Surround sound headsets are becoming very common with PC gamers at least.

Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?


GeneralQuery, there's certainly some interesting work happening in that area - for example the 'aural proxies' stuff here - http://gamma.cs.unc.edu/AuralProxies/ - but they are calling 5-10 ms on a single core "high performance", and I would suggest they need to do better than that for it to be widely accepted, especially since none of their examples show how the system scales up to double digit numbers of sound sources.

 

Hodgman, there was some talk of the GPU over in the other thread that I linked to above. From what I understand, opinion is a bit divided as to whether the latency will be an issue. One poster there said he could get it down to 5ms of latency, but that was reading from audio capture (presumably a constant stream of data) to the GPU; going the other direction, from CPU -> GPU -> PCIe audio device, may not be so quick, and even just a 10ms delay will ruin the fidelity of a lot of reverb algorithms.


Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?

I had considered this before (using the GPU for some audio tasks), but haven't done much with it.

You probably don't need to realistically calculate every sample; many effects (echoes, muffling, ...) can be handled by feeding the samples through an FIR (or IIR) filter.

The problem then is mostly the cost of realistically calculating and applying these filters for a given scene.

Possibly some of this could be handled by enlisting the GPU, both for calculating the environmental effects and for applying the filters (perhaps using textures and a lot of special shaders, or maybe OpenCL or similar).

I have a few ideas here, mostly involving OpenGL, but they aren't really pretty. OpenCL or similar would probably be better.

In my case, for audio hardware, I have an onboard Realtek chipset and mostly use headphones.
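To make the FIR idea concrete, here is a minimal CPU-side sketch (the kernel and function names are made up for illustration; a real occlusion filter would use a properly designed kernel and per-source wet/dry mixing):

```c
#include <stddef.h>

/* y[n] = sum_k h[k] * x[n-k]; samples before the start of the block are treated as zero. */
static void fir_apply(const float *x, float *y, size_t n,
                      const float *h, size_t taps)
{
    for (size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= i; ++k)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}

/* Example kernel: an 8-tap moving average, i.e. a very crude low-pass
 * ("the source is muffled behind a wall"). */
static const float muffle_kernel[8] = {
    0.125f, 0.125f, 0.125f, 0.125f, 0.125f, 0.125f, 0.125f, 0.125f
};
```

Each output sample costs `taps` multiply-adds, which is exactly the kind of embarrassingly parallel work that could map well to a GPU if the per-scene filter calculation ever got expensive enough to justify the transfer.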


Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?

I've seen a few VSTs for real-time processing (convolution reverbs, if I recall correctly) being accelerated with CUDA. I dunno how well they would work in a videogame.

 

Searching for "cuda vst" in Google turns up some things, e.g. http://www.liquidsonics.com/software_reverberate_le.htm



The latency is not such a problem for audio engineering but becomes problematic for real-time interactive applications.


 


How much is too much latency?

 

At least from what I've seen, latency is a problem in audio engineering and music production; people prefer to work with DAWs at <10ms latency for maximum responsiveness (especially when dealing with MIDI controllers). Is 10ms too much?

Edited by TheChubu


 

 


Latency in a DAW is not a problem (I'm not talking about MIDI latency, but about the delay before you hear the output); even a few hundred milliseconds is certainly liveable. The problem with real-time, interactive applications like games is that the latency between what is seen and what is heard will ruin the illusion.


10ms

I'm no expert, but considering the speed of sound (ca. 300 m/s) and the size of a head (ca. 0.3 m), the difference between "sound comes from far left" and "sound comes from far right", which is pretty much the most extreme case possible, is on the order of 0.5-1 ms. The ear is able to pick that up without any trouble (and obviously, it's able to pick up much smaller differences, too -- we are able to hear in a lot more detail than just "left" and "right").

 

In that light, 10ms seems like... huge. I'm not convinced something that coarse can fly.

 

Of course we're talking about overall latency (on all channels), but the brain has to somehow integrate that with the visuals, too. And seeing how it apparently does that quite delicately, at very high temporal resolution, I think it may not work out.

Edited by samoth



 

 

If all sounds are delayed the same, I think it might work. 10ms means the sound starts while the corresponding frame is still being displayed.

You usually have some delay in all sound systems from when you tell a sound to start playing until it actually plays, but I don't know how long it usually is... Longer on mobile devices, at least.

As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".

 

Phase shifts and such in the same sound source reaching both ears are another matter.

 

It would be pretty easy to test...

 

Edit: Also, to simulate sound and visual sync properly, you should add some delay. If someone drops something 3m away, the sound should be delayed by about 10ms.

 

I think this is good news. A minimum delay of 10ms just means you can't accurately delay sounds closer than 3m, but that shouldn't be much of a problem, since 3m is close enough that you wouldn't really notice the delay in real life either.

Edited by Olof Hedman
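A tiny sketch of that propagation-delay idea (the speed-of-sound constant and the function name are just illustrative):

```c
#define SPEED_OF_SOUND_MPS 343.0f   /* in air at roughly 20 degrees C */

/* Delay, in milliseconds, before the listener should hear a sound
 * emitted `distance_m` metres away. */
static float propagation_delay_ms(float distance_m)
{
    return 1000.0f * distance_m / SPEED_OF_SOUND_MPS;
}

/* propagation_delay_ms(3.0f) is roughly 8.7 ms -- close to the 10 ms
 * figure above, which assumed ~300 m/s. */
```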


 

As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".

100ms would be a very long delay, certainly enough to affect the continuity between what is seen and what is heard. This of course would only be an issue for audio sources less than approximately 100 feet from the player.

 

As a ballpark figure, anything less than 20ms would probably be feasible. The ear has trouble distinguishing separate sources that are delayed by approximately less than 20ms from each other (the Haas Effect) so I'm extrapolating that delays less than this may not be problematic (but I have nothing solid to back this claim up).

 

You could probably test this by knocking up a virtual piano that plays a note when the mouse is clicked. Keep pushing up the delay between the click and audio trigger until the discontinuity becomes noticeable.

Edited by GeneralQuery
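A rough sketch of such a test (assuming OpenAL, e.g. OpenAL Soft, is installed and a POSIX system for nanosleep; link with -lopenal -lm). It synthesizes a short tone, then plays it an adjustable DELAY_MS after each keypress so you can judge when the gap starts to feel disconnected:

```c
#include <AL/al.h>
#include <AL/alc.h>
#include <math.h>
#include <stdio.h>
#include <time.h>

#define SAMPLE_RATE  44100
#define NOTE_SAMPLES (SAMPLE_RATE * 3 / 10)   /* 0.3 s note */
#define DELAY_MS     40                       /* artificial latency under test */

int main(void)
{
    ALCdevice *dev = alcOpenDevice(NULL);     /* default output device */
    if (!dev) return 1;
    ALCcontext *ctx = alcCreateContext(dev, NULL);
    alcMakeContextCurrent(ctx);

    /* Synthesize a short 440 Hz tone with a linear decay envelope. */
    static short pcm[NOTE_SAMPLES];
    for (int i = 0; i < NOTE_SAMPLES; ++i) {
        double env = 1.0 - (double)i / NOTE_SAMPLES;
        pcm[i] = (short)(32000.0 * env *
                         sin(2.0 * 3.14159265358979 * 440.0 * i / SAMPLE_RATE));
    }

    ALuint buf, src;
    alGenBuffers(1, &buf);
    alBufferData(buf, AL_FORMAT_MONO16, pcm, sizeof pcm, SAMPLE_RATE);
    alGenSources(1, &src);
    alSourcei(src, AL_BUFFER, (ALint)buf);

    printf("Press Enter to 'hit a key' (Ctrl-D to quit); delay = %d ms\n", DELAY_MS);
    while (getchar() != EOF) {
        struct timespec ts = { DELAY_MS / 1000, (DELAY_MS % 1000) * 1000000L };
        nanosleep(&ts, NULL);                 /* the latency being evaluated */
        alSourcePlay(src);
    }

    alDeleteSources(1, &src);
    alDeleteBuffers(1, &buf);
    alcMakeContextCurrent(NULL);
    alcDestroyContext(ctx);
    alcCloseDevice(dev);
    return 0;
}
```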

Actually, perception is fairly lax when it comes to audio/visual sync delays. In my experience, much under about 100-200ms isn't really all that noticeable.

Getting much under around 50-100ms may itself be difficult, largely due to the granularity introduced by things like the game tick (which is often lower than the raw framerate: at 60fps the frame time is around 17ms, but the game tick may only run at 10 or 16Hz, i.e. 62-100ms).

There is also the issue of keeping the audio mixer precisely aligned with the sound output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if it drifts much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

However, we don't want to re-align too aggressively, as that will typically introduce audible defects, which are often much more obvious. For example, we may need to occasionally pad out or skip forwards to get things back in sync, but simply jumping will typically produce an obvious "pop" (and padding with silence isn't much better), so it is usually necessary to blend over a skip (via interpolation) and to insert some "filler" (such as previously mixed samples) when padding out, with blending at both ends. Even then it is still often noticeable, but at least the loud, obvious pop can be avoided.

Addendum: another observation is that while nearest and linear interpolation (as in trilinear filtering) often work fine for graphics, they sound poor for audio mixing, so you generally need cubic interpolation for upsampling and resampling. To support arbitrary resampling more effectively, such as for Doppler shifts, a strategy resembling mip-mapping can be used: the sample is interpolated within each mip level and then interpolated between mip levels.

Edited by cr88192
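For reference, a minimal sketch of 4-point cubic (Catmull-Rom) interpolation for reading a sample stream at a fractional position, as used when resampling for Doppler or pitch shifts (names are illustrative; the mip-level blending described above is not shown):

```c
/* Interpolate between y1 and y2, with y0/y3 as the outer neighbours and
 * t in [0,1) the fractional position between y1 and y2. */
static float cubic_interp(float y0, float y1, float y2, float y3, float t)
{
    float a = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + 0.5f * y3;
    float b =          y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    float c = -0.5f * y0               + 0.5f * y2;
    float d =                    y1;
    return ((a * t + b) * t + c) * t + d;
}

/* Read a mono buffer at fractional position `pos` (assumes 0 <= pos <= len-1). */
static float sample_at(const float *buf, int len, double pos)
{
    int   i  = (int)pos;
    float t  = (float)(pos - i);
    int   i0 = i > 0       ? i - 1 : 0;
    int   i2 = i + 1 < len ? i + 1 : len - 1;
    int   i3 = i + 2 < len ? i + 2 : len - 1;
    return cubic_interp(buf[i0], buf[i], buf[i2], buf[i3], t);
}
```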


@cr88192: 100-200ms can be an eternity in terms of audio/visual syncing. I just ran a very crude test, and with a latency of 100ms the discontinuity was jarring under certain conditions. A lot of it will be source dependent though; a lot of sounds really aren't critical in terms of syncing with a particular visual cue. A monster's cry of pain, for example, wouldn't need to be synced to start at exactly the same time as the animation. The rate and indeterminacy are also factors: in my crude test, rapid "weapon fire" was much more forgiving than intermittent weapon fire.

 

Apologies to OP, my post isn't really on topic.


Keep in mind that we already have too much visual latency in a lot of systems (up to ~100ms), which sets the benchmark for what acceptable audio latency is (you don't want to hear something hit the ground before you see it!).

 

I guess to cut down on a bit of the latency but still allow for GPGPU acceleration, we need to convince AMD/nVidia to start shipping GPUs that have audio connectors on the back, just like they currently have video connectors?

On that note, HDMI actually is an audio connector.... I wonder how the transfer of audio to the GPU for HDMI currently works?

 

 

Thinking more on GPU acceleration, and doing some really bad back-of-the-napkin math:

Let's say that we can comfortably draw a 1920*1280 screen at 30Hz, which is ~73 million pixels a second.

If we then say that processing an audio sample costs the same as processing one of our pixels (that's the big simplification), then 73,728,000 / 44,000 Hz ≈ 1675 pixels' worth of work per audio sample.

Realistically, I'd say that modern games do a hell of a lot more work per pixel than they require per audio sample, so mixing thousands of audio samples via the GPU should definitely be a feasible goal.
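Spelling that arithmetic out, with the same numbers as above:

```c
#include <stdio.h>

int main(void)
{
    long pixels_per_sec = 1920L * 1280L * 30L;   /* 73,728,000 pixels/s */
    long sample_rate    = 44000L;                /* output samples/s    */
    printf("budget per output sample: %ld pixels' worth of work\n",
           pixels_per_sec / sample_rate);        /* prints 1675 */
    return 0;
}
```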

 

Audio HDR (or DRC to you audio folk) is something that's hugely under-developed in games compared to visual HDR too. We can now let artists author scenes with realistic (floating point) light values, and contrast levels of 10,000 times, and have them just work thanks to clever photographic exposure schemes.

I haven't seen too many games doing the same with their audio -- in midnight silence, you should be able to hear someone drop a pin in the next room, but on a busy street at mid-day, you'd barely be able to hear a baseball smash a window.
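As a hedged sketch of what an "audio exposure" stage could look like (this is not an established engine API; the structure, constants, and smoothing scheme are all invented for illustration): track a smoothed loudness estimate of the final mix and scale it toward a target level, much like auto-exposure in HDR rendering.

```c
#include <math.h>
#include <stddef.h>

typedef struct {
    float envelope;   /* smoothed level estimate of the incoming mix      */
    float target;     /* desired output level, e.g. 0.25f                 */
    float attack;     /* smoothing coefficient when the scene gets louder */
    float release;    /* smoothing coefficient when it gets quieter       */
} AudioExposure;

static void exposure_process(AudioExposure *e, float *mix, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float level = fabsf(mix[i]);
        float coeff = (level > e->envelope) ? e->attack : e->release;
        e->envelope += coeff * (level - e->envelope);

        /* Scale so the smoothed level sits near the target: quiet scenes
         * (the pin drop) get boosted, loud ones (the busy street) pulled down. */
        float gain = e->target / (e->envelope + 1e-6f);
        if (gain > 16.0f) gain = 16.0f;   /* cap the boost for near-silence */
        mix[i] *= gain;
    }
}

/* Usage (example values): AudioExposure ae = { 0.0f, 0.25f, 0.05f, 0.0005f };
 *                         exposure_process(&ae, mix_buffer, frames); */
```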


You usually have some delay in all sound systems from when you tell a sound to start playing until it actually plays, but I don't know how long it usually is... Longer on mobile devices, at least.

 

On a PC, it's probably somewhere between 10ms and 100ms.

 

On an Android phone, it's anything up to a couple of seconds, it seems...

 

Obviously most systems will have some sort of software mixer, which has its own buffer, and then the hardware has its own buffer as well (to avoid sending too many I/O interrupts to the OS), so you always have some degree of latency. (Hardware-accelerated audio would let you cut out the software buffer entirely.)
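As a quick illustration of how those buffers add up (the buffer sizes here are just example values):

```c
#include <stdio.h>

int main(void)
{
    double sample_rate   = 48000.0;
    double mixer_frames  = 1024.0;   /* software mixer block size (example) */
    double device_frames = 512.0;    /* hardware/driver buffer (example)    */
    double latency_ms = (mixer_frames + device_frames) / sample_rate * 1000.0;
    printf("added latency from buffering alone: %.1f ms\n", latency_ms);  /* 32.0 ms */
    return 0;
}
```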

 

I think 10ms of latency would be fine for most gaming applications, providing there is some visual latency as well. Certain early reflection algorithms will never sound realistic at such latencies but I don't think that's solvable on consumer hardware. But I'm worried the practical latency would be higher than 10ms.

 


Getting much under around 50-100ms may itself be difficult, largely due to the granularity introduced by things like the game tick (which is often lower than the raw framerate: at 60fps the frame time is around 17ms, but the game tick may only run at 10 or 16Hz, i.e. 62-100ms).

 

That's a trivial problem to solve though. Some people run more game ticks than graphics ticks, in fact. It makes a lot of sense to run a very high granularity loop for input and trivial operations to appear responsive, and only have complex logic like AI relegated to the slow tick.

 

There is also the issue of keeping the audio mixer precisely aligned with the sound output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if it drifts much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

 

I'm not sure what issue you're referring to here - the hardware will surely have a sample rate that it wants to work at and you just feed data into its buffer. This isn't a problem that a user application needs to solve - the driver is there to keep it steady.

