EddieV223

OpenAL: why is there no group working on it?


Why is OpenAL not being developed? We need a hardware-accelerated, cross-platform API for audio, like OpenGL is for graphics!

 

I will never forgive Microsoft for removing the audio HAL from Windows.

Edited by EddieV223

People don't care about audio in the way they care about graphics.

The graphics library is also often a lot more critical.

Hardware-accelerated graphics: necessary for good graphical quality and/or playable framerates.


Hardware-accelerated audio: neither particularly critical, nor is the relevant hardware commonly available on end-user systems (in other words, it doesn't work with typical onboard audio chipsets).

So audio generally ends up being done in software.

Hardware-accelerated audio: neither particularly critical, nor is the relevant hardware commonly available on end-user systems

 

But that's a circular argument. Hardware-accelerated graphics weren't necessary for most of the 1990s, and we enjoyed the games then. But we realised it would be cool to have more powerful graphics. More demanding software inspires more powerful hardware, which permits even more demanding software, and so on.

 

There are several ways in which we could be making good use of hardware-accelerated audio, and I listed several in this post. But until we see developers and researchers start attempting these things, and make it clear to hardware manufacturers that they want more power, we won't see much movement.


I think the main reason why there is no huge demand for audio hardware is that it's perfectly possible to render 20-30 three-dimensional sources in real time in software, at CD quality, without totally killing the CPU. The difference between 20 sources, 200 sources, and 2000 sources is very small, if audible at all, so it's quite conceivable to get away with the smaller number.

Monitor speakers and headsets are often of embarrassingly low quality too, so even if the sound isn't the best possible quality, a lot of people won't notice at all (and they won't notice the difference between the most expensive sound card and the on-chip one, either).

 

It is, on the other hand, not trivially possible to do a similar thing with 3D graphics (not at present-day resolutions, and not with state-of-the-art quality, anyway). The difference between 20, 200, and 2000 objects on screen is immediately obvious. Displays are usually quite good, so the difference between good graphics and bad graphics is immediately obvious, too.

 

That doesn't mean that OpenAL is not being developed at all, however. The OpenAL Soft implementation, which has become a kind of de facto standard (compared to the dinosaur of a reference implementation), undergoes regular updates and adds several useful extensions of its own.
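For anyone who hasn't looked at it in a while, here is a minimal sketch of that software path, assuming the OpenAL Soft headers and library are installed (untested, but only standard OpenAL 1.1 calls): it opens the default device, uploads one second of a sine wave, and plays it from a position to the listener's left, with all mixing and spatialization done on the CPU.

```cpp
#include <AL/al.h>
#include <AL/alc.h>
#include <chrono>
#include <cmath>
#include <thread>
#include <vector>

int main() {
    // Open the default device and create a context. With OpenAL Soft all
    // mixing happens in software, so any onboard chipset works.
    ALCdevice*  device  = alcOpenDevice(nullptr);
    ALCcontext* context = alcCreateContext(device, nullptr);
    alcMakeContextCurrent(context);

    // One second of a 440 Hz sine wave as 16-bit mono PCM.
    const int sampleRate = 44100;
    const double pi = 3.14159265358979323846;
    std::vector<ALshort> pcm(sampleRate);
    for (int i = 0; i < sampleRate; ++i)
        pcm[i] = static_cast<ALshort>(32000.0 * std::sin(2.0 * pi * 440.0 * i / sampleRate));

    ALuint buffer = 0;
    alGenBuffers(1, &buffer);
    alBufferData(buffer, AL_FORMAT_MONO16, pcm.data(),
                 static_cast<ALsizei>(pcm.size() * sizeof(ALshort)), sampleRate);

    // A source a couple of metres to the listener's left; the library
    // handles attenuation and panning when it mixes.
    ALuint source = 0;
    alGenSources(1, &source);
    alSourcei(source, AL_BUFFER, static_cast<ALint>(buffer));
    alSource3f(source, AL_POSITION, -2.0f, 0.0f, 0.0f);
    alSourcePlay(source);

    std::this_thread::sleep_for(std::chrono::seconds(1));

    alDeleteSources(1, &source);
    alDeleteBuffers(1, &buffer);
    alcMakeContextCurrent(nullptr);
    alcDestroyContext(context);
    alcCloseDevice(device);
}
```

Everything here runs on the CPU; the library only touches the hardware when it hands the final mix to the OS.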

Edited by samoth

I'm not convinced that the number of objects was a big factor. For the first 5 years of consumer graphics card availability, pretty much every game that could use a GPU needed a software fallback. You had to be able to show the same number of objects whether you used hardware or software, just at a different degree of quality. The same would apply for sound now. (And by quality in the audio context I don't mean using 96 kHz / 24-bit sound, I mean simulating reverb, occlusion, etc. - things you can't do very cheaply but which you can discern on even the cheapest headphones.)


Oh, to be able to go back to 1998 and give Aureal better lawyers...

I just read up on that court case. Wow, just... wow.


Back in the day I had an X-Fi Extreme Music and a headset with three speakers in each ear for real 5.1 surround sound. People thought I cheated all the time in COD and Medal of Honor, because I would turn and face people through walls and buildings and be ready for them before they turned corners. It was really just because I could clearly hear their footsteps and gear jingling from far away. With regular software/motherboard audio this doesn't happen at all.

 

Since Microsoft removed the audio HAL, hardware-accelerated audio pretty much died instantly.

Edited by EddieV223

The next big thing in audio has to be the real-time modelling of acoustic spaces. The extra dimension of realism this would add would be eye-opening.

32 sources, compared to hundreds of sources with proper occlusion and implicit environmental effects (i.e. they echo if they happen to be next to a stone wall, not because you explicitly told the source to use the 'stone room' effect), is an unimaginably huge difference. Audio really has been stagnating.

A lot of people have shitty PC speakers, yeah, but a lot of people also have cinema-grade speakers and/or very expensive headsets. Surround sound headsets are becoming very common with PC gamers at least.

Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?

GeneralQuery, there's certainly some interesting work happening in that area - for example the 'aural proxies' stuff here - http://gamma.cs.unc.edu/AuralProxies/ - but they are calling 5-10 ms on a single core "high performance", and I would suggest they need to do better than that for it to be widely accepted, especially since none of their examples show how the system scales up to double-digit numbers of sound sources.

 

Hodgman, there was some talk of the GPU over in the other thread that I linked to above. From what I understand, opinion is a bit divided as to whether the latency will be an issue. One poster there said he could get it down to 5ms of latency, but that was reading from audio capture, presumably a constant stream of data, to the GPU; going the other direction, from CPU -> GPU -> PCIe audio device, may not be so quick, and even just a 10ms delay will ruin the fidelity of a lot of reverb algorithms.



Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?

I had considered this before (using the GPU for some audio tasks), but haven't done much with it.


You probably don't need to realistically calculate every sample; many effects (echoes, muffling, ...) can be handled by feeding the samples through an FIR (or IIR) filter (see the sketch below).

The problem then is mostly the cost of realistically calculating and applying these filters for a given scene.

Some of this could be handled by enlisting the GPU, both for calculating the environmental effects and for applying the filters (perhaps using textures and a lot of special shaders, or OpenCL, or similar).

I have a few ideas here, mostly involving OpenGL, but they aren't really pretty. OpenCL or similar would probably be better here...


In my case, for audio hardware, I have an onboard Realtek chipset and mostly use headphones.
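To make the FIR idea concrete, here is a toy sketch in plain C++ (not tied to any particular audio API): a direct convolution of a mono block with a short impulse response. The function name and the assumption of float samples are mine, not from any library.

```cpp
#include <cstddef>
#include <vector>

// Apply an FIR filter (e.g. a short impulse response approximating a room's
// echo or muffling) to a block of mono samples.
// Direct convolution: out[n] = sum_k ir[k] * in[n - k].
std::vector<float> applyFIR(const std::vector<float>& input,
                            const std::vector<float>& impulseResponse)
{
    std::vector<float> output(input.size(), 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n) {
        for (std::size_t k = 0; k < impulseResponse.size() && k <= n; ++k)
            output[n] += impulseResponse[k] * input[n - k];
    }
    return output;
}
```

For long, realistic room responses you would switch to FFT-based (partitioned) convolution, since the cost of direct convolution grows with the impulse-response length for every output sample.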

Is it possible that in the future, instead of a dedicated audio processing card, we'll just be able to perform our audio processing on the (GP)GPU?

I've seen a few VSTs for real-time processing (convolution reverbs if I recall correctly) being accelerated with CUDA. I dunno how well they would work in a videogame.

 

Searched for "cuda vst" in Google and some things turn up. http://www.liquidsonics.com/software_reverberate_le.htm



The latency is not such a problem for audio engineering but becomes problematic for real-time interactive applications.


 


How much is too much latency?

 

At least from what I've seen, latency is a problem in audio engineering and music production: people prefer to work with DAWs at <10ms latency for maximum responsiveness (especially when dealing with MIDI controllers). Is 10ms too much?

Edited by TheChubu

 

 


Latency in a DAW is not a problem (I'm not talking about MIDI latency but about the latency between triggering a sound and hearing it); even a few hundred milliseconds is certainly liveable. The problem with real-time, interactive applications like games is that the latency between what is seen and what is heard will pose problems and ruin the illusion.


10ms

I'm no expert, but considering the speed of sound (ca. 300 m/s) and the size of a head (ca. 0.3 m), the difference between "sound comes from far left" and "sound comes from far right", which is pretty much the most extreme case possible, is about 0.5 ms. The ear is able to pick that up without any trouble (and obviously it's able to pick up much smaller differences, too -- we are able to hear in a lot more detail than just "left" and "right").

 

In that light, 10ms seems like... huge. I'm not convinced something that coarse can fly.

 

Of course we're talking about overall latency (on all channels) but the brain has to somehow integrate that with the visuals, too. And seeing how it's apparently doing that quite delicately at ultra-high resolution, I think it may not work out.
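As a quick sanity check of that figure, using approximate constants (343 m/s for the speed of sound and roughly 0.2 m of extra path to the far ear):

```cpp
#include <cstdio>

int main() {
    // Rough check of the interaural time difference (ITD) mentioned above:
    // the extra path length to the far ear for a source directly to one side
    // is on the order of the head width. Constants are approximate.
    const double speedOfSound   = 343.0; // m/s in air at room temperature
    const double pathDifference = 0.20;  // m, rough extra distance to the far ear
    std::printf("max ITD ~ %.2f ms\n", 1000.0 * pathDifference / speedOfSound);
    // Prints roughly 0.58 ms, the same ballpark as the ~0.5 ms above.
}
```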

Edited by samoth


 

 

If all sounds are delayed the same, I think it might work. 10ms means the sound starts while the corresponding frame is still being displayed.

You usually have some delay in all sound systems from when you tell a sound to start playing until it actually plays, but I don't know how long it usually is... Longer on mobile devices, at least.

As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".

 

Phase shifts and such within the same sound source reaching both ears are another thing.

 

It would be pretty easy to test...

 

Edit:

Also, to simulate sound and visual sync properly, you should add some delay. If someone drops something 3m away, the sound should be delayed by about 10ms.

 

I think this is good news: a minimum delay of 10ms just means you can't accurately delay sounds closer than 3m, and that shouldn't be much of a problem, since 3m is close enough that you wouldn't really notice the delay in real life either.
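A tiny sketch of that distance-based delay; the playDelayed call in the comment is a hypothetical engine helper, not a real API:

```cpp
// Propagation delay for distance-based scheduling. 343 m/s is the speed of
// sound in air at room temperature (~300 m/s if you prefer round numbers).
constexpr float kSpeedOfSound = 343.0f;

constexpr float propagationDelaySeconds(float distanceMetres)
{
    return distanceMetres / kSpeedOfSound; // 3 m -> ~0.009 s, 34 m -> ~0.1 s
}

// Hypothetical usage with an engine-specific helper:
//   mixer.playDelayed(sound, propagationDelaySeconds(distanceToListener));
```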

Edited by Olof Hedman

 


100ms would be a very long delay, certainly enough to affect the continuity between what is seen and what is heard. This would of course only be an issue for audio sources closer than approximately 100 feet (roughly 30m) from the player.

 

As a ballpark figure, anything less than 20ms would probably be feasible. The ear has trouble distinguishing separate sources that are delayed by less than approximately 20ms from each other (the Haas effect), so I'm extrapolating that delays shorter than this may not be problematic (but I have nothing solid to back this claim up).

 

You could probably test this by knocking up a virtual piano that plays a note when the mouse is clicked. Keep pushing up the delay between the click and audio trigger until the discontinuity becomes noticeable.
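A crude stand-alone version of that test (console C++; the terminal bell is a stand-in for a real one-shot sound, so swap in your engine's click for anything meaningful): each press of Enter "plays a note" after an artificial delay that grows by 10ms per trigger.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

// Stand-in for a real one-shot sound: the terminal bell. Replace with your
// engine's click/beep for a meaningful test.
void playClick() { std::cout << '\a' << std::flush; }

int main() {
    int delayMs = 0;
    std::string line;
    std::cout << "Press Enter to 'play a note' (Ctrl+D/Ctrl+Z to quit)\n";
    while (std::getline(std::cin, line)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(delayMs));
        playClick();
        std::cout << "added latency: " << delayMs << " ms\n";
        delayMs += 10; // ramp the artificial delay until it feels detached
    }
}
```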

Edited by GeneralQuery
Actually, perception is fairly lax when it comes to audio/visual sync delays.
IME, anything much under about 100-200ms isn't really all that noticeable.

More so, getting much under around 50-100ms may itself be difficult, largely due to the granularity introduced by things like the game tick (which often runs slower than the raw framerate: at 60fps the frame time is around 17ms, but the game tick may only run at 10 or 16Hz, i.e. every 62-100ms).

There may also be the issue of keeping the audio mixer precisely aligned with the sound output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if the drift goes much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

However, we don't want to re-align too aggressively, as this will typically introduce audible defects, which are often much more obvious. For example, we may occasionally need to pad out or skip forwards to get things back in sync, but simply jumping will typically result in an obvious "pop" (and padding things out with silence isn't much better), so it is usually necessary to blend over a skip (via interpolation) and to insert some "filler" (such as previously mixed samples) when padding things out, with blending at both ends. Even then it is still often noticeable, but at least the loud, obvious pop can be avoided.


ADD/IME: another observation is that while nearest and linear interpolation (as in trilinear filtering) often work OK for graphics, they sound poor for audio mixing, so you generally want cubic interpolation for upsampling and resampling. To support arbitrary resampling more effectively, such as for Doppler shifts, a strategy resembling mip-mapping can be used, where the sample is interpolated within each mip level and then interpolated between mip levels.

Edited by cr88192
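For reference, a sketch of the 4-point cubic (Catmull-Rom) interpolation being described, plus a naive resampling loop; a real mixer would layer the mip-style pre-filtered levels on top of this to avoid aliasing on large downward pitch shifts.

```cpp
#include <cstddef>
#include <vector>

// 4-point Catmull-Rom cubic interpolation: t in [0,1) is the fractional
// position between samples y1 and y2.
float cubicInterpolate(float y0, float y1, float y2, float y3, float t)
{
    const float a = -0.5f * y0 + 1.5f * y1 - 1.5f * y2 + 0.5f * y3;
    const float b =         y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    const float c = -0.5f * y0              + 0.5f * y2;
    return ((a * t + b) * t + c) * t + y1;
}

// Naive resampling by an arbitrary ratio (e.g. 1.02 for a slight Doppler
// shift upward in pitch).
std::vector<float> resample(const std::vector<float>& in, double ratio)
{
    std::vector<float> out;
    for (double pos = 1.0; pos + 2.0 < static_cast<double>(in.size()); pos += ratio) {
        const std::size_t i = static_cast<std::size_t>(pos);
        out.push_back(cubicInterpolate(in[i - 1], in[i], in[i + 1], in[i + 2],
                                       static_cast<float>(pos - static_cast<double>(i))));
    }
    return out;
}
```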

@cr88192: 100-200ms can be an eternity in terms of audio/visual syncing. I just ran a very crude test, and with a latency of 100ms the discontinuity was jarring under certain conditions. A lot of it will be source-dependent, though: a lot of sounds really aren't critical in terms of syncing with a particular visual cue. A monster's cry of pain, for example, wouldn't need to start exactly at the same time as the animation. The rate and indeterminacy are also factors: in my crude test, rapid "weapon fire" was much more forgiving than intermittent weapon fire.

 

Apologies to OP, my post isn't really on topic.


Keep in mind that we already have too much visual latency in a lot of systems (up to ~100ms), which sets the benchmark for what acceptable audio latency is (you don't want to hear something hit the ground before you see it!).

 

I guess to cut down on a bit of the latency but still allow for GPGPU acceleration, we need to convince AMD/nVidia to start shipping GPUs that have audio connectors on the back, just like they currently have video connectors?

On that note, HDMI actually is an audio connector... I wonder how the transfer of audio to the GPU for HDMI currently works?

 

 

Thinking more on GPU acceleration, and doing some really bad back-of-the-napkin math:

Let's say that we can comfortably draw a 1920*1280 screen at 30Hz, which is ~73 million pixels a second.

If we then say that an audio sample has the same processing cost as one of our pixels (a gross simplification), then 73,728,000 / 44,000 Hz ≈ 1675 simultaneous sample streams.

Realistically, I'd say that modern games do a hell of a lot more work per pixel than they require per audio sample, so mixing thousands of audio samples via the GPU should definitely be a feasible goal.
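Spelling that arithmetic out, using the post's own figures (including the rounded 44,000 Hz):

```cpp
#include <cstdio>

int main() {
    // The back-of-the-napkin budget from above, spelled out.
    const long long pixelsPerSecond = 1920LL * 1280LL * 30LL; // 73,728,000
    const long long sampleRate      = 44000;                  // rounded, as above
    std::printf("pixel-equivalent audio streams: %lld\n",
                pixelsPerSecond / sampleRate);                // prints 1675
}
```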

 

Audio HDR (or DRC, to you audio folk) is something that's hugely under-developed in games compared to visual HDR too. We can now let artists author scenes with realistic (floating-point) light values and contrast ratios of 10,000:1, and have them just work thanks to clever photographic exposure schemes.

I haven't seen too many games doing the same with their audio -- in midnight silence, you should be able to hear someone drop a pin in the next room, but on a busy street at mid-day, you'd barely be able to hear a baseball smash a window.
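A minimal sketch of that kind of "audio exposure", with invented attack/release rates and target level: track a smoothed loudness estimate of the mix and pull the output toward a target, the way a camera's auto-exposure adapts to a scene.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Crude "auto-exposure" for audio: a pin drop in a silent room and a busy
// street can both be authored at realistic levels, and the output is scaled
// toward a comfortable target. Numbers are illustrative guesses.
class AutoGain {
public:
    void process(std::vector<float>& block)
    {
        // RMS loudness of this block.
        double sum = 0.0;
        for (float s : block) sum += double(s) * s;
        const float rms =
            std::sqrt(float(sum / std::max<std::size_t>(1, block.size())));

        // Smooth the loudness estimate: fast attack, slow release.
        const float rate = (rms > envelope_) ? 0.2f : 0.005f;
        envelope_ += rate * (rms - envelope_);

        // Scale toward the target, clamping the gain to something sane.
        const float gain =
            std::clamp(targetRms_ / std::max(envelope_, 1e-6f), 0.1f, 100.0f);
        for (float& s : block) s = std::clamp(s * gain, -1.0f, 1.0f);
    }
private:
    float envelope_  = 0.1f;
    float targetRms_ = 0.25f;
};
```

The point is the analogy with exposure, not these particular numbers; a shipping game would want loudness weighting and per-bus control.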


You usually have some delay in all sound systems from when you tell a sound to start playing until it actually plays, but I don't know how long it usually is... Longer on mobile devices, at least.

 

On a PC, it's probably somewhere between 10ms and 100ms.

 

On an Android phone, it's anything up to a couple of seconds, it seems...

 

Obviously most systems will have some sort of software mixer, which has its own buffer, and then the hardware has its own buffer as well (to avoid sending too many I/O interrupts to the OS), so you always have some degree of latency. (Obviously hardware-accelerated audio would let you cut out the software buffer entirely.)
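As a back-of-the-envelope reminder of where that buffering latency comes from, each buffer of N frames at sample rate R contributes N / R seconds (the sizes below are just typical illustrative values):

```cpp
// Latency contributed by buffering alone: frames / sampleRate seconds.
constexpr double bufferLatencyMs(int frames, int sampleRate)
{
    return 1000.0 * frames / sampleRate;
}

// e.g. a 512-frame software mixer buffer plus a 512-frame hardware buffer
// at 44.1 kHz: 2 * bufferLatencyMs(512, 44100) is roughly 23 ms, before any
// processing or game-tick granularity is added on top.
```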

 

I think 10ms of latency would be fine for most gaming applications, providing there is some visual latency as well. Certain early reflection algorithms will never sound realistic at such latencies but I don't think that's solvable on consumer hardware. But I'm worried the practical latency would be higher than 10ms.

 


More so, getting much under around 50-100ms may itself be difficult, largely due to the granularity introduced by things like the game tick (which often runs slower than the raw framerate: at 60fps the frame time is around 17ms, but the game tick may only run at 10 or 16Hz, i.e. every 62-100ms).

 

That's a trivial problem to solve, though. Some people run more game ticks than graphics ticks, in fact. It makes a lot of sense to run a very fine-grained loop for input and trivial operations so the game appears responsive, and to relegate complex logic like AI to the slow tick.

 

There may also be the issue of keeping the audio mixer precisely aligned with the sound output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if the drift goes much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

 

I'm not sure what issue you're referring to here - the hardware will surely have a sample rate that it wants to work at and you just feed data into its buffer. This isn't a problem that a user application needs to solve - the driver is there to keep it steady.

