
OpenAL: why is there no group working on it?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

33 replies to this topic

#21 GeneralQuery   Crossbones+   -  Reputation: 1263


Posted 20 March 2013 - 11:38 AM

 

10ms

I'm no expert, but considering the speed of sound (ca. 300 m/s) and the size of a head (ca. 0.3 m), the difference between "sound comes from far left" and "sound comes from far right", which is pretty much the most extreme possible, is 0.5 ms. The ear is able to pick that up without any trouble (and obviously, it's able to pick up much smaller differences, too -- we are able to hear in a lot more detail than just "left" and "right").

 

In that light, 10ms seems like... huge. I'm not convinced something that coarse can fly.

 

Of course we're talking about overall latency (on all channels) but the brain has to somehow integrate that with the visuals, too. And seeing how it's apparently doing that quite delicately at ultra-high resolution, I think it may not work out.

 

 

If all sounds are delayed the same, I think it might work. 10ms means it starts while the right frame is still displaying.

You usually have some delay in all sound systems from when you tell it to start playing until it plays, but I don't know how long it usually is... Longer on mobile devices, at least.

As long as it's below 100ms or so, I think most people will interpret it as "instantaneous".

 

Phase shifts and such in the same sound source reaching both ears is another thing.

 

It would be pretty easy to test...

 

Edit: Also, to simulate sound and visual-sync properly, you should add some delay. If someone drops something 3m away, the sound should be delayed 10ms.

100ms would be a very long delay, certainly enough to affect the continuity between what is seen and what is heard. This of course would only be an issue for audio sources less than approximately 100 feet from the player.

 

As a ballpark figure, anything less than 20ms would probably be feasible. The ear has trouble distinguishing separate sources that are delayed by less than approximately 20ms from each other (the Haas effect), so I'm extrapolating that delays below this may not be problematic (but I have nothing solid to back this claim up).

 

You could probably test this by knocking up a virtual piano that plays a note when the mouse is clicked. Keep pushing up the delay between the click and audio trigger until the discontinuity becomes noticeable.
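The arithmetic behind these figures is easy to sketch. A back-of-envelope script (using the slightly more standard 343 m/s for the speed of sound, and ~0.2 m for the ear-to-ear distance):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def propagation_delay_ms(distance_m, c=SPEED_OF_SOUND):
    """Time for sound to travel distance_m metres, in milliseconds."""
    return distance_m / c * 1000.0

# Worst-case interaural time difference: sound arriving from one side,
# travelling roughly the width of the head between the ears.
itd = propagation_delay_ms(0.2)    # ~0.58 ms

# An object dropped 3 m away, as in the edit above.
drop = propagation_delay_ms(3.0)   # ~8.7 ms

print(f"ITD ~{itd:.2f} ms, 3 m propagation ~{drop:.1f} ms")
```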


Edited by GeneralQuery, 20 March 2013 - 11:40 AM.



#22 BGB   Crossbones+   -  Reputation: 1554


Posted 20 March 2013 - 05:00 PM

actually, perception is fairly lax when it comes to audio/visual sync delays.
IME, much under about 100-200ms and it isn't really all that noticeable.

more so, getting much under around 50-100ms may be itself difficult, largely due to the granularity introduced by things like the game-tick and similar (which is often lower than the raw framerate, where at 60fps, the frame-time is around 17ms, but the game-tick may only be at 10 or 16Hz, or 62-100ms).

there may also be the issue of keeping the audio mixer all that precisely aligned with the sound-output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if this drifts much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

however, we don't want to re-align too aggressively, as this will typically introduce audible defects, which are often much more obvious. for example, we may need to occasionally pad-out or skip forwards to get things back in sync, but simply jumping will typically result in an obvious "pop" (and padding things out with silence isn't much better), so it is usually necessary to blend over a skip (via interpolation), and insert some "filler" (such as previously mixed samples) for padding things out (with blending at both ends). even then, it is still often noticeable (but, at least the loud/obvious pop can be avoided).
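One way to blend over a skip like this is a plain linear crossfade between the stale stream and the realigned one. A minimal sketch (illustrative, not the actual mixer code described here):

```python
def crossfade(old_tail, new_head):
    """Blend the tail of the stale stream into the realigned stream with a
    linear ramp, avoiding the hard discontinuity ("pop") of a raw jump.
    Both inputs are equal-length lists of float samples in [-1, 1]."""
    n = len(old_tail)
    assert len(new_head) == n and n > 1
    out = []
    for i in range(n):
        t = i / (n - 1)  # ramps 0.0 -> 1.0 across the blend window
        out.append(old_tail[i] * (1.0 - t) + new_head[i] * t)
    return out

# fade a constant 1.0 signal into silence over 4 samples:
print(crossfade([1.0] * 4, [0.0] * 4))
```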


ADD/IME: actually, another observation is that while nearest and linear interpolation (such as in trilinear filtering) often work ok for graphics, nearest and linear interpolation sound poor for audio mixing, so generally a person needs cubic interpolation for upsampling and resampling. to more effectively support arbitrary resampling, such as in Doppler shifts, a strategy resembling mip-mapping can be used, where the sample is interpolated within each mip-level, and then interpolated between mip-levels.
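A common choice of cubic for this job is the Catmull-Rom spline, which passes through its control samples. A sketch of resampling with it (not necessarily the exact interpolant meant above):

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic interpolation between p1 and p2; p0/p3 are the neighbours."""
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
        + (3 * p1 - 3 * p2 + p3 - p0) * t * t * t
    )

def resample(samples, step):
    """Resample with cubic interpolation; step < 1.0 upsamples,
    step > 1.0 downsamples (e.g. for a crude Doppler shift)."""
    out, pos = [], 0.0
    while pos < len(samples) - 2:
        i = int(pos)
        t = pos - i
        p0 = samples[max(i - 1, 0)]           # clamp at the edges
        p1, p2 = samples[i], samples[i + 1]
        p3 = samples[min(i + 2, len(samples) - 1)]
        out.append(catmull_rom(p0, p1, p2, p3, t))
        pos += step
    return out

print(resample([0.0, 1.0, 2.0, 3.0], 0.5))  # 2x upsample
```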

Edited by cr88192, 20 March 2013 - 06:12 PM.


#23 GeneralQuery   Crossbones+   -  Reputation: 1263


Posted 20 March 2013 - 05:34 PM

@cr88192: 100-200ms can be an eternity in terms of audio/visual syncing. I just ran a very crude test, and with a latency of 100ms the discontinuity was jarring under certain conditions. A lot of it will be source dependent though; many sounds really aren't critical in terms of syncing with a particular visual cue. A monster's cry of pain, for example, wouldn't need to be synced to start at exactly the same time as the animation. The rate and indeterminacy are also factors: under my crude test, rapid "weapon fire" was much more forgiving than intermittent weapon fire.

 

Apologies to OP, my post isn't really on topic.



#24 Hodgman   Moderators   -  Reputation: 31851


Posted 20 March 2013 - 05:42 PM

Keep in mind that we already have too much visual latency in a lot of systems (up to ~100ms), which sets the benchmark for what acceptable audio latency is (you don't want to hear something hit the ground before you see it!).

 

I guess to cut down on a bit of the latency but still allow for GPGPU acceleration, we need to convince AMD/nVidia to start shipping GPUs that have audio connectors on the back, just like they currently have video connectors?

On that note, HDMI actually is an audio connector.... I wonder how the transfer of audio to the GPU for HDMI currently works?

 

 

Thinking more on GPU acceleration, and doing some really bad back-of-the-napkin math:

Let's say that we can comfortably draw a 1920*1280 screen at 30Hz, which is ~73 million pixels a second.

If we then say that an audio sample has the same processing cost as one of our pixels (here's the complete simplification), then 73728000 / 44000Hz == ~1675 pixels' worth of budget per audio sample.

Realistically, I'd say that modern games do a hell of a lot more work per pixel than they require per audio sample, so mixing thousands of audio samples via the GPU should definitely be a feasible goal.
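That napkin math, spelled out (using the standard 44.1 kHz rate rather than the rounded 44 kHz, which is where the ~1675 figure above comes from; the per-pixel/per-sample cost equivalence is the big simplification):

```python
width, height, fps = 1920, 1280, 30
pixels_per_second = width * height * fps  # 73,728,000

sample_rate = 44_100  # standard CD-quality rate
budget_per_sample = pixels_per_second / sample_rate

print(f"{pixels_per_second:,} pixels/s -> "
      f"~{budget_per_sample:.0f} pixel-equivalents per audio sample")
```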

 

Audio HDR (or DRC to you audio folk) is something that's hugely under-developed in games compared to visual HDR too. We can now let artists author scenes with realistic (floating point) light values, and contrast levels of 10,000 times, and have them just work thanks to clever photographic exposure schemes.

I haven't seen too many games doing the same with their audio -- in midnight silence, you should be able to hear someone drop a pin in the next room, but on a busy street at mid-day, you'd barely be able to hear a baseball smash a window.
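A toy sketch of that idea: a slowly adapting "exposure" gain that normalises loudness the way photographic auto-exposure normalises brightness. Purely illustrative (the parameter values are made up), and far cruder than a real compressor:

```python
def auto_gain(blocks, target=0.25, smoothing=0.9):
    """Crude audio 'auto-exposure': adapt a running gain so each block's
    peak approaches `target`. blocks = list of lists of float samples."""
    gain, out = 1.0, []
    for block in blocks:
        peak = max(abs(s) for s in block) or 1e-9  # avoid divide-by-zero
        desired = target / peak                    # gain that would hit target
        gain = smoothing * gain + (1 - smoothing) * desired  # slow adaptation
        out.append([s * gain for s in block])
    return out

# a long run of quiet blocks (a pin drop in midnight silence) gets
# gradually boosted toward the target level:
quiet = auto_gain([[0.01]] * 100)
print(quiet[0][0], "->", round(quiet[-1][0], 3))
```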



#25 Kylotan   Moderators   -  Reputation: 3338


Posted 20 March 2013 - 05:57 PM

You usually have some delay in all soundsystems from when you tell it to start playing until it plays, but I don't know how long it usually is... Longer on mobile devices at least.

 

On a PC, it's probably somewhere between 10ms and 100ms.

 

On an Android phone, it's anything up to a couple of seconds, it seems...

 

Obviously most systems will have some sort of software mixer, which has its own buffer, then the hardware has its own buffer as well (to avoid sending too many I/O interrupts to the OS), so you always have some degree of latency. (Obviously hardware-accelerated audio would let you cut out the software buffer entirely.)
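Each of those buffers translates directly into latency: one buffer's worth of samples must drain before your new sound reaches the speaker. A quick sketch (the buffer sizes are made-up but typical-looking values, just for illustration):

```python
def buffer_latency_ms(frames, sample_rate):
    """Latency contributed by one buffer of `frames` sample frames."""
    return frames / sample_rate * 1000.0

# e.g. a 512-frame software mixer buffer plus a 1024-frame hardware
# buffer, both at 44.1 kHz:
total = buffer_latency_ms(512, 44_100) + buffer_latency_ms(1024, 44_100)
print(f"~{total:.0f} ms")  # ~35 ms
```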

 

I think 10ms of latency would be fine for most gaming applications, providing there is some visual latency as well. Certain early reflection algorithms will never sound realistic at such latencies but I don't think that's solvable on consumer hardware. But I'm worried the practical latency would be higher than 10ms.

 


more so, getting much under around 50-100ms may be itself difficult, largely due to the granularity introduced by things like the game-tick and similar (which is often lower than the raw framerate, where at 60fps, the frame-time is around 17ms, but the game-tick may only be at 10 or 16Hz, or 62-100ms).

 

That's a trivial problem to solve though. Some people run more game ticks than graphics ticks, in fact. It makes a lot of sense to run a very high granularity loop for input and trivial operations to appear responsive, and only have complex logic like AI relegated to the slow tick.
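Structurally, that multi-rate scheme is just two fixed-timestep accumulators running off the same frame loop. A sketch (the tick rates are examples, not anyone's actual engine):

```python
def run(frames, dt_ms=17):
    """Fixed-timestep loop: input on a fast ~60 Hz tick, AI on a 10 Hz tick."""
    INPUT_DT, AI_DT = 16, 100  # ms per tick
    input_acc = ai_acc = 0
    input_ticks = ai_ticks = 0
    for _ in range(frames):
        input_acc += dt_ms
        ai_acc += dt_ms
        while input_acc >= INPUT_DT:
            input_acc -= INPUT_DT
            input_ticks += 1   # poll input, run cheap responsive logic
        while ai_acc >= AI_DT:
            ai_acc -= AI_DT
            ai_ticks += 1      # run expensive AI / world logic
    return input_ticks, ai_ticks

print(run(60))  # ~one second of 60 fps frames -> (63, 10)
```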

 

there may also be the issue of keeping the audio mixer all that precisely aligned with the sound-output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if this drifts much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

 

I'm not sure what issue you're referring to here - the hardware will surely have a sample rate that it wants to work at and you just feed data into its buffer. This isn't a problem that a user application needs to solve - the driver is there to keep it steady.



#26 Kylotan   Moderators   -  Reputation: 3338


Posted 20 March 2013 - 06:07 PM

Keep in mind that we already have too much visual latency in a lot of systems (up to ~100ms), which sets the benchmark for what acceptable audio latency is (you don't want to hear something hit the ground before you see it!).

 

I wouldn't say that's true, because syncing audio is not just about syncing to visuals but about syncing to movement. If I play guitar and use my PC as an amp, it is clearly audible that there is latency involved as soon as it creeps much past 6 or 7ms. It ceases to feel like the original note that I played and becomes more like an echo of it.

 

Obviously, to some degree, gamers are not noticing this yet. But it does exist.

 

Audio HDR (or DRC to you audio folk) is something that's hugely under-developed in games compared to visual HDR too.

 

That's actually one of the cheapest ones to implement, and would work in software at virtually no cost. It's a shame that games can't easily just use off-the-shelf VSTs to do that sort of thing.



#27 BGB   Crossbones+   -  Reputation: 1554


Posted 20 March 2013 - 07:53 PM


You usually have some delay in all sound systems from when you tell it to start playing until it plays, but I don't know how long it usually is... Longer on mobile devices, at least.

 
On a PC, it's probably somewhere between 10ms and 100ms.
 
On an Android phone, it's anything up to a couple of seconds, it seems...
 
Obviously most systems will have some sort of software mixer, which has its own buffer, then the hardware has its own buffer as well (to avoid sending too many I/O interrupts to the OS), so you always have some degree of latency. (Obviously hardware-accelerated audio would let you cut out the software buffer entirely.)
 
I think 10ms of latency would be fine for most gaming applications, providing there is some visual latency as well. Certain early reflection algorithms will never sound realistic at such latencies but I don't think that's solvable on consumer hardware. But I'm worried the practical latency would be higher than 10ms.


both my own experience and observations suggest that latencies this small aren't really all that noticeable though.

it is like saying that people will notice sound being off by 1 frame in a 100Hz video.

a more practical limit is 1 frame in 30Hz, or 33ms.
but, my own experience suggests it is a lot more than this (basically, that while both hearing and vision are sensitive to timing, they are not nearly as sensitive to the timing of each other).


more so, getting much under around 50-100ms may be itself difficult, largely due to the granularity introduced by things like the game-tick and similar (which is often lower than the raw framerate, where at 60fps, the frame-time is around 17ms, but the game-tick may only be at 10 or 16Hz, or 62-100ms).

 
That's a trivial problem to solve though. Some people run more game ticks than graphics ticks, in fact. It makes a lot of sense to run a very high granularity loop for input and trivial operations to appear responsive, and only have complex logic like AI relegated to the slow tick.


originally I did everything on a 10Hz tick, but moved user input handling and player-physics to 16Hz, mostly because at 10Hz there were a lot of "rubbery" interpolation artifacts, which are mostly absent at 16Hz. at one point, I had used 24Hz, but worried that this would risk producing too much traffic WRT network delta messages (16Hz was a compromise between 10Hz and 24Hz).

presumably, most sound effects would be triggered from the AIs and general world entities (doors, ...), which in my case run on the 10Hz tick, so it is unlikely that the timing will be much more precise than this.

fancy physics runs on a subdivided 100Hz virtual tick, but this was mostly to increase stability, and only 10Hz is visible on the client (and, externally, only a 10Hz tick is done).


there may also be the issue of keeping the audio mixer all that precisely aligned with the sound-output from the audio hardware, so typically a tolerance is used here, with the mixer re-aligning if this drifts much outside 100ms or so (much past 100-200ms and the audio and visuals start to get noticeably out of sync).

 
I'm not sure what issue you're referring to here - the hardware will surely have a sample rate that it wants to work at and you just feed data into its buffer. This isn't a problem that a user application needs to solve - the driver is there to keep it steady.


the issue is mostly due to things like driving the audio-mixer via relatively low-precision accumulation timers: for each frame, say, we measure the frame delta-time in milliseconds, add this to an accumulated time value (a float), and then mix samples based on the accumulated timer deltas. (most things in-engine are driven off accumulation timers, so basically the mixer is driven by game-time, rather than real-time.)

often, the timers will begin to drift off the position the sound-card is currently playing at, so that after a few minutes or so, a fair amount of drift may have accumulated, and it may be necessary to re-align the mixer with the sound-card.
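The mechanism is easy to demonstrate. Measuring each frame's delta in whole milliseconds quantises the true frame time, and the error accumulates; real timers are usually less crude than this sketch, but the drift works the same way:

```python
frame_dt = 1.0 / 60.0                 # true frame time: 16.666... ms
acc_ms = 0.0                          # game-time accumulator
true_ms = 0.0                         # what the sound card actually played
for _ in range(60 * 60):              # one minute of 60 fps frames
    true_ms += frame_dt * 1000.0
    acc_ms += int(frame_dt * 1000.0)  # delta truncated to whole milliseconds

drift = true_ms - acc_ms
print(f"drift after one minute: ~{drift:.0f} ms")
```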

I had considered a few times possibly rewriting the mixer, but this hasn't been a high priority.
the design would likely change how a few things are handled, probably use a 172Hz fixed-frequency mixer-tick, and probably Q24.8 (or maybe float) for intermediate samples, and maybe (undecided) be driven using a real-time timer.
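For reference, Q24.8 just means a 32-bit integer with 8 fractional bits; conversions are a scale and multiplies need a renormalising shift. A sketch with hypothetical helper names (the rewrite described above is speculative, so this is only the representation, not that mixer):

```python
FRAC_BITS = 8
ONE = 1 << FRAC_BITS  # 1.0 in Q24.8, i.e. 256

def to_q24_8(x):
    """Float sample -> Q24.8 fixed point."""
    return int(round(x * ONE))

def from_q24_8(q):
    """Q24.8 fixed point -> float sample."""
    return q / ONE

def q_mul(a, b):
    """Multiply two Q24.8 values, shifting out the extra 8 fraction bits."""
    return (a * b) >> FRAC_BITS

half = to_q24_8(0.5)                  # 128
print(from_q24_8(q_mul(half, half)))  # 0.25
```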

#28 MrDaaark   Members   -  Reputation: 3555


Posted 20 March 2013 - 08:59 PM

The crappy embedded chips we have now are complete shit compared to what we had in the 90s. They pick up interference from everything else in your rig, and they often chug on simple tasks.

A significant cause of Android's keyboard lag is the crappy chip trying to replay the typing sound over and over.

Lots of lag with embedded chips comes from waiting for sound data to be moved around and play. When the chips can't keep up you get stuttering, popping, or dropped frames.

My nearly 20-year-old SoundBlaster 16 is laughing at this.

I wouldn't blame MS for this. Eventually it just got cheaper for OEMs to use cheap embedded parts at low or no cost than to include more expensive parts from other vendors. It also lets them keep the power requirements lower. The same thing happens in the GPU space. We get embedded chips and 200W PSUs.

#29 Ashaman73   Crossbones+   -  Reputation: 7992


Posted 21 March 2013 - 12:45 AM

If you have a TV, DVD/Blu-ray player, and amplifier that let you set the audio output latency, test it yourself by watching a movie. It gets annoying, and very obvious, once the audio/lip synchronisation fails. I think it depends a lot on what the brain expects. Most daily visual/audio interactions, like seeing a person talking right in front of you, lead to a high demand for synchronisation, whereas (hopefully) uncommon sounds like distant explosions or shooting have a much higher latency tolerance.

 

My nearly 20-year-old SoundBlaster 16 is laughing at this.

Most sound cards of that time were the size of a toaster, but yeah, missing them too :)

 

Nevertheless, after watching some indie trailers with 8/16-bit retro visuals (Hammerwatch, Legend of Dungeon, Riot, Sword & Sworcery EP), I felt that there was an incredibly fresh sound/music experience. Maybe the overwhelming visuals of top AAA games have such a great (negative) impact on our audio perception that it would take too much effort to bring the sound up to par with the visuals. Consider how many people shut out the visuals (i.e. close their eyes) to enjoy an audio experience.



#30 BGB   Crossbones+   -  Reputation: 1554


Posted 21 March 2013 - 12:55 AM

The crappy embedded chips we have now are complete shit compared to what we had in the 90s. They pick up interference from everything else in your rig, and they often chug on simple tasks.

A significant cause of Android's keyboard lag is the crappy chip trying to replay the typing sound over and over.

Lots of lag with embedded chips comes from waiting for sound data to be moved around and play. When the chips can't keep up you get stuttering, popping, or dropped frames.

My nearly 20-year-old SoundBlaster 16 is laughing at this.

I wouldn't blame MS for this. Eventually it just got cheaper for OEMs to use cheap embedded parts at low or no cost than to include more expensive parts from other vendors. It also lets them keep the power requirements lower. The same thing happens in the GPU space. We get embedded chips and 200W PSUs.

lag due to playing a sound is more likely a software/driver issue than a hardware one. (ADD: at least insofar as the HW doesn't deal with this issue in the first place...).

typically the hardware does not know/understand/care about things like sound-effects or mixing, but rather just provides a looping buffer for the drivers to write premixed sample data into (typically mapped to a physical address range somewhere).
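That looping buffer is just a ring buffer: the driver writes premixed samples in at one position while the hardware's play position chases it around the loop. A software model of the idea (not real driver code):

```python
class RingBuffer:
    """Model of a sound card's looping sample buffer."""
    def __init__(self, size):
        self.buf = [0.0] * size
        self.write_pos = 0  # where the mixer/driver writes next
        self.play_pos = 0   # where the "hardware" reads next

    def write(self, samples):
        """Driver writes premixed samples, wrapping at the end of the buffer."""
        for s in samples:
            self.buf[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % len(self.buf)

    def play(self, n):
        """Hardware consumes n samples, also wrapping around."""
        out = []
        for _ in range(n):
            out.append(self.buf[self.play_pos])
            self.play_pos = (self.play_pos + 1) % len(self.buf)
        return out

rb = RingBuffer(8)
rb.write([0.1, 0.2, 0.3])
print(rb.play(2))  # [0.1, 0.2]
```

If the driver's write position falls behind the play position, the hardware replays stale data, which is exactly the drift/realignment problem discussed earlier in the thread.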

sound-chips also might offer additional features, like the ability for the drivers to set the sample-rate (and sometimes control sample-format), and adjust various volume controls (speaker output left/right volume, record gain, ...), ... (there are also sound-chips where the sample-rate and format is fixed, and volume controls are also handled in software).

(ADD: basically, hardware support for variable sample rate/format and hardware-supplied volume controls are optional features... and so depend on the specific chipset one has...).


actually, in a way, it is sort of like modems:
originally, they looked like serial-port devices, and would accept commands, and do stuff on their own, ...;
then they got reduced basically to a simplistic sound-chip connected to a phone-jack, with nearly all the "modem" stuff being done in software (AKA: winmodems).

actually, for onboard phone jacks, these may actually themselves just be hooked up to the sound-chip as well (so, there may be mono output and record buffers specific to the modem jack, ...).


practically, it doesn't seem all that bad; mostly it just means a person can't really expect them to do much beyond what they do already, and it may well make more sense at this point to look elsewhere (like the GPU) for advanced audio-mixing effects.

Edited by cr88192, 21 March 2013 - 02:10 AM.


#31 samoth   Crossbones+   -  Reputation: 5034


Posted 21 March 2013 - 07:20 AM

I guess to cut down on a bit of the latency but still allow for GPGPU acceleration, we need to convince AMD/nVidia to start shipping GPUs that have audio connectors on the back, just like they currently have video connectors?

On that note, HDMI actually is an audio connector.... I wonder how the transfer of audio to the GPU for HDMI currently works?

 

So I just figured out that my nVidia card actually has HDMI (and I was still using DVI because that's what I've always used...) and that there's this mysterious sound driver that you can optionally install with the display driver (which I never did of course, not seeing any reason for that).

 

Turns out if you plug in an HDMI cable and install that driver, it works just fine indeed. No extra cable; sound comes out over HDMI (actually, that's not very surprising, but... duh me). Don't ask me how it works, but it works really well.

 

There are no big DSP effects to tweak (or I haven't found them), just volume and balance, and you can select the sample rate (44.1kHz/48kHz). But... sound comes out when it's supposed to (no noticeable lag whatsoever), and quality is excellent.

 

So my guess would be that the driver probably just does the bare minimum as to qualify as "sound device" to Windows, and uploads the samples via PCIe, and then the display controller just mixes them into HDMI. Or something. From the available features, it doesn't look like there's some hefty GPGPU sound processing going on, anyway.


Edited by samoth, 21 March 2013 - 07:21 AM.


#32 Olof Hedman   Crossbones+   -  Reputation: 2950


Posted 21 March 2013 - 10:00 AM

Not long ago, you had to connect an extra little cable from your motherboard to the GPU to get HDMI sound.

 

Nowadays, it's transferred over PCIe, but I'm pretty sure it's still the sound chip that produces the sound.

I would expect the GPU itself has very little control over this stream; I think it's just routed through PCIe for convenience.


Edited by Olof Hedman, 21 March 2013 - 10:02 AM.


#33 BGB   Crossbones+   -  Reputation: 1554


Posted 21 March 2013 - 12:32 PM

Not long ago, you had to connect an extra little cable from your motherboard to the GPU to get HDMI sound.
 
Nowadays, it's transferred over PCIe, but I'm pretty sure it's still the sound chip that produces the sound.
I would expect the GPU itself has very little control over this stream; I think it's just routed through PCIe for convenience.

the sound going over HDMI is digital, so the sound-chip would be in the TV/monitor.

the video card basically presents a sound-device to the OS, so that the OS can send it audio, which is then fed over the HDMI cable along with all the video data.

basically, the HDMI cable is an STP-type cable (shielded twisted pair), with each pair and also the ground/shielding for each pair having a pin.

the data is sent basically as a continuous stream of packets, which may contain pixel data, audio data, or other data (apparently including Ethernet traffic...). video data is apparently sent mostly as raw RGB or YUV (at 24 or 48 bits) and audio uses PCM. apparently, it will mostly send all the video data, and then send the audio and other data after the video frame (in the blanking time between video frames). (the packets are apparently fixed-size, 32 bytes, each with a type-tag, some ECC data, and some payload.)

Edited by cr88192, 21 March 2013 - 01:08 PM.


#34 Krohm   Crossbones+   -  Reputation: 3251


Posted 23 March 2013 - 04:43 AM

I wonder how the transfer of audio to the GPU for HDMI currently works?

That's a very interesting question! I was thinking about this some time ago. Perhaps it would be possible to have explicit sync points? I'm far more geared towards lowering latency... but going back to the original question, considering GPUs have become the real deal when it comes to multimedia processing on the PC, I wish there could be more interaction between graphics and audio.



