Present & Future AI in Games - Voice/Speech

14 comments, last by wodinoneeye 7 years, 6 months ago

The other advantage is, if you control the TTS+markup software, then there is no 'renegotiating' contracts on sequels, and anyone trained on the team can produce new content with an iconic voice of the series.

I don't think it's been tested yet, but my sense is that U.S. copyright law will probably treat voices more like a likeness than a work, therefore we'll probably have to renegotiate contracts for further uses, unless we paid the premium for unlimited future use in the original contract. Random schmoes will accept unlimited future use contracts, but professional voice actors probably won't, because making a general-purpose replica of their iconic voice would be signing away their livelihood.


Very interesting to hear about HL1's use of a similar tech. But kind of disappointing that I haven't come across efforts to bring things further along since then.

Yes, but it would fall under the same copyright rules as the original software, and be owned by the developer outright. It isn't an actor's voice, and would be lumped in with all the usual IP laws and ownership along with the rest of the game's art and written content. If the whole dev team decides to up and leave, then their replacements can still produce the same character voices using the same software. There is no awkward recasting, or working around voice actor schedules.

Old Username: Talroth
If your signature on a web forum takes up more space than your average post, then you are doing things wrong.

You guys are talking about different things. valrus is talking about the use of voice actors. Luckless is talking about the use of TTS as an alternative to voice actors.

Though that does bring up a huge issue with the technology we're talking about.

Now on the other hand... if you decided up front that you want to make a game where the NPCs use synthesized speech, then you could design a game where all the NPCs are robots that are bad at processing emotion and speaking naturally. Then current TTS systems would be well suited to your game!
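A minimal sketch of that design decision in Python, assuming a dialogue layer that distinguishes robot NPCs from human ones; the NpcKind and SpeechRequest names are made up for illustration, not any engine's API:

```python
# Minimal sketch: robot NPCs go through TTS, human NPCs need recorded clips.
# NpcKind, SpeechRequest and request_speech are hypothetical names, not an engine API.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class NpcKind(Enum):
    ROBOT = auto()   # synthesized speech is in-fiction, flat prosody is fine
    HUMAN = auto()   # emotional range needed, so use authored recordings

@dataclass
class SpeechRequest:
    text: str
    use_tts: bool               # True -> synthesize at runtime
    clip_id: Optional[str]      # pre-recorded asset id when use_tts is False

def request_speech(kind: NpcKind, text: str, clip_id: Optional[str] = None) -> SpeechRequest:
    """Robots always go through TTS; humans require an authored recording."""
    if kind is NpcKind.ROBOT:
        return SpeechRequest(text=text, use_tts=True, clip_id=None)
    if clip_id is None:
        raise ValueError("Human NPC lines need a recorded clip in this design")
    return SpeechRequest(text=text, use_tts=False, clip_id=clip_id)
```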

I did this in the first version of SIMtrek back in 1989. I found a public domain library that took strings of phonemes and generated the sounds on the PC speaker. This was before the first AdLib sound card. At a time when music and sfx in games were limited to beeps, people were amazed when you'd get an encounter and the PC would actually TALK and say "Scanner contact - Klingon battle cruiser - bearing 327, mark 8" in a voice like a Cylon warrior from the original Battlestar Galactica. And of course you could program it to say anything you wanted - you just had to spell everything out in phonemes.
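The content side of that workflow - spelling each line out in phonemes before handing it to a speech library - might look roughly like this today; the phoneme spellings and the idea of a play routine are illustrative placeholders, not the actual library that was used:

```python
# Illustrative word -> phoneme lookup for a phoneme-driven speech library.
# The phoneme spellings and the play routine are placeholders, not the real library.
PHONEME_DICT = {
    "scanner": "S K AE N ER",
    "contact": "K AA N T AE K T",
    "klingon": "K L IH NG G AA N",
    "battle":  "B AE T AH L",
    "cruiser": "K R UW Z ER",
}

def to_phonemes(sentence: str) -> str:
    """Spell a sentence out as a phoneme string, word by word."""
    words = sentence.lower().replace(",", "").replace(".", "").split()
    missing = [w for w in words if w not in PHONEME_DICT]
    if missing:
        raise KeyError(f"No phoneme spelling authored for: {missing}")
    return " ".join(PHONEME_DICT[w] for w in words)

if __name__ == "__main__":
    print(to_phonemes("Scanner contact, Klingon battle cruiser."))
    # A real build would hand this string to the speech library's play routine.
```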

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Tone is the issue; it conveys a lot of meaning. A single word may have a number of distinct tones with distinct meanings - interrogative vs. exclamation, for example.

And tone is very integral to conveying emotion.

A system that used samples of voice recordings of single words, with multiple tones for each word, could work. I've done this in the past as well - but without multiple tones.
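A sketch of how such a per-word, per-tone sample bank might be organized, with made-up asset paths and a neutral fallback when a toned take doesn't exist:

```python
# Sketch of a per-word, per-tone sample bank; the asset paths are hypothetical.
from enum import Enum

class Tone(Enum):
    NEUTRAL = "neutral"
    QUESTION = "question"   # interrogative rise
    EXCLAIM = "exclaim"     # emphatic / alarmed

# (word, tone) -> recorded sample, authored once per word per tone.
SAMPLES = {
    ("contact", Tone.NEUTRAL):  "vo/contact_neutral.wav",
    ("contact", Tone.QUESTION): "vo/contact_question.wav",
    ("contact", Tone.EXCLAIM):  "vo/contact_exclaim.wav",
}

def sample_for(word: str, tone: Tone) -> str:
    """Pick the toned recording, falling back to the neutral take if one exists."""
    return SAMPLES.get((word, tone), SAMPLES.get((word, Tone.NEUTRAL), ""))
```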

But like anything, it's all about hassle versus benefit. Is it more work or more expensive than voice acting? Are the benefits real in the here and now? Or are they just assumed or imagined benefits at some point in the always-uncertain future that is game development? That's over-designing for a tomorrow that may never come.


I've been considering text-to-speech for a big next-gen or next-next-gen MMORPG that would require a lot of voice generation (largely on the fly). That system relies on player-created assets to lower costs, and pre-canned voice recordings are prohibitive (and not versatile enough), so auto-generation (combinatoric, script-logic driven) on the client machine is an important element.

Uncanny valley problem - we may just have to train the user to accept the output (it's a problem that probably never really will go away).

The processing resource issue - can the current 'better quality' TTS programs do the generation in real time (and are they using the GPU yet)? Procedurally generated content that reacts to the player's responses in a versatile way (not just a limited, pre-expected set of response verbiage) is very much required. For that application, some JIT generation may be allowable to alleviate some of the timely CPU resource bottlenecking.
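One common arrangement for that kind of JIT generation is to hand lines to a background worker as soon as the dialogue system knows they will be spoken, and only block when playback is actually due. A rough Python sketch, with synthesize() standing in for whatever TTS backend is used:

```python
# Sketch of just-in-time speech generation on a worker thread.
# synthesize() is a placeholder for whatever TTS backend is in use.
import queue
import threading
import time

def synthesize(text: str) -> bytes:
    time.sleep(0.05)              # stand-in for the real synthesis cost
    return text.encode("utf-8")   # pretend this is rendered PCM audio

class SpeechWorker:
    def __init__(self) -> None:
        self.requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            text, out = self.requests.get()
            out.put(synthesize(text))

    def submit(self, text: str) -> queue.Queue:
        """Queue a line as soon as the dialogue system knows it will be spoken."""
        out = queue.Queue(maxsize=1)
        self.requests.put((text, out))
        return out

if __name__ == "__main__":
    worker = SpeechWorker()
    pending = worker.submit("Scanner contact, bearing 327.")
    # ...the game keeps simulating; block only when playback is actually due...
    audio = pending.get(timeout=2.0)
    print(len(audio), "bytes of audio ready")
```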

Fortunately, general commercial use of this technology will keep development moving forward (it has some big bucks behind it).

Then comes the management of voice profile assets and whatever markup data is required for the 'text' to impart the desired inflection/tone/etc. to the output.
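What a voice profile asset plus marked-up text might look like, sketched with invented field names and tags (SSML is the obvious real-world analogue for the markup side):

```python
# Hypothetical voice profile asset plus inline markup for inflection.
# Field names and tags are invented for illustration; SSML is the real-world analogue.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    name: str
    pitch_semitones: float   # baseline pitch offset for this character
    rate: float              # speaking-rate multiplier
    timbre: str              # id of the selected/trained voice model

GUARD_CAPTAIN = VoiceProfile(name="guard_captain",
                             pitch_semitones=-2.0, rate=0.95, timbre="male_gravel_01")

# Marked-up line: tags request local emphasis and a questioning rise.
LINE = "You there. <emph>Halt!</emph> <tone style='question'>Papers?</tone>"

def render(profile: VoiceProfile, marked_up_text: str) -> None:
    # A real pipeline would parse the tags and hand profile + prosody cues to the TTS engine.
    print(f"[{profile.name}] {marked_up_text}")

render(GUARD_CAPTAIN, LINE)
```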

-

I recall one thing from the past: the Atari 800 game Castle Wolfenstein, with the crude little Nazis shouting a barely intelligible 'Achtung!' - which might be a nice analogy for where speech generation is today versus what it will be like in the future (we hope). If you remember how crude the sound interface was on the Atari 800, it makes me wonder how long the programmer took to shape just that one canned sound snippet.

-

Another issue might be: in an immersive environment you need MANY NPCs speaking at the same time, which increases the speech processing load. Far-away TTS conversations can be (cheaply) patched over with mumbles/droning (preferably generated realistically themselves, with rhythmic attributes and whatnot), implemented so that as you near the source the transition to high quality isn't jarring --

yes - Speech LOD now ...
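A minimal sketch of that kind of speech LOD: full synthesis gain up close, a cheap mumble/drone loop far away, and a crossfade band in between so the handoff isn't jarring. The radii and the linear blend are arbitrary illustrative choices:

```python
# Sketch of distance-based speech level-of-detail selection.
# The radii and the linear crossfade are arbitrary illustrative choices.
FULL_TTS_RADIUS = 8.0    # metres: full-quality synthesis
MUMBLE_RADIUS = 20.0     # beyond this: cheap mumble/drone loop only

def speech_lod(distance: float) -> tuple:
    """Return (tts_gain, mumble_gain) so the handoff crossfades instead of popping."""
    if distance <= FULL_TTS_RADIUS:
        return 1.0, 0.0
    if distance >= MUMBLE_RADIUS:
        return 0.0, 1.0
    t = (distance - FULL_TTS_RADIUS) / (MUMBLE_RADIUS - FULL_TTS_RADIUS)
    return 1.0 - t, t

for d in (5.0, 12.0, 25.0):
    print(d, speech_lod(d))
```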

Ratings are Opinion, not Fact

This topic is closed to new replies.
