Visemes for cartoon lip sync

0 comments, last by alvaro 1 year, 3 months ago

I am working on an ML model to map audio to visemes, which could be used to implement lip sync for a cartoon character. I want to make the model available for anyone to use under some very permissive license.

My question is about which set of visemes I should use so that the model is immediately useful to as many people as possible. I am using the TIMIT database for training, and its audio recordings come with labels drawn from a set of 61 phonemes. I map those phonemes to a much smaller set of visemes and then train a model to predict which viseme corresponds to a given window of audio.
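As a rough illustration of the labeling step, here is a minimal Python sketch that turns a TIMIT-style `.phn` transcription (one `start_sample end_sample phone` entry per line) into one viseme label per audio window. The function names, the window length of 400 samples, the hop of 160 samples, and the rule of labeling each window by the phone under its center sample are all assumptions made for this example, not a description of the actual training code. It also assumes the segments are contiguous and sorted, and the three-entry phone table is only a toy.

```python
def parse_phn(text):
    """Parse .phn content: one 'start_sample end_sample phone' entry per line."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, phone = line.split()
        segments.append((int(start), int(end), phone))
    return segments


def frame_labels(segments, phone_to_viseme, win=400, hop=160):
    """Label each window with the viseme of the phone under its center sample.

    Assumes segments are sorted and contiguous; each segment covers the
    half-open sample range [start, end).
    """
    if not segments:
        return []
    total = segments[-1][1]  # end sample of the last segment
    labels = []
    seg_idx = 0
    start = 0
    while start + win <= total:
        center = start + win // 2
        # Advance to the segment that contains the window center.
        while segments[seg_idx][1] <= center:
            seg_idx += 1
        labels.append(phone_to_viseme[segments[seg_idx][2]])
        start += hop
    return labels


# Toy transcription and a toy three-entry phone table (hypothetical data).
example_phn = """0 2000 h#
2000 3200 sh
3200 4800 iy
"""
toy_map = {"h#": 7, "sh": 6, "iy": 3}
labels = frame_labels(parse_phn(example_phn), toy_map)
```

Labeling by the window center keeps the rule simple; a majority vote over the samples in each window would be another reasonable choice.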

I have found tables that map TIMIT phonemes to visemes, like this one: https://www.researchgate.net/figure/2-An-example-viseme-to-phoneme-mapping-using-the-TIMIT-phone-set_tbl3_221052635

However, I can't find any sprite sheets with mouth shapes that correspond to those visemes, so I wonder how useful my model would be if I train it to produce them.

I've found some sprite sheets like this one: https://www.vecteezy.com/vector-art/13134214-kid-mouth-animation-sprite-sheet

I made a mapping from TIMIT phonemes to the 12 visemes used in that sprite sheet, but I'm not an expert in either phonology or cartoon animation, so I'm not sure whether I did a reasonable job. Here's my first attempt at a mapping:

0: ae eh ay ey hh hv ax ih dx uh ao oy ax-h ix ah aa

1: th

2: aw ow

3: iy eng

4: uw ux

5: el l

6: ch jh sh zh

7: b p bcl pcl m em h# pau

8: f v

9: w q er axr r

10: s z epi tcl dcl n en y ng dh nx gcl kcl

11: t d g k
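
For anyone who wants to try the mapping above in code, here is a minimal Python sketch that transcribes it into a table and inverts it into a phoneme-to-viseme lookup. The names `VISEME_TO_PHONES`, `build_phone_to_viseme`, and `PHONE_TO_VISEME` are made up for this example. The helper raises an error if a phoneme is assigned to two different visemes, which is an easy mistake to make when editing a table like this by hand.

```python
# Viseme index -> TIMIT phone labels, transcribed from the list above.
VISEME_TO_PHONES = {
    0: "ae eh ay ey hh hv ax ih dx uh ao oy ax-h ix ah aa",
    1: "th",
    2: "aw ow",
    3: "iy eng",
    4: "uw ux",
    5: "el l",
    6: "ch jh sh zh",
    7: "b p bcl pcl m em h# pau",
    8: "f v",
    9: "w q er axr r",
    10: "s z epi tcl dcl n en y ng dh nx gcl kcl",
    11: "t d g k",
}


def build_phone_to_viseme(table):
    """Invert the table, rejecting phones assigned to two different visemes."""
    phone_to_viseme = {}
    for viseme, phones in table.items():
        for phone in phones.split():
            previous = phone_to_viseme.setdefault(phone, viseme)
            if previous != viseme:
                raise ValueError(
                    f"{phone!r} is mapped to both viseme {previous} and {viseme}"
                )
    return phone_to_viseme


PHONE_TO_VISEME = build_phone_to_viseme(VISEME_TO_PHONES)
```

A lookup such as `PHONE_TO_VISEME["sh"]` then gives the index of the mouth shape to draw for that phoneme.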

I'd appreciate any help you can provide. If you want to test out my code in your project, that would be great!

