Saturday, February 06, 2010

Speech Synthesis Part 2

It's been busy around here lately with grant writing and various other things, but I've been managing to make steady progress on the speech synthesizer. The synthesizer I described in my last post has been scrapped and completely redesigned. There is now way more flexibility in the number of poles and zeros, and I have implemented models of the KLGLOTT88 voice source and of a source I have used in the past, based on Ananthapadmanabha and Fant (1982). (Actually, my KLGLOTT88 source is not 100% functional just yet: it is lacking aspiration noise, and I'm not entirely convinced that the spectral tilt is implemented properly.) The synthesizer is also starting to be a bit more user-friendly (lots more work to do on this, though), and it is definitely more "coder-friendly", i.e. the code is much simpler, much shorter, much more modular, and much easier to understand. And unlike the synthesizer described in my last post, the current one can handle voiced obstruents just as well as voiceless obstruents or vowels.
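
For anyone unfamiliar with KLGLOTT88: the glottal flow pulse is usually written as a cubic, U(t) = a*t^2 - b*t^3 during the open phase and zero for the rest of the period. The sketch below illustrates only that commonly cited form. It is not the synthesizer's actual code, and the function name, parameter names, and defaults are hypothetical. Like the unfinished source described above, it leaves out aspiration noise and the spectral tilt filter.

```python
def klglott88_pulse(f0, fs, oq=0.5, av=1.0):
    """Return one period of a KLGLOTT88-style glottal flow pulse as a list.

    f0: fundamental frequency in Hz
    fs: sampling rate in Hz
    oq: open quotient (fraction of the period the glottis is open)
    av: amplitude of voicing (linear scale factor)

    Hypothetical helper for illustration; no tilt filter, no aspiration.
    """
    t0 = 1.0 / f0                     # period duration in seconds
    n_period = int(round(fs / f0))    # samples per period
    t_open = oq * t0                  # duration of the open phase

    # Coefficients chosen so the flow returns to zero at t = oq * t0.
    a = 27.0 * av / (4.0 * oq ** 2 * t0)
    b = 27.0 * av / (4.0 * oq ** 3 * t0 ** 2)

    flow = []
    for n in range(n_period):
        t = n / fs
        if t < t_open:
            flow.append(a * t * t - b * t ** 3)  # open phase: cubic pulse
        else:
            flow.append(0.0)                     # closed phase: no flow
    return flow


pulse = klglott88_pulse(f0=100.0, fs=10000.0)
```

With these coefficients the flow peaks two thirds of the way through the open phase and then falls quickly back to zero, which gives the pulse its skewed shape.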

Things that still need to be worked out: I need to implement the aspiration and fricative noise sources, and I need to make sure the amplitudes of the noise and voice sources are appropriately balanced. I'd also like to implement the LF model as another voice source option. Finally, I need to make the whole thing much more user-friendly by putting all the pieces together into a single program that can read Klatt synthesizer-type parameter specifications from a file and generate synthetic speech from them.

There will probably be a need for additional tweaks here and there, but they should be relatively minor. The end result will be an implementation of the Klatt synthesizer with a lot more flexibility than the Klatt synthesizer itself. I'll then be able to embed the synthesizer in other synthesis and analysis projects that I have in mind.

Fun. :-)
