Thinking Out Loud: 2010

Wednesday, February 17, 2010

Another etymology

I love etymologies, especially when I can connect otherwise ordinary English words with each other or with a Greek word. Today's example:

The Greek word "kachazo" is the same as the English word "cackle". One of the variants of the Greek word is "kagchalao", which is where the "l" in "cackle" can be seen.

But there's another English word which is also from the same root: "laugh".

The root in Proto-Indo-European is apparently "klak-", and in Greek the "l" dropped off to make "kak-" > "kach-", while in English the first "k" dropped off to make "lak-" > "lach-" > "lagh-" > "laugh".

I love etymologies. :-)

Friday, February 12, 2010

"Spirit" nothing to sneeze at...

Recently I've been having some fun with etymologies (in addition to developing my speech synthesizer). I found a cool one just tonight: "sneeze". Turns out it is cognate with the Greek word "pneuma", which is often translated as "spirit" (e.g. the Holy Spirit). "Sneeze" comes to us from its Germanic ancestor, "fneusen", which in turn comes from its Proto-Indo-European ancestor, the root "*pneu". This is the same root that gave rise to Greek "pneuma". The original meaning of the root was "to breathe", but it also gets used to speak of "wind" and such. So to "sneeze" is to breathe out windily! :-)

Saturday, February 06, 2010

Speech Synthesis Part 2

It's been busy around here lately with grant writing and various other things. But I've been managing to make steady progress on the speech synthesizer. The synthesizer I described in my last post has been scrapped and completely redesigned. There is now way more flexibility in the number of poles and zeros, and I have implemented models of the KLGLOTT88 voice source and of a source I have used in the past based on Ananthapadmanabha and Fant (1982). (Actually, my KLGLOTT88 source is not 100% functional just yet - it is lacking aspiration noise and I'm not 100% convinced that the spectral tilt is implemented properly.) The synthesizer is also starting to be a bit more user-friendly (lots more work to do on this, though), and it is definitely more "coder-friendly", i.e. the code is much simpler, much shorter, much more modular, and much easier to understand. And unlike the synthesizer described in my last post, the current one can handle voiced obstruents just as well as voiceless obstruents or vowels.
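For readers who have not met this kind of voice source, here is a minimal sketch of one glottal cycle, assuming the cubic open-phase shape usually given for KLGLOTT88 (flow of the form a*t^2 - b*t^3 that returns to zero at closure). It is written in Python rather than MATLAB, and the function name, the normalization of the peak flow to `amplitude`, and the parameter values are illustrative assumptions, not the code described in this post. Spectral tilt and aspiration noise are deliberately left out.

```python
def klglott88_like_pulse(f0_hz, open_quotient, fs_hz, amplitude=1.0):
    """Return one period of glottal flow as a list of samples.

    Open phase: amplitude * (27/4) * x^2 * (1 - x), with x running from
    0 to 1 across the open phase. The 27/4 factor is an assumption made
    here so that the flow peaks at `amplitude` (the peak falls at x = 2/3).
    Closed phase: zero flow.
    """
    n_period = int(round(fs_hz / f0_hz))   # samples per glottal cycle
    n_open = open_quotient * n_period      # open-phase length in samples
    pulse = []
    for n in range(n_period):
        if n < n_open:
            x = n / n_open                 # normalized time in the open phase
            pulse.append(amplitude * 6.75 * x * x * (1.0 - x))
        else:
            pulse.append(0.0)              # glottis closed
    return pulse
```

Repeating the returned list gives a steady voiced source at `f0_hz`. A synthesizer of this kind would typically feed the first difference of the flow (to approximate radiation at the lips) into the filter, but that step is omitted here.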

Things that need to be worked out still: I need to implement the aspiration and fricative noise sources, and I need to make sure the amplitudes of the noise and voice sources are appropriately balanced. I'd like to implement the LF model as well, for another voice source option. Finally, I need to make it much more user-friendly by putting all the pieces together into a single program which can read Klatt synthesizer-type parameter specifications from a file and generate the synthetic speech from that.
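One simple way to think about balancing the noise and voice sources is to fix the noise level relative to the voice level in dB. The sketch below does that by matching RMS values. The function names, the use of Gaussian noise, and the dB convention are all assumptions for illustration, not the scheme the finished synthesizer uses.

```python
import math
import random

def rms(signal):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix_voice_and_noise(voice, noise_db_re_voice, seed=0):
    """Add Gaussian noise to `voice` at a level set relative to the voice RMS.

    noise_db_re_voice = -20 means the noise RMS is one tenth of the voice RMS.
    Returns (mixed, scaled_noise).
    """
    rng = random.Random(seed)              # seeded so runs are repeatable
    noise = [rng.gauss(0.0, 1.0) for _ in voice]
    target_rms = rms(voice) * 10.0 ** (noise_db_re_voice / 20.0)
    gain = target_rms / rms(noise)         # scale factor that hits the target RMS
    scaled = [gain * n for n in noise]
    mixed = [v + n for v, n in zip(voice, scaled)]
    return mixed, scaled
```

In a real synthesizer the aspiration noise is usually also modulated over the glottal cycle, so a single gain like this is only a starting point.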

There will probably be a need for additional tweaks here and there, but they should be relatively minor. The end result will be an implementation of the Klatt synthesizer with a lot more flexibility than the Klatt synthesizer itself. I'll then be able to embed the synthesizer in other synthesis and analysis projects that I have in mind.

Fun. :-)

Saturday, January 16, 2010

Speech Synthesis!

I've never done much with speech signal processing - creating spectra and spectrograms, and calculating spectra from acoustic circuit models of the vocal tract is about as far as I go. But recently I decided it was high time I tried my hand at a (formant) synthesizer. Fortunately, I knew just where to start looking for help: Dennis Klatt's 1980 paper in JASA. Aside from one typo which I found in an equation (there was a missing division sign), the paper is most informative, and there is an appendix with the complete code for the synthesizer described. ...written in FORTRAN. So I learned a few things about FORTRAN while I was trying to decipher how the synthesizer worked. Then I noticed the typo, and it became necessary to do some hunting around the internet and some of my books to figure out which form of the equation was correct, and whether there were any other typos (there weren't).

I started off, then, rewriting the Klatt Synthesizer in MATLAB. I already have a version of this by MKT, but it's in mex format, which I know a bit less well than FORTRAN, and the Klatt Synthesizer has some limitations that I want to overcome. So I was going to rewrite the Klatt Synthesizer myself, in MATLAB, thereby forcing myself to learn how the synthesizer worked. Once that was done and working, I could focus on changing the synthesizer to meet my particular needs. Well, after a while (and after I had solved the problem of the typo) I decided to take a different tack. It would be easier to write the synthesizer I wanted from the get-go, rather than using the Klatt version as a stopping-off point. So I worked on putting the principles learned from Klatt to work. The result: nothing like speech. I finally got around to some debugging tonight, and figured out what was wrong: a missing minus sign. :-} That's now fixed, and I am successfully able to produce three-formant vowels. Actually, I should be able to produce much more than that - I just haven't tried the other possibilities yet.

This synthesizer has some nice features. First, it has poles and zeros for each of the first 6 formants, for each of the first 3 nasal and subglottal resonances, and for each of the first two interdental resonances. If the frequency and bandwidth of a pole-zero pair are identical, the pair is simply not synthesized (the two would cancel out anyway); and if either the pole or the zero coefficients are set to zero, that pole or zero is not synthesized. Second, the synthesizer updates the resonator and anti-resonator coefficients between each pair of samples. So if the acoustic sampling rate is 16000 Hz, then the parameter update interval is 1/16000 s. Third, the source is defined completely separately from the resonators and anti-resonators. If the source is given, one can calculate the vocal tract filter and input that into the synthesizer, and since the sampling rate is so large compared to the fundamental frequency, within-glottal-cycle changes in the filter should be no problem.

The one thing that this synthesizer cannot currently do is produce voiced obstruents. In a later version of the synthesizer, this will be remedied, and the source parameter will probably be broken down into phonation and frication/aspiration components. I plan to use this synthesizer (or its later versions) to create synthetic speech for speech perception experiments, and also to study the analysis of subglottal coupling in natural and synthetic speech, and to investigate certain aspects of fricative acoustics that I'm rather interested to look into.
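To make the pole/zero idea concrete, here is a minimal sketch of a second-order digital resonator and its matching anti-resonator, using the difference equations from Klatt (1980), with the coefficients recomputed at every sample as described above. It is in Python rather than MATLAB, and the function names and the list-based interface are assumptions for illustration, not the code described in this post.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Klatt (1980) resonator coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)   # note the leading minus sign
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c                             # gives unity gain at 0 Hz
    return a, b, c

def antiresonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients for the zero pair y[n] = a*x[n] + b*x[n-1] + c*x[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    return 1.0 / a, -b / a, -c / a              # exact inverse of the resonator

def resonate(x, freqs_hz, bws_hz, fs_hz):
    """Filter x through one pole pair; freq and bandwidth may change every sample."""
    y1 = y2 = 0.0                               # previous two outputs
    out = []
    for xn, f, bw in zip(x, freqs_hz, bws_hz):
        a, b, c = resonator_coeffs(f, bw, fs_hz)
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

def antiresonate(x, freqs_hz, bws_hz, fs_hz):
    """Filter x through one zero pair; freq and bandwidth may change every sample."""
    x1 = x2 = 0.0                               # previous two inputs
    out = []
    for xn, f, bw in zip(x, freqs_hz, bws_hz):
        a, b, c = antiresonator_coeffs(f, bw, fs_hz)
        out.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return out
```

Passing a source through `resonate` three times, once per formant track, gives a cascade three-formant vowel. Passing a signal through `antiresonate` and then `resonate` with identical, constant frequency and bandwidth returns the input unchanged, which is why a pole-zero pair with matching values can simply be skipped.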

Among my many readers (that's right, I'm talking about all two of you) this post is probably interesting only to me. Sorry... :-}
