Plastic Surgery Research Council
PSRC 60th Annual Meeting
Program and Abstracts

An Automated Cleft Speech Evaluator Using Speech Recognition Software
James R. Seaward, FRCS(Plast.)1, Julie Cook, MS, CCC-SLP2, Cortney Van't Slot, MS, CCC-SLP2, Rami Hallac, PhD2, Megan Vucovich, MD1, Alex A. Kane, MD1.
1University of Texas Southwestern Medical Center, Dallas, TX, USA, 2Children's Medical Center, Dallas, TX, USA.

PURPOSE:
Perceptual evaluation by a professional speech pathologist remains the gold standard for analysis of cleft speech anomalies. As with any human interpretation, however, bias is inevitable, and eliminating that bias to allow comparative analysis of speech data between units is labor- and time-intensive. Globally, there is a shortage of professional listeners for cleft speech evaluation, limiting the reach of cleft care. We have developed an automated tool for the evaluation of cleft speech, not to replace speech pathologists, but to facilitate collaboration between units and to extend speech analysis resources worldwide.
METHODS:
Speech recognition engines are widespread and are even built into most mobile devices. They identify spoken words by matching sound probabilities to words in a dictionary. The speaker's style, voice characteristics, and even speech errors tend to be ignored in order to give the best match for the individual word. Recognition accuracy is further improved by including grammatical rules for the language in question, so the probability of identifying the correct sentence is increased by considering the probability of certain words following others in context. Our speech analysis engine, by contrast, is not interested in the word itself but rather in the voice characteristics and speech errors of the sound produced. To achieve this, we designed a Hidden Markov Model speech analysis tool and restricted the available grammar of the recognition language to a single sentence from the CAPS-A-AM list. We also dramatically restricted the engine's available vocabulary to the words of that sentence, either as normal speech or with the common cleft-type speech errors for the sound in question (i.e., instead of asking the speech engine whether a word is ‘Sean’, ‘born’, ‘lawn’ or ‘torn’, we ask whether it is ‘Sean in normal speech’, ‘Sean with features of VPI’, ‘Sean with compensatory articulation errors’ or ‘Sean with developmental articulatory errors’). Speech samples from our Craniofacial Team clinic were rated independently by two experienced speech pathologists and by our speech recognition engine.
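
To make the labeling scheme concrete, the following is a minimal sketch of one plausible realization: one Gaussian HMM per (word, speech-characteristic) label, with an utterance assigned to whichever model scores it highest. The file layout, label names, and the use of hmmlearn and librosa MFCC features are illustrative assumptions; the abstract does not specify the authors' toolkit or acoustic features.

import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

# Composite labels replace the usual word-only vocabulary (names are hypothetical).
LABELS = [
    "Sean_normal",          # 'Sean' in normal speech
    "Sean_VPI",             # 'Sean' with features of velopharyngeal insufficiency
    "Sean_compensatory",    # 'Sean' with compensatory articulation errors
    "Sean_developmental",   # 'Sean' with developmental articulatory errors
]

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    # Frame-level MFCCs, shape (n_frames, n_mfcc).
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_models(training_files):
    # training_files: dict mapping each label to a list of .wav paths (hypothetical).
    models = {}
    for label, paths in training_files.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)               # frames from all utterances, concatenated
        lengths = [len(f) for f in feats]  # per-utterance frame counts
        m = GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, wav_path):
    # Pick the label whose HMM gives the utterance the highest log-likelihood.
    X = mfcc_features(wav_path)
    return max(models, key=lambda label: models[label].score(X))

Restricting the vocabulary to these composite labels is what turns a word recognizer into an error classifier: the acoustic evidence a conventional engine discards as speaker idiosyncrasy becomes the very thing being scored.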
RESULTS:
One hundred speech samples of each of the 24 CAPS-A-AM sentences were rated and used to train the speech recognition engine. A further 25 speech samples will be rated and used to test the recognition accuracy of the engine. Initial tests of recognition accuracy are encouraging, but definitive outcomes are not yet available, as the engine is still in its final training phase; they will be available by the end of 2014, well in advance of the PSRC meeting.
CONCLUSION:
This specific application of a speech recognition engine has wide-reaching uses: it can help identify patients who would benefit from speech therapy, or who likely have VPI, when no speech pathologist is at hand, and it can serve as an objective speech analysis resource to facilitate collaboration on speech outcome data between units.
